Search This Blog

Wednesday, May 27, 2020

Runtime system

From Wikipedia, the free encyclopedia

In computer programming, a runtime system, also called runtime environment, primarily implements portions of an execution model. This is not to be confused with the runtime lifecycle phase of a program, during which the runtime system is in operation.

Most programming languages have some form of runtime system that provides an environment in which programs run. This environment may address a number of issues including the management of application memory, how the program accesses variables, mechanisms for passing parameters between procedures, interfacing with the operating system, and otherwise. The compiler makes assumptions depending on the specific runtime system to generate correct code. Typically the runtime system will have some responsibility for setting up and managing the stack and heap, and may include features such as garbage collection, threads or other dynamic features built into the language.

Overview

Every programming language specifies an execution model, and many implement at least part of that model in a runtime system. One possible definition of runtime system behavior is, among others, any behavior not directly attributable to the program itself. This definition includes, as part of the runtime system, things such as putting parameters onto the stack before a function call, the behavior of disk I/O, and parallel execution of related behaviors.

By this definition, essentially every language has a runtime system, including compiled languages, interpreted languages, and embedded domain-specific languages. Even API invoked stand alone execution models such as Pthreads (POSIX threads) have a runtime system that is the implementation of execution model's behavior.

Most scholarly papers on runtime systems focus on the implementation details of parallel runtime systems. A notable example of a parallel runtime system is that of Cilk, a popular parallel programming model. In addition, the proto-runtime toolkit was created to simplify the creation of parallel runtime systems.

In addition to the execution model behavior, a runtime system may also perform support services such as type checking, debugging, or code generation and optimization.

The runtime system is also the gateway by which a running program interacts with the runtime environment, which contains not only state values that are accessible during program execution, but also active entities that can be interacted with during program execution like disk drives and people via keyboards. For example, environment variables are features of many operating systems, and are part of the runtime environment; a running program can access them via the runtime system. Likewise, hardware devices such as a DVD drive are active entities that a program can interact with via a runtime system.

A unique application of a runtime environment (RTE) is within an operating system (OS) that only allows that RTE to run, meaning from boot until power-down the entire OS is dedicated to only the application(s) running within that RTE. Any other code that tries to run or any failures in the application(s) break the RTE which breaks the OS which stops all processing and requires a re-boot. If the boot is from read-only memory, an extremely secure, simple, single-mission system is created.
Examples for such kind of directly bundled runtime systems include:
  • Between 1983 and 1984, Digital Research offered several of their business and educations applications for the IBM PC on bootable floppy diskettes bundled with SpeedStart CP/M-86, a reduced version of CP/M-86 as runtime environment.
  • Some stand-alone versions of Ventura Publisher (1986-1993), Artline (1988-1991), Timeworks Publisher (1988-1991) and ViewMAX (1990-1992) contained special runtime versions of Digital Research's GEM as their runtime environment.
  • In the late 1990s JP Software's command line processor 4DOS was optionally available in a special runtime version to be linked with BATCOMP pre-compiled and encrypted batch jobs in order to create unmodifyable executables from batch scripts and run them on systems without 4DOS installed.

Examples

As a simple example of a basic runtime system, the runtime system of the C language is a particular set of instructions inserted into the executable image by the compiler. Among other things, these instructions manage the processor stack, create space for local variables, and copy function-call parameters onto the top of the stack. There are often no clear criteria for deciding which language behavior is considered inside the runtime system versus which behavior is part of the source program. For C, the setup of the stack is part of the runtime system, as opposed to part of the semantics of an individual program, because it maintains a global invariant that holds over all executions. This systematic behavior implements the execution model of the language, as opposed to implementing semantics of the particular program text which is directly translated into code that computes results.

One way to observe this separation between the semantics of a particular program and the runtime environment is to compile a program into an object file containing all the functions versus compiling an entire program to an executable binary. The object file will only contain assembly code relevant to those functions, while the executable binary will contain additional code used to implement the runtime environment. The object file, on one hand, may be missing information from the runtime environment that will be resolved by linking. On the other hand, the code in the object file still depends on assumptions in the runtime system; for example, a function may read parameters from a particular register or stack location, depending on the calling convention used by the runtime environment.

Another example is the case of using an application programming interface (API) to interact with a runtime system. The calls to that API look the same as calls to a regular software library, however at some point during the call the execution model changes. The runtime system implements an execution model different from that of the language the library is written in terms of. A person reading the code of a normal library would be able to understand the library's behavior by just knowing the language the library was written in. However, a person reading the code of the API that invokes a runtime system would not be able to understand the behavior of the API call just by knowing the language the call was written in. At some point, via some mechanism, the execution model stops being that of the language the call is written in and switches over to being the execution model implemented by the runtime system. For example, the trap instruction is one method of switching execution models. This difference is what distinguishes an API-invoked execution model, such as Pthreads, from a usual software library. Both Pthreads calls and software library calls are invoked via an API, but Pthreads behavior cannot be understood in terms of the language of the call. Rather, Pthreads calls bring into play an outside execution model, which is implemented by the Pthreads runtime system (this runtime system is often the OS kernel).

As an extreme example, the physical CPU itself can be viewed as an implementation of the runtime system of a specific assembly language. In this view, the execution model is implemented by the physical CPU and memory systems. As an analogy, runtime systems for higher-level languages are themselves implemented using some other languages. This creates a hierarchy of runtime systems, with the CPU itself—or actually its logic at the microcode layer or below—acting as the lowest-level runtime system.

Advanced features

Some compiled or interpreted languages provide an interface that allows application code to interact directly with the runtime system. An example is the Thread class in the Java language. The class allows code (that is animated by one thread) to do things such as start and stop other threads. Normally, core aspects of a language's behavior such as task scheduling and resource management are not accessible in this fashion. 

Higher-level behaviors implemented by a runtime system may include tasks such as drawing text on the screen or making an Internet connection. It is often the case that operating systems provide these kinds of behaviors as well, and when available, the runtime system is implemented as an abstraction layer that translates the invocation of the runtime system into an invocation of the operating system. This hides the complexity or variations in the services offered by different operating systems. This also implies that the OS kernel can itself be viewed as a runtime system, and that the set of OS calls that invoke OS behaviors may be viewed as interactions with a runtime system. 

In the limit, the runtime system may provide services such as a P-code machine or virtual machine, that hide even the processor's instruction set. This is the approach followed by many interpreted languages such as AWK, and some languages like Java, which are meant to be compiled into some machine-independent intermediate representation code (such as bytecode). This arrangement simplifies the task of language implementation and its adaptation to different machines, and improves efficiency of sophisticated language features such as reflection. It also allows the same program to be executed on any machine without an explicit recompiling step, a feature that has become very important since the proliferation of the World Wide Web. To speed up execution, some runtime systems feature just-in-time compilation to machine code. 

A modern aspect of runtime systems is parallel execution behaviors, such as the behaviors exhibited by mutex constructs in Pthreads and parallel section constructs in OpenMP. A runtime system with such parallel execution behaviors may be modularized according to the proto-runtime approach.

History

Notable early examples of runtime systems are the interpreters for BASIC and Lisp. These environments also included a garbage collector. Forth is an early example of a language designed to be compiled into intermediate representation code; its runtime system was a virtual machine that interpreted that code. Another popular, if theoretical, example is Donald Knuth's MIX computer. 

In C and later languages that supported dynamic memory allocation, the runtime system also included a library that managed the program's memory pool.

In the object-oriented programming languages, the runtime system was often also responsible for dynamic type checking and resolving method references.

Abstract data type

From Wikipedia, the free encyclopedia

In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a user, of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.

Formally, an ADT may be defined as a "class of objects whose logical behavior is defined by a set of values and a set of operations"; this is analogous to an algebraic structure in mathematics. What is meant by "behavior" varies by author, with the two main types of formal specifications for behavior being axiomatic (algebraic) specification and an abstract model; these correspond to axiomatic semantics and operational semantics of an abstract machine, respectively. Some authors also include the computational complexity ("cost"), both in terms of time (for computing operations) and space (for representing values). In practice many common data types are not ADTs, as the abstraction is not perfect, and users must be aware of issues like arithmetic overflow that are due to the representation. For example, integers are often stored as fixed-width values (32-bit or 64-bit binary numbers), and thus experience integer overflow if the maximum value is exceeded.

ADTs are a theoretical concept, in computer science, used in the design and analysis of algorithms, data structures, and software systems, and do not correspond to specific features of computer languages—mainstream computer languages do not directly support formally specified ADTs. However, various language features correspond to certain aspects of ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.

Examples

For example, integers are an ADT, defined as the values ..., −2, −1, 0, 1, 2, ..., and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc., which behave according to familiar mathematics (with care for integer division), independently of how the integers are represented by the computer. Explicitly, "behavior" includes obeying various axioms (associativity and commutativity of addition, etc.), and preconditions on operations (cannot divide by zero). Typically integers are represented in a data structure as binary numbers, most often as two's complement, but might be binary-coded decimal or in ones' complement, but the user is abstracted from the concrete choice of representation, and can simply use the data as data types.

An ADT consists not only of operations but also of values of the underlying data and of constraints on the operations. An "interface" typically refers only to the operations, and perhaps some of the constraints on the operations, notably pre-conditions and post-conditions, but not other constraints, such as relations between the operations.

For example, an abstract stack, which is a last-in-first-out structure, could be defined by three operations: push, that inserts a data item onto the stack; pop, that removes a data item from it; and peek or top, that accesses a data item on top of the stack without removal. An abstract queue, which is a first-in-first-out structure, would also have three operations: enqueue, that inserts a data item into the queue; dequeue, that removes the first data item from it; and front, that accesses and serves the first data item in the queue. There would be no way of differentiating these two data types, unless a mathematical constraint is introduced that for a stack specifies that each pop always returns the most recently pushed item that has not been popped yet. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many data items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.

Introduction

Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs. 

The term abstract data type can also be regarded as a generalized approach of a number of algebraic structures, such as lattices, groups, and rings. The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design by contract methodologies for software development.

Defining an abstract data type

An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.

Imperative-style definition

In the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable—meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times—just like the instructions of a computer, or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms. (See The Art of Computer Programming by Donald Knuth for more details)

Abstract variable

Imperative-style definitions of ADT often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:
  • store(V, x) where x is a value of unspecified nature;
  • fetch(V), that yields a value,
with the constraint that
  • fetch(V) always returns the value x used in the most recent store(V, x) operation on the same variable V.
As in so many programming languages, the operation store(V, x) is often written Vx (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, VV + 1 is commonly understood to be a shorthand for store(V,fetch(V) + 1).




In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that

  • if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.
More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance (including other instances of the same ADT) — unless the ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R.

The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.

Note that this definition does not imply anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined. (However, there are some important algorithms whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)

Instance creation

Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to
  • the result of create() is distinct from any instance in use by the algorithm.
This axiom may be strengthened to exclude also partial aliasing with other instances. On the other hand, this axiom still allows implementations of create() to yield a previously created instance that has become inaccessible to the program.

Example: abstract stack (imperative)

As another example, an imperative-style definition of an abstract stack could specify that the state of a stack S can be modified only by the operations
  • push(S, x), where x is some value of unspecified nature;
  • pop(S), that yields a value as a result,
with the constraint that
  • For any value x and any abstract variable V, the sequence of operations { push(S, x); Vpop(S) } is equivalent to Vx.
Since the assignment Vx, by definition, cannot change the state of S, this condition implies that Vpop(S) restores S to the state it had before the push(S, x). From this condition and from the properties of abstract variables, it follows, for example, that the sequence
{ push(S, x); push(S, y); Upop(S); push(S, z); Vpop(S); Wpop(S) }
where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to
{ Uy; Vz; Wx }
Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is,
  • For any values x, y, and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }.
An abstract stack definition usually includes also a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to
  • create() ≠ S for any prior stack S (a newly created stack is distinct from all previous stacks);
  • empty(create()) (a newly created stack is empty);
  • not empty(push(S, x)) (pushing something into a stack makes it non-empty).

Single-instance style

Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations push(x) and pop(), that operate on the only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance.

On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For an example, consider augmenting the definition of the abstract stack with an operation compare(S, T) that checks whether the stacks S and T contain the same items in the same order.

Functional-style definition

Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument, and returns the new state as part of the result. Unlike the imperative operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states).

In the functional view, in particular, there is no way (or need) to define an "abstract variable" with the semantics of imperative variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.

Example: abstract stack (functional)

For example, a complete functional-style definition of an abstract stack could use the three operations:
  • push: takes a stack state and an arbitrary value, returns a stack state;
  • top: takes a stack state, returns a value;
  • pop: takes a stack state, returns a stack state.
In a functional-style definition there is no need for a create operation. Indeed, there is no notion of "stack instance". The stack states can be thought of as being potential states of a single stack structure, and two stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as linked lists with hash cons.

Instead of create(), a functional-style definition of an abstract stack may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or "()"; or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that
  • push(Λ, x) ≠ Λ.
In a functional-style definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ. 

Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s, x) = push(t, y) if and only if x = y and s = t.

As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the abstract stack example above, this rule means that every stack is a finite sequence of values, that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be poped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".

Whether to include complexity

Aside from the behavior in terms of axioms, it is also possible to include, in the definition of an ADT operation, their algorithmic complexity. Alexander Stepanov, designer of the C++ Standard Template Library, included complexity guarantees in the STL specification, arguing:
The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.
— Alexander Stepanov

Advantages of abstract data typing

Encapsulation

Abstraction provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object. The user does not need any technical knowledge of how the implementation works to use the ADT. In this way, the implementation may be complex but will be encapsulated in a simple interface when it is actually used. 

Localization of change

Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT object may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.

Flexibility

Different implementations of the ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.

Typical operations

Some operations that are often specified for ADTs (possibly under other names) are
  • compare(s, t), that tests whether two instances' states are equivalent in some sense;
  • hash(s), that computes some standard hash function from the instance's state;
  • print(s) or show(s), that produces a human-readable representation of the instance's state.
In imperative-style ADT definitions, one often finds also
  • create(), that yields a new instance of the ADT;
  • initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
  • copy(s, t), that puts instance s in a state equivalent to that of t;
  • clone(t), that performs screate(), copy(s, t), and returns s;
  • free(s) or destroy(s), that reclaims the memory and other resources used by s.
The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.

High-level programming language

From Wikipedia, the free encyclopedia
 
In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language elements, be easier to use, or may automate (or even hide entirely) significant areas of computing systems (e.g. memory management), making the process of developing a program simpler and more understandable than when using a lower-level language. The amount of abstraction provided defines how "high-level" a programming language is.

In the 1960s, high-level programming languages using a compiler were commonly called autocodes. Examples of autocodes are COBOL and Fortran.

The first high-level programming language designed for computers was Plankalkül, created by Konrad Zuse. However, it was not implemented in his time, and his original contributions were largely isolated from other developments due to World War II, aside from the language's influence on the "Superplan" language by Heinz Rutishauser and also to some degree Algol. The first significantly widespread high-level language was Fortran, a machine-independent development of IBM's earlier Autocode systems. Algol, defined in 1958 and 1960 by committees of European and American computer scientists, introduced recursion as well as nested functions under lexical scope. It was also the first language with a clear distinction between value and name-parameters and their corresponding semantics. Algol also introduced several structured programming concepts, such as the while-do and if-then-else constructs and its syntax was the first to be described in formal notation – "Backus–Naur form" (BNF). During roughly the same period, Cobol introduced records (also called structs) and Lisp introduced a fully general lambda abstraction in a programming language for the first time.

Features

"High-level language" refers to the higher level of abstraction from machine language. Rather than dealing with registers, memory addresses and call stacks, high-level languages deal with variables, arrays, objects, complex arithmetic or boolean expressions, subroutines and functions, loops, threads, locks, and other abstract computer science concepts, with a focus on usability over optimal program efficiency. Unlike low-level assembly languages, high-level languages have few, if any, language elements that translate directly into a machine's native opcodes. Other features, such as string handling routines, object-oriented language features, and file input/output, may also be present. One thing to note about high-level programming languages is that these languages allow the programmer to be detached and separated from the machine. That is, unlike low-level languages like assembly or machine language, high-level programming can amplify the programmer's instructions and trigger a lot of data movements in the background without their knowledge. The responsibility and power of executing instructions have been handed over to the machine from the programmer.

Abstraction penalty

High-level languages intend to provide features which standardize common tasks, permit rich debugging, and maintain architectural agnosticism; while low-level languages often produce more efficient code through optimization for a specific system architecture. Abstraction penalty is the cost that high-level programming techniques pay for being unable to optimize performance or use certain hardware because they don't take advantage of certain low-level architectural resources. High-level programming exhibits features like more generic data structures and operations, run-time interpretation, and intermediate code files; which often result in execution of far more operations than necessary, higher memory consumption, and larger binary program size. For this reason, code which needs to run particularly quickly and efficiently may require the use of a lower-level language, even if a higher-level language would make the coding easier. In many cases, critical portions of a program mostly in a high-level language can be hand-coded in assembly language, leading to a much faster, more efficient, or simply reliably functioning optimised program.

However, with the growing complexity of modern microprocessor architectures, well-designed compilers for high-level languages frequently produce code comparable in efficiency to what most low-level programmers can produce by hand, and the higher abstraction may allow for more powerful techniques providing better overall results than their low-level counterparts in particular settings.[9] High-level languages are designed independent of a specific computing system architecture. This facilitates executing a program written in such a language on any computing system with compatible support for the Interpreted or JIT program. High-level languages can be improved as their designers develop improvements. In other cases, new high-level languages evolve from one or more others with the goal of aggregating the most popular constructs with new or improved features. An example of this is Scala which maintains backward compatibility with Java which means that programs and libraries written in Java will continue to be usable even if a programming shop switches to Scala; this makes the transition easier and the lifespan of such high-level coding indefinite. In contrast, low-level programs rarely survive beyond the system architecture which they were written for without major revision. This is the engineering 'trade-off' for the 'Abstraction Penalty'.

Relative meaning

Examples of high-level programming languages in active use today include Python, Visual Basic, Delphi, Perl, PHP, ECMAScript, Ruby, C#, Java and many others.

The terms high-level and low-level are inherently relative. Some decades ago, the C language, and similar languages, were most often considered "high-level", as it supported concepts such as expression evaluation, parameterised recursive functions, and data types and structures, while assembly language was considered "low-level". Today, many programmers might refer to C as low-level, as it lacks a large runtime-system (no garbage collection, etc.), basically supports only scalar operations, and provides direct memory addressing. It, therefore, readily blends with assembly language and the machine level of CPUs and microcontrollers.

Assembly language may itself be regarded as a higher level (but often still one-to-one if used without macros) representation of machine code, as it supports concepts such as constants and (limited) expressions, sometimes even variables, procedures, and data structures. Machine code, in its turn, is inherently at a slightly higher level than the microcode or micro-operations used internally in many processors.

Execution modes

There are three general modes of execution for modern high-level languages:
Interpreted
When code written in a language is interpreted, its syntax is read and then executed directly, with no compilation stage. A program called an interpreter reads each program statement, following the program flow, then decides what to do, and does it. A hybrid of an interpreter and a compiler will compile the statement into machine code and execute that; the machine code is then discarded, to be interpreted anew if the line is executed again. Interpreters are commonly the simplest implementations of the behavior of a language, compared to the other two variants listed here.
Compiled
When code written in a language is compiled, its syntax is transformed into an executable form before running. There are two types of compilation:
Machine code generation
Some compilers compile source code directly into machine code. This is the original mode of compilation, and languages that are directly and completely transformed to machine-native code in this way may be called truly compiled languages. See assembly language.
Intermediate representations
When code written in a language is compiled to an intermediate representation, that representation can be optimized or saved for later execution without the need to re-read the source file. When the intermediate representation is saved, it may be in a form such as bytecode. The intermediate representation must then be interpreted or further compiled to execute it. Virtual machines that execute bytecode directly or transform it further into machine code have blurred the once clear distinction between intermediate representations and truly compiled languages.
Source-to-source translated or transcompiled
Code written in a language may be translated into terms of a lower-level language for which native code compilers are already common. JavaScript and the language C are common targets for such translators. See CoffeeScript, Chicken Scheme, and Eiffel as examples. Specifically, the generated C and C++ code can be seen (as generated from the Eiffel language when using the EiffelStudio IDE) in the EIFGENs directory of any compiled Eiffel project. In Eiffel, the translated process is referred to as transcompiling or transcompiled, and the Eiffel compiler as a transcompiler or source-to-source compiler.
Note that languages are not strictly interpreted languages or compiled languages. Rather, implementations of language behavior use interpreting or compiling. For example, ALGOL 60 and Fortran have both been interpreted (even though they were more typically compiled). Similarly, Java shows the difficulty of trying to apply these labels to languages, rather than to implementations; Java is compiled to bytecode which is then executed by either interpreting (in a Java virtual machine (JVM)) or compiling (typically with a just-in-time compiler such as HotSpot, again in a JVM). Moreover, compiling, transcompiling, and interpreting are not strictly limited to only a description of the compiler artifact (binary executable or IL assembly).

High-level language computer architecture

Alternatively, it is possible for a high-level language to be directly implemented by a computer – the computer directly executes the HLL code. This is known as a high-level language computer architecture – the computer architecture itself is designed to be targeted by a specific high-level language. The Burroughs large systems were target machines for ALGOL 60, for example.

Software

From Wikipedia, the free encyclopedia

A diagram showing how the user interacts with application software on a typical desktop computer. The application software layer interfaces with the operating system, which in turn communicates with the hardware. The arrows indicate information flow.

Computer software, or simply software, is a collection of data or computer instructions that tell the computer how to work. This is in contrast to physical hardware, from which the system is built and actually performs the work. In computer science and software engineering, computer software is all information processed by computer systems, programs and data. Computer software includes computer programs, libraries and related non-executable data, such as online documentation or digital media. Computer hardware and software require each other and neither can be realistically used on its own.

At the lowest programming level, executable code consists of machine language instructions supported by an individual processor—typically a central processing unit (CPU) or a graphics processing unit (GPU). A machine language consists of groups of binary values signifying processor instructions that change the state of the computer from its preceding state. For example, an instruction may change the value stored in a particular storage location in the computer—an effect that is not directly observable to the user. An instruction may also invoke one of many input or output operations, for example displaying some text on a computer screen; causing state changes which should be visible to the user. The processor executes the instructions in the order they are provided, unless it is instructed to "jump" to a different instruction, or is interrupted by the operating system. As of 2015, most personal computers, smartphone devices and servers have processors with multiple execution units or multiple processors performing computation together, and computing has become a much more concurrent activity than in the past.

The majority of software is written in high-level programming languages. They are easier and more efficient for programmers because they are closer to natural languages than machine languages. High-level languages are translated into machine language using a compiler or an interpreter or a combination of the two. Software may also be written in a low-level assembly language, which has strong correspondence to the computer's machine language instructions and is translated into machine language using an assembler.

History

An outline (algorithm) for what would have been the first piece of software was written by Ada Lovelace in the 19th century, for the planned Analytical Engine. She created proofs to show how the engine would calculate Bernoulli Numbers. Because of the proofs and the algorithm, she is considered the first computer programmer.

The first theory about software—prior to the creation of computers as we know them today—was proposed by Alan Turing in his 1935 essay On Computable Numbers, with an Application to the Entscheidungsproblem (decision problem).

This eventually led to the creation of the academic fields of computer science and software engineering; Both fields study software and its creation. Computer science is the theoretical study of computer and software (Turing's essay is an example of computer science), whereas software engineering is the application of engineering and development of software.

However, prior to 1946, software was not yet the programs stored in the memory of stored-program digital computers, as we now understand it. The first electronic computing devices were instead rewired in order to "reprogram" them.

In 2000, Fred Shapiro, a librarian at the Yale Law School, published a letter revealing that John Wilder Tukey's 1958 paper "The Teaching of Concrete Mathematics" contained the earliest known usage of the term "software" found in a search of JSTOR's electronic archives, predating the OED's citation by two years. This led many to credit Tukey with coining the term, particularly in obituaries published that same year, although Tukey never claimed credit for any such coinage. In 1995, Paul Niquette claimed he had originally coined the term in October 1953, although he could not find any documents supporting his claim. The earliest known publication of the term "software" in an engineering context was in August 1953 by Richard R. Carhart, in a Rand Corporation Research Memorandum.

Types


On virtually all computer platforms, software can be grouped into a few broad categories.

Purpose, or domain of use

Based on the goal, computer software can be divided into:
  • Application software
    which is software that uses the computer system to perform special functions or provide entertainment functions beyond the basic operation of the computer itself. There are many different types of application software, because the range of tasks that can be performed with a modern computer is so large.
  • System software
    which is software for managing computer hardware behaviour, as to provide basic functionalities that are required by users, or for other software to run properly, if at all. System software is also designed for providing a platform for running application software, and it includes the following:
    • Operating systems
      which are essential collections of software that manage resources and provide common services for other software that runs "on top" of them. Supervisory programs, boot loaders, shells and window systems are core parts of operating systems. In practice, an operating system comes bundled with additional software (including application software) so that a user can potentially do some work with a computer that only has one operating system.
    • Device drivers
      which operate or control a particular type of device that is attached to a computer. Each device needs at least one corresponding device driver; because a computer typically has at minimum at least one input device and at least one output device, a computer typically needs more than one device driver.
    • Utilities
      which are computer programs designed to assist users in the maintenance and care of their computers.
  • Malicious software or malware
    which is software that is developed to harm and disrupt computers. As such, malware is undesirable. Malware is closely associated with computer-related crimes, though some malicious programs may have been designed as practical jokes.

Nature or domain of execution

  • Desktop applications such as web browsers and Microsoft Office, as well as smartphone and tablet applications (called "apps"). (There is a push in some parts of the software industry to merge desktop applications with mobile apps, to some extent. Windows 8, and later Ubuntu Touch, tried to allow the same style of application user interface to be used on desktops, laptops and mobiles.)
  •  
  • JavaScript scripts are pieces of software traditionally embedded in web pages that are run directly inside the web browser when a web page is loaded without the need for a web browser plugin. Software written in other programming languages can also be run within the web browser if the software is either translated into JavaScript, or if a web browser plugin that supports that language is installed; the most common example of the latter is ActionScript scripts, which are supported by the Adobe Flash plugin.
  • Server software, including:
  • Plugins and extensions are software that extends or modifies the functionality of another piece of software, and require that software be used in order to function;
  • Embedded software resides as firmware within embedded systems, devices dedicated to a single use or a few uses such as cars and televisions (although some embedded devices such as wireless chipsets can themselves be part of an ordinary, non-embedded computer system such as a PC or smartphone). In the embedded system context there is sometimes no clear distinction between the system software and the application software. However, some embedded systems run embedded operating systems, and these systems do retain the distinction between system software and application software (although typically there will only be one, fixed application which is always run).
  • Microcode is a special, relatively obscure type of embedded software which tells the processor itself how to execute machine code, so it is actually a lower level than machine code. It is typically proprietary to the processor manufacturer, and any necessary correctional microcode software updates are supplied by them to users (which is much cheaper than shipping replacement processor hardware). Thus an ordinary programmer would not expect to ever have to deal with it.

Programming tools

Programming tools are also software in the form of programs or applications that software developers (also known as programmers, coders, hackers or software engineers) use to create, debug, maintain (i.e. improve or fix), or otherwise support software.

Software is written in one or more programming languages; there are many programming languages in existence, and each has at least one implementation, each of which consists of its own set of programming tools. These tools may be relatively self-contained programs such as compilers, debuggers, interpreters, linkers, and text editors, that can be combined together to accomplish a task; or they may form an integrated development environment (IDE), which combines much or all of the functionality of such self-contained tools. IDEs may do this by either invoking the relevant individual tools or by re-implementing their functionality in a new way. An IDE can make it easier to do specific tasks, such as searching in files in a particular project. Many programming language implementations provide the option of using both individual tools or an IDE.

Topics

Architecture

Users often see things differently from programmers. People who use modern general purpose computers (as opposed to embedded systems, analog computers and supercomputers) usually see three layers of software performing a variety of tasks: platform, application, and user software.
  • Platform software
    The Platform includes the firmware, device drivers, an operating system, and typically a graphical user interface which, in total, allow a user to interact with the computer and its peripherals (associated equipment). Platform software often comes bundled with the computer. On a PC one will usually have the ability to change the platform software.
  • Application software
    Application software or Applications are what most people think of when they think of software. Typical examples include office suites and video games. Application software is often purchased separately from computer hardware. Sometimes applications are bundled with the computer, but that does not change the fact that they run as independent applications. Applications are usually independent programs from the operating system, though they are often tailored for specific platforms. Most users think of compilers, databases, and other "system software" as applications.
  • User-written software
    End-user development tailors systems to meet users' specific needs. User software includes spreadsheet templates and word processor templates. Even email filters are a kind of user software. Users create this software themselves and often overlook how important it is. Depending on how competently the user-written software has been integrated into default application packages, many users may not be aware of the distinction between the original packages, and what has been added by co-workers.

Execution

Computer software has to be "loaded" into the computer's storage (such as the hard drive or memory). Once the software has loaded, the computer is able to execute the software. This involves passing instructions from the application software, through the system software, to the hardware which ultimately receives the instruction as machine code. Each instruction causes the computer to carry out an operation—moving data, carrying out a computation, or altering the control flow of instructions.

Data movement is typically from one place in memory to another. Sometimes it involves moving data between memory and registers which enable high-speed data access in the CPU. Moving data, especially large amounts of it, can be costly. So, this is sometimes avoided by using "pointers" to data instead. Computations include simple operations such as incrementing the value of a variable data element. More complex computations may involve many operations and data elements together.

Quality and reliability

Software quality is very important, especially for commercial and system software like Microsoft Office, Microsoft Windows and Linux. If software is faulty (buggy), it can delete a person's work, crash the computer and do other unexpected things. Faults and errors are called "bugs" which are often discovered during alpha and beta testing. Software is often also a victim to what is known as software aging, the progressive performance degradation resulting from a combination of unseen bugs. 

Many bugs are discovered and eliminated (debugged) through software testing. However, software testing rarely—if ever—eliminates every bug; some programmers say that "every program has at least one more bug" (Lubarsky's Law). In the waterfall method of software development, separate testing teams are typically employed, but in newer approaches, collectively termed agile software development, developers often do all their own testing, and demonstrate the software to users/clients regularly to obtain feedback. Software can be tested through unit testing, regression testing and other methods, which are done manually, or most commonly, automatically, since the amount of code to be tested can be quite large. For instance, NASA has extremely rigorous software testing procedures for many operating systems and communication functions. Many NASA-based operations interact and identify each other through command programs. This enables many people who work at NASA to check and evaluate functional systems overall. Programs containing command software enable hardware engineering and system operations to function much easier together.

License

The software's license gives the user the right to use the software in the licensed environment, and in the case of free software licenses, also grants other rights such as the right to make copies. 

Proprietary software can be divided into two types:
  • freeware, which includes the category of "free trial" software or "freemium" software (in the past, the term shareware was often used for free trial/freemium software). As the name suggests, freeware can be used for free, although in the case of free trials or freemium software, this is sometimes only true for a limited period of time or with limited functionality.
  • software available for a fee, often inaccurately termed "commercial software", which can only be legally used on purchase of a license.
Open-source software, on the other hand, comes with a free software license, granting the recipient the rights to modify and redistribute the software.

Patents

Software patents, like other types of patents, are theoretically supposed to give an inventor an exclusive, time-limited license for a detailed idea (e.g. an algorithm) on how to implement a piece of software, or a component of a piece of software. Ideas for useful things that software could do, and user requirements, are not supposed to be patentable, and concrete implementations (i.e. the actual software packages implementing the patent) are not supposed to be patentable either—the latter are already covered by copyright, generally automatically. So software patents are supposed to cover the middle area, between requirements and concrete implementation. In some countries, a requirement for the claimed invention to have an effect on the physical world may also be part of the requirements for a software patent to be held valid—although since all useful software has effects on the physical world, this requirement may be open to debate. Meanwhile, American copyright law was applied to various aspects of the writing of the software code.

Software patents are controversial in the software industry with many people holding different views about them. One of the sources of controversy is that the aforementioned split between initial ideas and patent does not seem to be honored in practice by patent lawyers—for example the patent for Aspect-Oriented Programming (AOP), which purported to claim rights over any programming tool implementing the idea of AOP, howsoever implemented. Another source of controversy is the effect on innovation, with many distinguished experts and companies arguing that software is such a fast-moving field that software patents merely create vast additional litigation costs and risks, and actually retard innovation. In the case of debates about software patents outside the United States, the argument has been made that large American corporations and patent lawyers are likely to be the primary beneficiaries of allowing or continue to allow software patents.

Design and implementation

Design and implementation of software varies depending on the complexity of the software. For instance, the design and creation of Microsoft Word took much more time than designing and developing Microsoft Notepad because the latter has much more basic functionality.

Software is usually designed and created (aka coded/written/programmed) in integrated development environments (IDE) like Eclipse, IntelliJ and Microsoft Visual Studio that can simplify the process and compile the software (if applicable). As noted in a different section, software is usually created on top of existing software and the application programming interface (API) that the underlying software provides like GTK+, JavaBeans or Swing. Libraries (APIs) can be categorized by their purpose. For instance, the Spring Framework is used for implementing enterprise applications, the Windows Forms library is used for designing graphical user interface (GUI) applications like Microsoft Word, and Windows Communication Foundation is used for designing web services. When a program is designed, it relies upon the API. For instance, a Microsoft Windows desktop application might call API functions in the .NET Windows Forms library like Form1.Close() and Form1.Show() to close or open the application. Without these APIs, the programmer needs to write these functionalities entirely themselves. Companies like Oracle and Microsoft provide their own APIs so that many applications are written using their software libraries that usually have numerous APIs in them.

Data structures such as hash tables, arrays, and binary trees, and algorithms such as quicksort, can be useful for creating software. 

Computer software has special economic characteristics that make its design, creation, and distribution different from most other economic goods.

A person who creates software is called a programmer, software engineer or software developer, terms that all have a similar meaning. More informal terms for programmer also exist such as "coder" and "hacker" – although use of the latter word may cause confusion, because it is more often used to mean someone who illegally breaks into computer systems.

Industry and organizations

A great variety of software companies and programmers in the world comprise a software industry. Software can be quite a profitable industry: Bill Gates, the co-founder of Microsoft was the richest person in the world in 2009, largely due to his ownership of a significant number of shares in Microsoft, the company responsible for Microsoft Windows and Microsoft Office software products - both market leaders in their respective product categories.

Non-profit software organizations include the Free Software Foundation, GNU Project and the Mozilla Foundation. Software standard organizations like the W3C, IETF develop recommended software standards such as XML, HTTP and HTML, so that software can interoperate through these standards. 

Other well-known large software companies include Google, IBM, TCS, Infosys, Wipro, HCL Technologies, Oracle, Novell, SAP, Symantec, Adobe Systems, Sidetrade and Corel, while small companies often provide innovation.

Cooperative

From Wikipedia, the free encyclopedia ...