In computer science, an abstract machine is a theoretical model that allows for a detailed and precise analysis of how a computer system functions. It is similar to a mathematical function in that it receives inputs and produces outputs based on predefined rules. Abstract machines vary from literal machines in that they are expected to perform correctly and independently of hardware. Abstract machines are "machines" because they allow step-by-step execution of programmes; they are "abstract" because they ignore many aspects of actual (hardware) machines. A typical abstract machine consists of a definition in terms of input, output, and the set of allowable operations used to turn the former into the latter. They can be used for purely theoretical reasons as well as models for real-world computer systems. In the theory of computation, abstract machines are often used in thought experiments regarding computability or to analyse the complexity of algorithms. This use of abstract machines is fundamental to the field of computational complexity theory, such as finite state machines, Mealy machines, push-down automata, and Turing machines.
Classification
Abstract machines are generally classified into two types, depending on the number of operations they are allowed to undertake at any one time: deterministic abstract machines and non-deterministic abstract machines. A deterministic abstract machine is a system in which a particular beginning state or condition always yields the same outputs. There is no randomness or variation in how inputs are transformed into outputs. In contrast, a non-deterministic abstract machine can provide various outputs for the same input on different executions. Unlike a deterministic algorithm, which gives the same result for the same input regardless of the number of iterations, a non-deterministic algorithm takes various paths to arrive to different outputs. Non-deterministic algorithms are helpful for obtaining approximate answers when deriving a precise solution using a deterministic approach is difficult or costly.
Turing machines, for example, are some of the most fundamental abstract machines in computer science. These machines conduct operations on a tape (a string of symbols) of any length. Their instructions provide for both modifying the symbols and changing the symbol that the machine’s pointer is currently at. For example, a rudimentary Turing machine could have a single command, "convert symbol to 1 then move right", and this machine would only produce a string of 1s. This basic Turing machine is deterministic; however, nondeterministic Turing machines that can execute several actions given the same input may also be built.
Implementation
Any implementation of an abstract machine in the case of physical implementation (in hardware) uses some kind of physical device (mechanical or electronic) to execute the instructions of a programming language. An abstract machine, however, can also be implemented in software or firmware at levels between the abstract machine and underlying physical device.
- Implementation in hardware: The direct implementation of abstract machine in hardware is a matter of using physical devices such as memory, arithmetic and logic circuits, buses, etc., to implement a physical machine whose machine language coincides with the programming language. Once constructed, it would be virtually hard to change such a machine. A CPU may be thought of as a concrete hardware realisation of an abstract machine, particularly the processor's design.
- Simulation using software: Implementing an abstract machine with software entails writing programmes in a different language to implement the data structures and algorithms needed by the abstract machine. This provides the most flexibility since programmes implementing abstract machine constructs can be easily changed. An abstract machine implemented as a software simulation, or for which an interpreter exists, is called a virtual machine.
- Emulation using firmware: Firmware implementation sits between hardware and software implementation. It consists of microcode simulations of data structures and algorithms for abstract machines. Microcode allows a computer programmer to write machine instructions without needing to fabricate electrical circuitry.
Programming language implementation
An abstract machine is, intuitively, just an abstraction of the idea of a physical computer. For actual execution, algorithms must be properly formalised using the constructs offered by a programming language. This implies that the algorithms to be executed must be expressed using programming language instructions. The syntax of a programming language enables the construction of programs using a finite set of constructs known as instructions. Most abstract machines share a program store and a state, which often includes a stack and registers. In digital computers, the stack is simply a memory unit with an address register that can count only positive integers (after an initial value is loaded into it). The address register for the stack is known as a stack pointer because its value always refers to the top item on the stack. The program consists of a series of instructions, with a stack pointer indicating the next instruction to be performed. When the instruction is completed, a stack pointer is advanced. This fundamental control mechanism of an abstract machine is also known as its execution loop. Thus, an abstract machine for a programming language is any collection of data structures and algorithms capable of storing and running programs written in the programming language. It bridges the gap between the high level of a programming language and the low level of an actual machine by providing an intermediate language step for compilation. An abstract machine's instructions are adapted to the unique operations necessary to implement operations of a certain source language or set of source languages.
Imperative languages
In the late 1950s, the Association for Computing Machinery (ACM) and other allied organisations developed many proposals for Universal Computer Oriented Language (UNCOL), such as Conway's machine. The UNCOL concept is good, but it has not been widely used due to the poor performance of the generated code. In many areas of computing, its performance will continue to be an issue despite the development of the Java Virtual Machine in the late 1990s. Algol Object Code (1964), P4-machine (1976), UCSD P-machine (1977), and Forth (1970) are some successful abstract machines of this kind.
Object-oriented languages
Abstract machines for object-oriented programming languages are often stack-based and have special access instructions for object fields and methods. In these machines, memory management is often implicit performed by a garbage collector (memory recovery feature built into programming languages). Smalltalk-80 (1980), Self (1989), and Java (1994) are examples of this implementation.
String processing languages
A string processing language is a computer language that focuses on processing strings rather than numbers. There have been string processing languages in the form of command shells, programming tools, macro processors, and scripting languages for decades. Using a suitable abstract machine has two benefits: increased execution speed and enhanced portability. Snobol4 and ML/I are two notable instances of early string processing languages that use an abstract machine to gain machine independence.
Functional programming languages
The early abstract machines for functional languages, including the SECD machine (1964) and Cardelli's Functional Abstract Machine (1983), defined strict evaluation, also known as eager or call-by-value evaluation, in which function arguments are evaluated before the call and precisely once. Recently, the majority of research has been on lazy (or call-by-need) evaluation, such as the G-machine (1984), Krivine machine (1985), and Three Instruction Machine (1986), in which function arguments are evaluated only if necessary and at most once. One reason is because effective implementation of strict evaluation is now well-understood, therefore the necessity for an abstract machine has diminished.
Logical languages
Predicate calculus (first order logic) is the foundation of logic programming languages. The most well-known logic programming language is Prolog. The rules in Prolog are written in a uniform format known as universally quantified 'Horn clauses', which means to begin the calculation that attempts to discover a proof of the objective. The Warren Abstract Machine WAM (1983), which has become the de facto standard in Prolog program compilation, has been the focus of most study. It provides special purpose instructions such as data unification instructions and control flow instructions to support backtracking (searching algorithm).
Structure
A generic abstract machine is made up of a memory and an interpreter. The memory is used to store data and programs, while the interpreter is the component that executes the instructions included in programs.
The interpreter must carry out the operations that are unique to the language it is interpreting. However, given the variety of languages, it is conceivable to identify categories of operations and an "execution mechanism" shared by all interpreters. The interpreter's operations and accompanying data structures are divided into the following categories:
- Operations for processing primitive data:
- Operations and data structures for controlling the sequence of execution of operations;
- Operations and data structures for controlling data transfers;
- Operations and data structures for memory management.
Processing primitive data
An abstract machine must contain operations for manipulating primitive data types such as strings and integers. For example, integers are nearly universally considered a basic data type for both physical abstract machines and the abstract machines used by many programming languages. The machine carries out the arithmetic operations necessary, such as addition and multiplication, within a single time step.
Sequence control
Operations and structures for "sequence control" allow controlling the execution flow of program instructions. When certain conditions are met, it is necessary to change the typical sequential execution of a program. Therefore, the interpreter employs data structures (such as those used to store the address of the next instruction to execute) that are modified by operations distinct from those used for data manipulation (for example, operations to update the address of the next instruction to execute).
Controlling data transfers
Data transfer operations are used to control how operands and data are transported from memory to the interpreter and vice versa. These operations deal with the store and the retrieval order of operands from the store.
Memory management
Memory management is concerned with the operations performed in memory to allocate data and applications. In the abstract machine, data and programmes can be held indefinitely, or in the case of programming languages, memory can be allocated or deallocated using a more complex mechanism.
Hierarchies
Abstract machine hierarchies are often employed, in which each machine uses the functionality of the level immediately below and adds additional functionality of its own to meet the level immediately above. A hardware computer, constructed with physical electronic devices, can be added at the most basic level. Above this level, the abstract microprogrammed machine level may be introduced. The abstract machine supplied by the operating system, which is implemented by a program written in machine language, is located immediately above (or directly above the hardware if the firmware level is not there). On the one hand, the operating system extends the capability of the physical machine by providing higher-level primitives that are not available on the physical machine (for example, primitives that act on files). The host machine is formed by the abstract machine given by the operating system, on which a high-level programming language is implemented using an intermediary machine, such as the Java Virtual machine and its byte code language. The level given by the abstract machine for the high-level language (for example, Java) is not usually the final level of hierarchy. At this point, one or more applications that deliver additional services together may be introduced. A "web machine" level, for example, can be added to implement the functionalities necessary to handle Web communications (communications protocols or HTML code presentation). The "Web Service" level is located above this, and it provides the functionalities necessary to make web services communicate, both in terms of interaction protocols and the behaviour of the processes involved. At this level, entirely new languages that specify the behaviour of so-called "business processes" based on Web services may be developed (an example is the Business Process Execution Language). Finally, a specialised application can be found at the highest level (for example, E-commerce) which has very specific and limited functionality.