Search This Blog

Friday, April 24, 2026

P-code machine

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/P-code_machine

In computer programming, a P-code machine (portable code machine) is a virtual machine designed to execute P-code, the assembly language or machine code of a hypothetical central processing unit (CPU). The term P-code machine is applied generically to all such machines (such as the Java virtual machine (JVM) and MATLAB pre-compiled code), as well as specific implementations using those machines. One of the most notable uses of P-Code machines is the P-Machine of the Pascal-P system. The developers of the UCSD Pascal implementation within this system construed the P in P-code to mean pseudo more often than portable; they adopted a unique label for pseudo-code meaning instructions for a pseudo-machine.

Although the concept was first implemented circa 1966 as O-code for the Basic Combined Programming Language (BCPL) and P code for the language Euler, the term P-code first appeared in the early 1970s. Two early compilers generating P-code were the Pascal-P compiler in 1973, by Kesav V. Nori, Urs Ammann, Kathleen Jensen, Hans-Heinrich Nägeli, and Christian Jacobi, and the Pascal-S compiler in 1975, by Niklaus Wirth.

Programs that have been translated to P-code can either be interpreted by a software program that emulates the behaviour of the hypothetical CPU, or translated into the machine code of the CPU on which the program is to run and then executed. If there is sufficient commercial interest, a hardware implementation of the CPU specification may be built (e.g., the Pascal MicroEngine or a version of a Java processor).

P-code versus machine code

While a typical compiler model is aimed at translating a program code into machine code, the idea of a P-code machine follows a two-stage approach involving translation into P-code and execution by interpreting or just-in-time compilation (JIT) through the P-code machine.

This separation makes it possible to detach the development of a P-code interpreter from the underlying machine code compiler, which has to consider machine-dependent behaviour in generating its bytecode. This way a P-code interpreter can also be implemented quicker, and the ability to interpret the code at runtime allows for additional run-time checks which might not be similarly available in native code. Further, as P-code is based on an ideal virtual machine, a P-code program can often be smaller than the same program translated to machine code. Conversely, the two-step interpretation of a P-code-based program leads to a slower execution speed, though this can sometimes be addressed with just-in-time compilation, and its simpler structure is easier to reverse-engineer than native code.

Implementations of P-code

In the early 1980s, at least two operating systems achieved machine independence through extensive use of P-code . The Business Operating System (BOS) was a cross-platform operating system designed to run P-code programs exclusively. The UCSD p-System, developed at The University of California, San Diego, was a self-compiling and self-hosting operating system based on P-code optimized for generation by the Pascal language.

In the 1990s, translation into p-code became a popular strategy for implementations of languages such as Python, Microsoft P-Code in Visual Basic and Java bytecode in Java.

The language Go uses a generic, portable assembly as a form of p-code, implemented by Ken Thompson as an extension of the work on Plan 9 from Bell Labs. Unlike Common Language Runtime (CLR) bytecode or JVM bytecode, there is no stable specification and the Go build tools do not emit a bytecode format to be used at a later time. The Go assembler uses the generic assembly language as an intermediate representation and the Go executables are machine-specific statically linked binaries.

UCSD P-Machine

Architecture

Like many other P-code machines, the UCSD P-Machine is a stack machine, which means that most instructions take their operands from a stack, and place results back on the stack. Thus, the add instruction replaces the two topmost elements of the stack with their sum. A few instructions take an immediate argument. Like Pascal, the P-code is strongly typed, supporting Boolean (b), character (c), integer (i), real (r), set (s), and pointer (a) data types natively.

Some simple instructions:

Insn.   Stack   Stack   Description
        before  after
 
adi     i1 i2   i1+i2   add two integers
adr     r1 r2   r1+r2   add two reals
inn     i1 s1   b1      set membership; b1 = whether i1 is a member of s1
ldi     i1 i1   i1      load integer constant
mov     a1 a2   a2      move
not     b1 b1   -b1     Boolean negation

Environment

Similar to a real target CPU, the P-System has only one stack shared by procedure stack frames (providing return address, etc.) and the arguments to local instructions. Three of the machine's registers point into the stack (which grows upwards):

  • SP points to the top of the stack (the stack pointer).
  • MP marks the beginning of the active stack frame (the mark pointer).
  • EP points to the highest stack location used in the current procedure (the extreme pointer).

Also present is a constant area, and, below that, the heap growing down towards the stack. The NP (the new pointer) register points to the top (lowest used address) of the heap. When EP gets greater than NP, the machine's memory is exhausted.

The fifth register, PC, points at the current instruction in the code area.

Calling conventions

Stack frames look like this:

EP ->
      local stack
SP -> ...
      locals
      ...
      parameters
      ...
      return address (previous PC)
      previous EP
      dynamic link (previous MP)
      static link (MP of surrounding procedure)
MP -> function return value

The procedure calling sequence works as follows: The call is introduced with

 mst n

where n specifies the difference in nesting levels (remember that Pascal supports nested procedures). This instruction will mark the stack, i.e. reserve the first five cells of the above stack frame, and initialize previous EP, dynamic, and static link. The caller then computes and pushes any parameters for the procedure, and then issues

 cup n, p

to call a user procedure (n being the number of parameters, p the procedure's address). This will save the PC in the return address cell, and set the procedure's address as the new PC.

User procedures begin with the two instructions

 ent 1, i
 ent 2, j

The first sets SP to MP + i, the second sets EP to SP + j. So i essentially specifies the space reserved for locals (plus the number of parameters plus 5), and j gives the number of entries needed locally for the stack. Memory exhaustion is checked at this point.

Returning to the caller is accomplished via

 retC

with C giving the return type (i, r, c, b, a as above, and p for no return value). The return value has to be stored in the appropriate cell previously. On all types except p, returning will leave this value on the stack.

Instead of calling a user procedure (cup), standard procedure q can be called with

 csp q

These standard procedures are Pascal procedures like readln() (csp rln), sin() (csp sin), etc. Peculiarly eof() is a p-Code instruction instead.

Example machine

Niklaus Wirth specified a simple p-code machine in the 1976 book Algorithms + Data Structures = Programs. The machine had 3 registers - a program counter p, a base register b and a top-of-stack register t. There were 8 instructions:

  1. lit 0, a : load constant a
  2. opr 0, a : execute operation a (13 operations: RETURN, 5 mathematical functions, and 7 comparison functions)
  3. lod l, a : load variable l, a
  4. sto l, a : store variable l, a
  5. cal l, a : call procedure a at level l
  6. int 0, a : increment t-register by a
  7. jmp 0, a : jump to a
  8. jpc 0, a : jump conditional to a

This is the code for the machine, written in Pascal:

const
 amax=2047;      {maximum address}
 levmax=3;       {maximum depth of block nesting}
 cxmax=200;      {size of code array}

type
 fct=(lit,opr,lod,sto,cal,int,jmp,jpc);
 instruction=packed record
  f:fct;
  l:0..levmax;
  a:0..amax;
 end;

var
 code: array [0..cxmax] of instruction;

procedure interpret;

  const stacksize = 500;

  var
    p, b, t: integer; {program-, base-, topstack-registers}
    i: instruction; {instruction register}
    s: array [1..stacksize] of integer; {datastore}

  function base(l: integer): integer;
    var b1: integer;
  begin
    b1 := b; {find base l levels down}
    while l > 0 do begin
      b1 := s[b1];
      l := l - 1
    end;
    base := b1
  end {base};

begin
  writeln(' start pl/0');
  t := 0; b := 1; p := 0;
  s[1] := 0; s[2] := 0; s[3] := 0;
  repeat
    i := code[p]; p := p + 1;
    with i do
      case f of
        lit: begin t := t + 1; s[t] := a end;
        opr:
          case a of {operator}
            0:
              begin {return}
                t := b - 1; p := s[t + 3]; b := s[t + 2];
              end;
            1: s[t] := -s[t];
            2: begin t := t - 1; s[t] := s[t] + s[t + 1] end;
            3: begin t := t - 1; s[t] := s[t] - s[t + 1] end;
            4: begin t := t - 1; s[t] := s[t] * s[t + 1] end;
            5: begin t := t - 1; s[t] := s[t] div s[t + 1] end;
            6: s[t] := ord(odd(s[t]));
            8: begin t := t - 1; s[t] := ord(s[t] = s[t + 1]) end;
            9: begin t := t - 1; s[t] := ord(s[t] <> s[t + 1]) end;
            10: begin t := t - 1; s[t] := ord(s[t] < s[t + 1]) end;
            11: begin t := t - 1; s[t] := ord(s[t] >= s[t + 1]) end;
            12: begin t := t - 1; s[t] := ord(s[t] > s[t + 1]) end;
            13: begin t := t - 1; s[t] := ord(s[t] <= s[t + 1]) end;
          end;
        lod: begin t := t + 1; s[t] := s[base(l) + a] end;
        sto: begin s[base(l)+a] := s[t]; writeln(s[t]); t := t - 1 end;
        cal:
          begin {generate new block mark}
            s[t + 1] := base(l); s[t + 2] := b; s[t + 3] := p;
            b := t + 1; p := a
          end;
        int: t := t + a;
        jmp: p := a;
        jpc: begin if s[t] = 0 then p := a; t := t - 1 end
      end {with, case}
  until p = 0;
  writeln(' end pl/0');
end {interpret};

This machine was used to run Wirth's PL/0, a Pascal subset compiler used to teach compiler development.

Microsoft P-code

As the goal of the company was to release software for all the major platforms and architectures that existed then. Between 1980 and 1982, Microsoft developed an early C compiler that produced P-Code (the C language itself was not standardized and wouldn't be until later in the 80s). The P-Code allowed software to run on most platforms with minimal code change. UCSD Pascal was using a similar approach. This C to P-Code was a success but was very slow. In 1983, Microsoft released the Microsoft C Compiler, MSC, based on a license of the Lattice C compiler for versions 1.0 and 2.0; then, from version 3.0 onward, the MSC was a complete rewrite by Microsoft.

P-code is a name later used by some of Microsoft's intermediate languages. They provided an alternate binary format to machine code. At various times, Microsoft has said P-code is an abbreviation for either packed code or pseudo code.

Some Microsoft P-code flavor, quite different from the one used by the C compiler, was widely used with Visual Basic which had a runtime that included a VM or could be directly compiled to native code. Like other P-code implementations, Microsoft P-code enabled a more compact executable at the expense of slower execution.

Machine code

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Machine_code
Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor register and memory

In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions. Machine code is classified as native with respect to its host CPU since it is the language that the CPU interprets directly. Some software interpreters translate the programming language that they interpret into a virtual machine code (bytecode) and process it with a P-code machine.

A machine-code instruction causes the CPU to perform a specific task such as:

An instruction set architecture (ISA) defines the interface to a CPU and varies by groupings or families of CPU design such as x86 and ARM. Generally, machine code compatible with one family is not with others, but there are exceptions. The VAX architecture includes optional support of the PDP-11 instruction set. The IA-64 architecture includes optional support of the IA-32 instruction set. And, the PowerPC 615 can natively process both PowerPC and x86 instructions.

Assembly language

Translation of assembly into machine code

Assembly language provides a relatively direct mapping from a human-readable source code to machine code. The assembly language source code represents numerical codes in machine code, as mnemonics and labels. For example, NOP in assembly for an x86 processor represents the x86 architecture opcode 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.

Instruction set

A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.

Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.

An instruction set needs to execute the circuits of a computer's digital logic level. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components. To control a computer's architectural features, machine instructions are created. Examples of features that are controlled using machine instructions:

The criteria for instruction formats include:

  • Instructions most commonly used should be shorter than instructions rarely used.
  • The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
  • The number of bits in the address field requires special consideration.

Determining the size of the address field is a choice between space and speed. On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also, virtual address space needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format.

Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.

General-purpose instructions control:

  • Data movement from one place to another
  • Monadic operations that have one operand to produce a result
  • Dyadic operations that have two operands to produce a result
  • Comparisons and conditional jumps
  • Procedure calls
  • Loop control
  • Input/output

Overlapping instruction

On processor architectures with variable-length instruction sets (such as Intel's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count, sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences. These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.

In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes. The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on the byte level such as in the implementation of boot loaders which have to fit into boot sectors.

It is also sometimes used as a code obfuscation technique as a measure against disassembly and tampering.

The principle is also used in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms.

This property is also used to find unintended instructions called gadgets in existing code repositories and is used in return-oriented programming as alternative to code injection for exploits such as return-to-libc attacks.

Microcode

In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models. An example of this use is the IBM System/360 family of computers and their successors.

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bit from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic
S,1-11
12-13 Flag, ignored in some instructions
14-17 unused
18-20 Tag
21-35 Y
Index register control, other than TSX
S,1-2 Opcode
3-17 Decrement
18-20 Tag
21-35 Y

For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the logical or of the selected index registers and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when they are powered on they are in multiple tag mode, in which they use only the three of the index registers in a fashion compatible with earlier machines, and require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers.

The effective address is normally Y-C(T), where C(T) is either 0 for a tag of 0, the logical or of the selected index registers in multiple tag mode or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y.

A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instruction that conditionally skip one or two words, e.g., Compare Accumulator with Storage (CAS) does a three way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.

MIPS

The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional funct (function) field to determine the exact operation. The fields used in these types are:

   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct]  R-type
[  op  |  rs |  rt | address/immediate]  I-type
[  op  |        target address        ]  J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:

[  op  |  rs |  rt | address/immediate]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary

Jumping to the address 1024:

[  op  |        target address        ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary

Bytecode

Machine code is similar to yet fundamentally different from bytecode. Like machine code, bytecode is typically generated (i.e. by a compiler) from source code. But, unlike machine code, bytecode is not directly executable by a CPU. An exception is if a processor is designed to use bytecode as its machine code, such as the Pascal MicroEngine or a Java processor. If bytecode is processed by an software interpreter, then that interpreter is a virtual machine for which the bytecode is its machine code.

Storage

During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.

Readability

Machine code is generally considered to be not human readable, with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule. However, various tools and methods support understanding machine code.

Disassembly decodes machine code to assembly language which is possible since assembly instructions can often be mapped one-to-one to machine instructions.

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated (hard to understand).

A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Examples include:

Library (computing)

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Library_(computing)
Illustration of an application which uses libvorbisfile to play an Ogg Vorbis file

In computing, a library is a collection of resources that can be used during software development to implement a computer program. Commonly, a library consists of executable code such as compiled functions and classes, or a library can be a collection of source code. A resource library may contain data such as images and text.

A library can be used by multiple, independent consumers (programs and other libraries). This differs from resources defined in a program which can usually only be used by that program. When a consumer uses a library resource, it gains the value of the library without having to implement it itself. Libraries encourage software reuse in a modular fashion. Libraries can use other libraries resulting in a hierarchy of libraries in a program.

When writing code that uses a library, a programmer only needs to know how to use that library's application programming interface (API) without understanding its internal mechanics. For example, a program could use a library that abstracts a complicated system call so that the programmer can use the system feature without spending time to learn the intricacies of the system function.

History

The idea of a computer library dates back to the first computers created by Charles Babbage. An 1888 paper on his Analytical Engine suggested that computer operations could be punched on separate cards from numerical input. If these operation punch cards were saved for reuse then "by degrees the engine would have a library of its own."

A woman working next to a filing cabinet containing the subroutine library on reels of punched tape for the EDSAC computer

In 1947 Goldstine and von Neumann speculated that it would be useful to create a "library" of subroutines for their work on the IAS machine, an early computer that was not yet operational at that time. They envisioned a physical library of magnetic wire recordings, with each wire storing reusable computer code.

Inspired by von Neumann, Wilkes and his team constructed EDSAC. A filing cabinet of punched tape held the subroutine library for this computer. Programs for EDSAC consisted of a main program and a sequence of subroutines copied from the subroutine library. In 1951 the team published the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer, which detailed the creation and the purpose of the library.

COBOL included "primitive capabilities for a library system" in 1959, but Jean Sammet described them as "inadequate library facilities" in retrospect.

JOVIAL has a Communication Pool (COMPOOL), roughly a library of header files.

Another major contributor to the modern library concept came in the form of the subprogram innovation of FORTRAN. FORTRAN subprograms can be compiled independently of each other, but the compiler lacked a linker. So prior to the introduction of modules in Fortran-90, type checking between FORTRAN subprograms was impossible.

By the mid 1960s, copy and macro libraries for assemblers were common. Starting with the popularity of the IBM System/360, libraries containing other types of text elements, e.g., system parameters, also became common.

In IBM's OS/360 and its successors this is called a partitioned data set.

The first object-oriented programming language, Simula, developed in 1965, supported adding classes to libraries via its compiler.

Linking

The linking (or binding) process resolves references known as symbols (or links) by searching for them in various locations including configured libraries. If a linker (or binder) does not find a symbol, then it fails, but multiple matches may or may not cause failure.

Static linking is linking at build time, such that the library executable code is included in the program. Dynamic linking is linking at run time; it involves building the program with information that supports run-time linking to a dynamic link library (DLL). For dynamic linking, a compatible DLL file must be available to the program at run time, but for static linking, the program is standalone.

Smart linking is performed by a build tool that excludes unused code in the linking process. For example, a program that only uses integers for arithmetic, or does no arithmetic operations at all, can exclude floating-point library routines. This can lead to smaller program file size and reduced memory usage.

Relocation

Some references in a program or library module are stored in a relative or symbolic form which cannot be resolved until all code and libraries are assigned final static addresses. Relocation is the process of adjusting these references, and is done either by the linker or the loader. In general, relocation cannot be done to individual libraries themselves because the addresses in memory may vary depending on the program using them and other libraries they are combined with. Position-independent code avoids references to absolute addresses and therefore does not require relocation.

Categories

Executable

An executable library consists of code that has been converted from source code into machine code or an intermediate form such as bytecode. A linker allows for using library objects by associating each reference with an address at which the object is located. For example, in C, a library function is invoked via C's normal function call syntax and semantics.

A variant is a library containing compiled code (object code in IBM's nomenclature) in a form that cannot be loaded by the OS but that can be read by the linker.

Static

A static library is an executable library that is linked into a program at build-time by a linker (or whatever the build tool is called that does linking). This process, and the resulting stand-alone file, is known as a static build of the program. A static build may not need any further relocation if virtual memory is used and no address space layout randomization is desired.

A static library is sometimes called an archive on Unix-like systems.

Dynamic

A dynamic library is linked when the program is run – either at load-time or runtime. The dynamic library was intended after the static library to support additional software deployment flexibility.

Sources

A source library consists of source code, not compiled code.

Shared

A shared library is a library that contains executable code designed to be used by multiple computer programs or other libraries at runtime, with only one copy of that code in memory, shared by all programs using the code.

Object

Although generally an obsolete technology today, an object library exposes resources for object-oriented programming (OOP) and a distributed object is a remote object library. Examples include: COM/DCOM, SOM/DSOM, DOE, PDO and various CORBA-based systems.

The object library technology was developed since as OOP became popular, it became apparent that OOP runtime binding required information than contemporary libraries did not provide. In addition to the names and entry points of the code located within, due to inheritance, OOP binding also requires a list of dependencies – since the full definition of a method may be in different places. Further, this requires more than listing that one library requires the services of another. In OOP, the libraries themselves may not be known at compile time, and vary from system to system.

The remote object technology was developed in parallel to support multi-tier programs with a user interface application running on a personal computer (PC) using services of a mainframe or minicomputer such as data storage and processing. For instance, a program on a PC would send messages to a minicomputer via remote procedure call (RPC) to retrieve relatively small samples from a relatively large dataset. In response, distributed object technology was developed.

Class

A class library contains classes that can be used to create objects. In Java, for example, classes are contained in JAR files and objects are created at runtime from the classes. However, in Smalltalk, a class library is the starting point for a system image that includes the entire state of the environment, classes and all instantiated objects. Most class libraries are stored in a package repository (such as Maven Central for Java). Client code explicitly specifies dependencies to external libraries in build configuration files (such as a Maven Pom in Java).

Remote

A remote library runs on another computer and its assets are accessed via remote procedure call (RPC) over a network. This distributed architecture allows for minimizing installation of the library and support for it on each consuming system and ensuring consistent versioning. A significant downside is that each library call entails significantly more overhead than for a local library.

Runtime

A runtime library provides access to the runtime environment that is available to a program – tailored to the host platform.

Language standard

Many modern programming languages specify a standard library that provides a base level of functionality for the language environment.

Code generation

A code generation library has a high-level API generating or transforming byte code for Java. They are used by aspect-oriented programming, some data access frameworks, and for testing to generate dynamic proxy objects. They also are used to intercept field access.

File naming

Unix-like

On most modern Unix-like systems, library files are stored in directories such as /lib, /usr/lib and /usr/local/lib. A filename typically starts with lib, and ends with .a for a static library (archive) or .so for a shared object (dynamically linked library). For example, libfoo.a and libfoo.so.

Often, symbolic link files are used to manage versioning of a library by providing a link file named without a version that links to a file named with a version. For example, libfoo.so.2 might be version 2 of library foo and a link file named libfoo.so provides a version independent name to that file that programs link to. The link file could be changed to a refer to a version 3 (libfoo.so.3) such that consuming programs will then use version 3 without having to change the program.

Files with the extension .la are libtool archives; they are not usable by the system.

macOS

The macOS system inherits static library conventions from BSD, with the library stored in a .a file. It uses either .so or .dylib for dynamic libraries. Most libraries in macOS, however, consist of "frameworks", placed inside special directories called "bundles" which wrap the library's required files and metadata. For example, a framework called Abc would be implemented in a bundle called Abc.framework, with Abc.framework/Abc being either the dynamically linked library file or a symlink to the dynamically linked library file in Abc.framework/Versions/Current/Abc.

Windows

Often, a Windows dynamic-link library (DLL) has the file extension .dll, although sometimes different extensions are used to indicate general content, e.g. .ocx for a OLE library.

A .lib file can be either a static library or contain the information needed to build an application that consumes the associated DLL. In the latter case, the associated DLL file must be present at runtime.

Inheritance (object-oriented programming)

From Wikipedia, the free encyclopedia

In object-oriented programming, inheritance is the mechanism of basing an object or class upon another object (prototype-based inheritance) or class (class-based inheritance), retaining similar implementation. It is also defined as deriving new classes (sub classes) from existing ones such as super class or base class and then forming them into a hierarchy of classes. In most class-based object-oriented languages like C++, an object created through inheritance, a "child object", acquires all the properties and behaviors of the "parent object", with the exception of: constructors, destructors, overloaded operators and friend functions of the base class. Inheritance allows programmers to create classes that are built upon existing classes, to specify a new implementation while maintaining the same behaviors (realizing an interface), to reuse code and to independently extend original software via public classes and interfaces. The relationships of objects or classes through inheritance give rise to a directed acyclic graph.

An inherited class is called a subclass of its parent class or super class. The term inheritance is loosely used for both class-based and prototype-based programming, but in narrow use the term is reserved for class-based programming (one class inherits from another), with the corresponding technique in prototype-based programming being instead called delegation (one object delegates to another). Class-modifying inheritance patterns can be pre-defined according to simple network interface parameters such that inter-language compatibility is preserved.

Inheritance should not be confused with subtyping. In some languages, generally statically-typed class-based OO languages, like C++, C#, Java, and Scala, inheritance and subtyping agree, whereas in others they differ. In general, subtyping establishes an is-a relationship, whereas inheritance only reuses implementation and establishes a syntactic relationship, not necessarily a semantic relationship (inheritance does not ensure behavioral subtyping). To distinguish these concepts, subtyping is sometimes referred to as interface inheritance (without acknowledging that the specialization of type variables also induces a subtyping relation), whereas inheritance as defined here is known as implementation inheritance or code inheritance. Still, inheritance is a commonly used mechanism for establishing subtype relationships.

Inheritance is contrasted with object composition, where one object contains another object (or objects of one class contain objects of another class); see composition over inheritance. In contrast to subtyping’s is-a relationship, composition implements a has-a relationship.

Mathematically speaking, inheritance in any system of classes induces a strict partial order on the set of classes in that system.

History

In 1966, Tony Hoare presented some remarks on records, and in particular, the idea of record subclasses, record types with common properties but discriminated by a variant tag and having fields private to the variant. Influenced by this, in 1967 Ole-Johan Dahl and Kristen Nygaard presented a design that allowed specifying objects that belonged to different classes but had common properties. The common properties were collected in a superclass, and each superclass could itself potentially have a superclass. The values of a subclass were thus compound objects, consisting of some number of prefix parts belonging to various superclasses, plus a main part belonging to the subclass. These parts were all concatenated together. The attributes of a compound object would be accessible by dot notation. This idea was first adopted in the Simula 67 programming language. The idea then spread to Smalltalk, C++, Java, Python, and many other languages.

Types

Single inheritance
Multiple inheritance

There are various types of inheritance, based on paradigm and specific language.

Single inheritance
where subclasses inherit the features of one superclass. A class acquires the properties of another class.
Multiple inheritance
where one class can have more than one superclass and inherit features from all parent classes.

"Multiple inheritance ... was widely supposed to be very difficult to implement efficiently. For example, in a summary of C++ in his book on Objective C, Brad Cox actually claimed that adding multiple inheritance to C++ was impossible. Thus, multiple inheritance seemed more of a challenge. Since I had considered multiple inheritance as early as 1982 and found a simple and efficient implementation technique in 1984, I couldn't resist the challenge. I suspect this to be the only case in which fashion affected the sequence of events."

Multilevel inheritance
where a subclass is inherited from another subclass. It is not uncommon that a class is derived from another derived class as shown in the figure "Multilevel inheritance".
Multilevel inheritance
The class A serves as a base class for the derived class B, which in turn serves as a base class for the derived class C. The class B is known as intermediate base class because it provides a link for the inheritance between A and C. The chain ABC is known as inheritance path.
A derived class with multilevel inheritance is declared as follows:
// C++ language implementation
class A { ... };      // Base class
class B : public A { ... };   // B derived from A
class C : public B { ... };   // C derived from B
This process can be extended to any number of levels.
Hierarchical inheritance
This is where one class serves as a superclass (base class) for more than one sub class. For example, a parent class, A, can have two subclasses B and C. Both B and C's parent class is A, but B and C are two separate subclasses.
Hybrid inheritance
Hybrid inheritance is when a mix of two or more of the above types of inheritance occurs. An example of this is when a class A has a subclass B which has two subclasses, C and D. This is a mixture of both multilevel inheritance and hierarchal inheritance.

Subclasses and superclasses

Subclasses, derived classes, heir classes, or child classes are modular derivative classes that inherit one or more language entities from one or more other classes (called superclass, base classes, or parent classes). The semantics of class inheritance vary from language to language, but commonly the subclass automatically inherits the instance variables and member functions of its superclasses.

In C++, the general form of defining a derived class is:

class SubClass: visibility SuperClass {
    // subclass members
};

The colon indicates that the subclass inherits from the superclass. The visibility modifier is optional and, if present, may be either private or public. The default visibility (if there is no modifier present) is private. Visibility specifies whether the features of the base class are privately derived or publicly derived.

Note that in some other languages like Java and C#, there is no visibility modifier for inheritance:

// No visibility modifier
// Equivalent to public SuperClass in C++
class SubClass extends SuperClass {
    // subclass members
}

Some languages also support the inheritance of other constructs. For example, in Eiffel, contracts that define the specification of a class are also inherited by heirs. The superclass establishes a common interface and foundational functionality, which specialized subclasses can inherit, modify, and supplement. The software inherited by a subclass is considered reused in the subclass. A reference to an instance of a class may actually be referring to one of its subclasses. The actual class of the object being referenced is impossible to predict at compile-time. A uniform interface is used to invoke the member functions of objects of a number of different classes. Subclasses may replace superclass functions with entirely new functions that must share the same method signature.

Non-subclassable classes

In some languages a class may be declared as non-subclassable by adding certain class modifiers to the class declaration. Examples include the final keyword in Java and C++11 onwards or the sealed keyword in C#. Such modifiers are added to the class declaration before the class keyword and the class identifier declaration. Such non-subclassable classes restrict reusability, particularly when developers only have access to precompiled binaries and not source code.

A non-subclassable class has no subclasses, so it can be easily deduced at compile time that references or pointers to objects of that class are actually referencing instances of that class and not instances of subclasses (they do not exist) or instances of superclasses (upcasting a reference type violates the type system). Because the exact type of the object being referenced is known before execution, early binding (also called static dispatch) can be used instead of late binding (also called dynamic dispatch), which requires one or more virtual method table lookups depending on whether multiple inheritance or only single inheritance are supported in the programming language that is being used.

Non-overridable methods

Just as classes may be non-subclassable, method declarations may contain method modifiers that prevent the method from being overridden (i.e. replaced with a new function with the same name and type signature in a subclass). A private method is un-overridable simply because it is not accessible by classes other than the class it is a member function of (this is not true for C++, though). A final method in Java, a sealed method in C# or a frozen feature in Eiffel cannot be overridden.

Virtual methods

If a superclass method is a virtual method, then invocations of the superclass method will be dynamically dispatched. Some languages require that method be specifically declared as virtual (e.g. C++), and in others, all methods are virtual (e.g. Java). An invocation of a non-virtual method will always be statically dispatched (i.e. the address of the function call is determined at compile-time). Static dispatch is faster than dynamic dispatch and allows optimizations such as inline expansion.

Visibility of inherited members

The following table shows which variables and functions get inherited dependent on the visibility given when deriving the class, using the terminology established by C++.

Base class visibility Derived class visibility

Private derivation Protected derivation Public derivation
  • Private →
  • Protected →
  • Public →
  • Not inherited
  • Private
  • Private
  • Not inherited
  • Protected
  • Protected
  • Not inherited
  • Protected
  • Public

Applications

Inheritance is used to co-relate two or more classes to each other.

Overriding

Illustration of method overriding

Many object-oriented programming languages permit a class or object to replace the implementation of an aspect—typically a behavior—that it has inherited. This process is called overriding. Overriding introduces a complication: which version of the behavior does an instance of the inherited class use—the one that is part of its own class, or the one from the parent (base) class? The answer varies between programming languages, and some languages provide the ability to indicate that a particular behavior is not to be overridden and should behave as defined by the base class. For instance, in C#, the base method or property can only be overridden in a subclass if it is marked with the virtual, abstract, or override modifier, while in programming languages such as Java, different methods can be called to override other methods. An alternative to overriding is hiding the inherited code.

Code reuse

Implementation inheritance is the mechanism whereby a subclass re-uses code in a base class. By default the subclass retains all of the operations of the base class, but the subclass may override some or all operations, replacing the base-class implementation with its own.

In the following example, subclasses SquareSumComputer and CubeSumComputer override the transform() method of the base class SumComputer. The base class comprises operations to compute the sum of the squares between two integers. The subclass re-uses all of the functionality of the base class with the exception of the operation that transforms a number into its square, replacing it with an operation that transforms a number into its square and cube respectively. The subclasses therefore compute the sum of the squares/cubes between two integers.

Below is an example of Java.

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

abstract class SumComputer {
    private int a;
    private int b;

    public SumComputer(int a, int b) {
        this.a = a;
        this.b = b;
    }

    // Abstract method to be implemented by subclasses
    public abstract int transform(int x);

    public List<Integer> inputs() {
        return IntStream.rangeClosed(a, b)
            .boxed
            .collect(Collectors.toList());
    }

    public int compute() {
        return inputs().stream().
            .mapToInt(this::transform)
            .sum();
    }
}

class SquareSumComputer extends SumComputer {
    public SquareSumComputer(int a, int b) {
        super(a, b);
    }

    @Override
    public int transform(int x) {
        return x * x;
    }
}

class CubeSumComputer extends SumComputer {
    public CubeSumComputer (int a, int b) {
        super(a, b);
    }

    @Override
    public int transform(int x) {
        return x * x * x;
    }
}

In most quarters, class inheritance for the sole purpose of code reuse has fallen out of favor. The primary concern is that implementation inheritance does not provide any assurance of polymorphic substitutability—an instance of the reusing class cannot necessarily be substituted for an instance of the inherited class. An alternative technique, explicit delegation, requires more programming effort, but avoids the substitutability issue. In C++ private inheritance can be used as a form of implementation inheritance without substitutability. Whereas public inheritance represents an "is-a" relationship and delegation represents a "has-a" relationship, private (and protected) inheritance can be thought of as an "is implemented in terms of" relationship.

Another frequent use of inheritance is to guarantee that classes maintain a certain common interface; that is, they implement the same methods. The parent class can be a combination of implemented operations and operations that are to be implemented in the child classes. Often, there is no interface change between the supertype and subtype- the child implements the behavior described instead of its parent class.

Inheritance vs subtyping

Inheritance is similar to but distinct from subtyping. Subtyping enables a given type to be substituted for another type or abstraction and is said to establish an is-a relationship between the subtype and some existing abstraction, either implicitly or explicitly, depending on language support. The relationship can be expressed explicitly via inheritance in languages that support inheritance as a subtyping mechanism. For example, the following C++ code establishes an explicit inheritance relationship between classes B and A, where B is both a subclass and a subtype of A and can be used as an A wherever a B is specified (via a reference, a pointer or the object itself).

class A { 
public:
    void methodOfA() const {
        // ...
    }
};

class B : public A { 
public:
    void methodOfB() const {
        // ...
    }
};

void functionOnA(const A& a) {
    a.methodOfA();
}

int main() {
   B b;
   functionOnA(b); // b can be substituted for an A.
}

In programming languages that do not support inheritance as a subtyping mechanism, the relationship between a base class and a derived class is only a relationship between implementations (a mechanism for code reuse), as compared to a relationship between types. Inheritance, even in programming languages that support inheritance as a subtyping mechanism, does not necessarily entail behavioral subtyping. It is entirely possible to derive a class whose object will behave incorrectly when used in a context where the parent class is expected; see the Liskov substitution principle. (Compare connotation/denotation.) In some OOP languages, the notions of code reuse and subtyping coincide because the only way to declare a subtype is to define a new class that inherits the implementation of another.

Design constraints

Using inheritance extensively in designing a program imposes certain constraints.

For example, consider a class Person that contains a person's name, date of birth, address and phone number. We can define a subclass of Person called Student that contains the person's grade point average and classes taken, and another subclass of Person called Employee that contains the person's job-title, employer, and salary.

In defining this inheritance hierarchy we have already defined certain restrictions, not all of which are desirable:

  • Singleness: Using single inheritance, a subclass can inherit from only one superclass. Continuing the example given above, a Person object can be either a Student or an Employee, but not both. Using multiple inheritance partially solves this problem, as one can then define a StudentEmployee class that inherits from both Student and Employee. However, in most implementations, it can still inherit from each superclass only once, and thus, does not support cases in which a student has two jobs or attends two institutions. The inheritance model available in Eiffel makes this possible through support for repeated inheritance.
  • Static: The inheritance hierarchy of an object is fixed at instantiation when the object's type is selected and does not change with time. For example, the inheritance graph does not allow a Student object to become an Employee object while retaining the state of its Person superclass. (This kind of behavior, however, can be achieved with the decorator pattern.) Some have criticized inheritance, contending that it locks developers into their original design standards.
  • Visibility: Whenever client code has access to an object, it generally has access to all the object's superclass data. Even if the superclass has not been declared public, the client can still cast the object to its superclass type. For example, there is no way to give a function a pointer to a Student's grade point average and transcript without also giving that function access to all of the personal data stored in the student's Person superclass. Many modern languages, including C++ and Java, provide a "protected" access modifier that allows subclasses to access the data, without allowing any code outside the chain of inheritance to access it.

The composite reuse principle is an alternative to inheritance. This technique supports polymorphism and code reuse by separating behaviors from the primary class hierarchy and including specific behavior classes as required in any business domain class. This approach avoids the static nature of a class hierarchy by allowing behavior modifications at run time and allows one class to implement behaviors buffet-style, instead of being restricted to the behaviors of its ancestor classes.

Issues and alternatives

Implementation inheritance has been controversial among programmers and theoreticians of object-oriented programming since at least the 1990s. Among the critics are the authors of Design Patterns, who advocate instead for interface inheritance, and favor composition over inheritance. For example, the decorator pattern (as mentioned above) has been proposed to overcome the static nature of inheritance between classes. As a more fundamental solution to the same problem, role-oriented programming introduces a distinct relationship, played-by, combining properties of inheritance and composition into a new concept.

According to Allen Holub, the main problem with implementation inheritance is that it introduces unnecessary coupling in the form of the "fragile base class problem": modifications to the base class implementation can cause inadvertent behavioral changes in subclasses. Using interfaces avoids this problem because no implementation is shared, only the API. Another way of stating this is that "inheritance breaks encapsulation". The problem surfaces clearly in open object-oriented systems such as frameworks, where client code is expected to inherit from system-supplied classes and then substituted for the system's classes in its algorithms.

Reportedly, Java inventor James Gosling has spoken against implementation inheritance, stating that he would not include it if he were to redesign Java. Language designs that decouple inheritance from subtyping (interface inheritance) appeared as early as 1990; a modern example of this is the Go programming language.

Complex inheritance, or inheritance used within an insufficiently mature design, may lead to the yo-yo problem. When inheritance was used as a primary approach to structure programs in the late 1990s, developers tended to break code into more layers of inheritance as the system functionality grew. If a development team combined multiple layers of inheritance with the single responsibility principle, this resulted in many very thin layers of code, with many layers consisting of only 1 or 2 lines of actual code. Too many layers make debugging a significant challenge, as it becomes hard to determine which layer needs to be debugged.

Another issue with inheritance is that subclasses must be defined in code, which means that program users cannot add new subclasses at runtime. Other design patterns (such as Entity–component–system) allow program users to define variations of an entity at runtime.

P-code machine

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/P-code_machine ...