Search This Blog

Friday, April 24, 2026

Machine code

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Machine_code
Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor register and memory

In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions. Machine code is classified as native with respect to its host CPU since it is the language that the CPU interprets directly. Some software interpreters translate the programming language that they interpret into a virtual machine code (bytecode) and process it with a P-code machine.

A machine-code instruction causes the CPU to perform a specific task such as:

An instruction set architecture (ISA) defines the interface to a CPU and varies by groupings or families of CPU design such as x86 and ARM. Generally, machine code compatible with one family is not with others, but there are exceptions. The VAX architecture includes optional support of the PDP-11 instruction set. The IA-64 architecture includes optional support of the IA-32 instruction set. And, the PowerPC 615 can natively process both PowerPC and x86 instructions.

Assembly language

Translation of assembly into machine code

Assembly language provides a relatively direct mapping from a human-readable source code to machine code. The assembly language source code represents numerical codes in machine code, as mnemonics and labels. For example, NOP in assembly for an x86 processor represents the x86 architecture opcode 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.

Instruction set

A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.

Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.

An instruction set needs to execute the circuits of a computer's digital logic level. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components. To control a computer's architectural features, machine instructions are created. Examples of features that are controlled using machine instructions:

The criteria for instruction formats include:

  • Instructions most commonly used should be shorter than instructions rarely used.
  • The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
  • The number of bits in the address field requires special consideration.

Determining the size of the address field is a choice between space and speed. On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also, virtual address space needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format.

Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.

General-purpose instructions control:

  • Data movement from one place to another
  • Monadic operations that have one operand to produce a result
  • Dyadic operations that have two operands to produce a result
  • Comparisons and conditional jumps
  • Procedure calls
  • Loop control
  • Input/output

Overlapping instruction

On processor architectures with variable-length instruction sets (such as Intel's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count, sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences. These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.

In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes. The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on the byte level such as in the implementation of boot loaders which have to fit into boot sectors.

It is also sometimes used as a code obfuscation technique as a measure against disassembly and tampering.

The principle is also used in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms.

This property is also used to find unintended instructions called gadgets in existing code repositories and is used in return-oriented programming as alternative to code injection for exploits such as return-to-libc attacks.

Microcode

In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models. An example of this use is the IBM System/360 family of computers and their successors.

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bit from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic
S,1-11
12-13 Flag, ignored in some instructions
14-17 unused
18-20 Tag
21-35 Y
Index register control, other than TSX
S,1-2 Opcode
3-17 Decrement
18-20 Tag
21-35 Y

For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the logical or of the selected index registers and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when they are powered on they are in multiple tag mode, in which they use only the three of the index registers in a fashion compatible with earlier machines, and require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers.

The effective address is normally Y-C(T), where C(T) is either 0 for a tag of 0, the logical or of the selected index registers in multiple tag mode or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y.

A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instruction that conditionally skip one or two words, e.g., Compare Accumulator with Storage (CAS) does a three way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.

MIPS

The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional funct (function) field to determine the exact operation. The fields used in these types are:

   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct]  R-type
[  op  |  rs |  rt | address/immediate]  I-type
[  op  |        target address        ]  J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:

[  op  |  rs |  rt | address/immediate]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary

Jumping to the address 1024:

[  op  |        target address        ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary

Bytecode

Machine code is similar to yet fundamentally different from bytecode. Like machine code, bytecode is typically generated (i.e. by a compiler) from source code. But, unlike machine code, bytecode is not directly executable by a CPU. An exception is if a processor is designed to use bytecode as its machine code, such as the Pascal MicroEngine or a Java processor. If bytecode is processed by an software interpreter, then that interpreter is a virtual machine for which the bytecode is its machine code.

Storage

During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.

Readability

Machine code is generally considered to be not human readable, with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule. However, various tools and methods support understanding machine code.

Disassembly decodes machine code to assembly language which is possible since assembly instructions can often be mapped one-to-one to machine instructions.

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated (hard to understand).

A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Examples include:

Library (computing)

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Library_(computing)
Illustration of an application which uses libvorbisfile to play an Ogg Vorbis file

In computing, a library is a collection of resources that can be used during software development to implement a computer program. Commonly, a library consists of executable code such as compiled functions and classes, or a library can be a collection of source code. A resource library may contain data such as images and text.

A library can be used by multiple, independent consumers (programs and other libraries). This differs from resources defined in a program which can usually only be used by that program. When a consumer uses a library resource, it gains the value of the library without having to implement it itself. Libraries encourage software reuse in a modular fashion. Libraries can use other libraries resulting in a hierarchy of libraries in a program.

When writing code that uses a library, a programmer only needs to know how to use that library's application programming interface (API) without understanding its internal mechanics. For example, a program could use a library that abstracts a complicated system call so that the programmer can use the system feature without spending time to learn the intricacies of the system function.

History

The idea of a computer library dates back to the first computers created by Charles Babbage. An 1888 paper on his Analytical Engine suggested that computer operations could be punched on separate cards from numerical input. If these operation punch cards were saved for reuse then "by degrees the engine would have a library of its own."

A woman working next to a filing cabinet containing the subroutine library on reels of punched tape for the EDSAC computer

In 1947 Goldstine and von Neumann speculated that it would be useful to create a "library" of subroutines for their work on the IAS machine, an early computer that was not yet operational at that time. They envisioned a physical library of magnetic wire recordings, with each wire storing reusable computer code.

Inspired by von Neumann, Wilkes and his team constructed EDSAC. A filing cabinet of punched tape held the subroutine library for this computer. Programs for EDSAC consisted of a main program and a sequence of subroutines copied from the subroutine library. In 1951 the team published the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer, which detailed the creation and the purpose of the library.

COBOL included "primitive capabilities for a library system" in 1959, but Jean Sammet described them as "inadequate library facilities" in retrospect.

JOVIAL has a Communication Pool (COMPOOL), roughly a library of header files.

Another major contributor to the modern library concept came in the form of the subprogram innovation of FORTRAN. FORTRAN subprograms can be compiled independently of each other, but the compiler lacked a linker. So prior to the introduction of modules in Fortran-90, type checking between FORTRAN subprograms was impossible.

By the mid 1960s, copy and macro libraries for assemblers were common. Starting with the popularity of the IBM System/360, libraries containing other types of text elements, e.g., system parameters, also became common.

In IBM's OS/360 and its successors this is called a partitioned data set.

The first object-oriented programming language, Simula, developed in 1965, supported adding classes to libraries via its compiler.

Linking

The linking (or binding) process resolves references known as symbols (or links) by searching for them in various locations including configured libraries. If a linker (or binder) does not find a symbol, then it fails, but multiple matches may or may not cause failure.

Static linking is linking at build time, such that the library executable code is included in the program. Dynamic linking is linking at run time; it involves building the program with information that supports run-time linking to a dynamic link library (DLL). For dynamic linking, a compatible DLL file must be available to the program at run time, but for static linking, the program is standalone.

Smart linking is performed by a build tool that excludes unused code in the linking process. For example, a program that only uses integers for arithmetic, or does no arithmetic operations at all, can exclude floating-point library routines. This can lead to smaller program file size and reduced memory usage.

Relocation

Some references in a program or library module are stored in a relative or symbolic form which cannot be resolved until all code and libraries are assigned final static addresses. Relocation is the process of adjusting these references, and is done either by the linker or the loader. In general, relocation cannot be done to individual libraries themselves because the addresses in memory may vary depending on the program using them and other libraries they are combined with. Position-independent code avoids references to absolute addresses and therefore does not require relocation.

Categories

Executable

An executable library consists of code that has been converted from source code into machine code or an intermediate form such as bytecode. A linker allows for using library objects by associating each reference with an address at which the object is located. For example, in C, a library function is invoked via C's normal function call syntax and semantics.

A variant is a library containing compiled code (object code in IBM's nomenclature) in a form that cannot be loaded by the OS but that can be read by the linker.

Static

A static library is an executable library that is linked into a program at build-time by a linker (or whatever the build tool is called that does linking). This process, and the resulting stand-alone file, is known as a static build of the program. A static build may not need any further relocation if virtual memory is used and no address space layout randomization is desired.

A static library is sometimes called an archive on Unix-like systems.

Dynamic

A dynamic library is linked when the program is run – either at load-time or runtime. The dynamic library was intended after the static library to support additional software deployment flexibility.

Sources

A source library consists of source code, not compiled code.

Shared

A shared library is a library that contains executable code designed to be used by multiple computer programs or other libraries at runtime, with only one copy of that code in memory, shared by all programs using the code.

Object

Although generally an obsolete technology today, an object library exposes resources for object-oriented programming (OOP) and a distributed object is a remote object library. Examples include: COM/DCOM, SOM/DSOM, DOE, PDO and various CORBA-based systems.

The object library technology was developed since as OOP became popular, it became apparent that OOP runtime binding required information than contemporary libraries did not provide. In addition to the names and entry points of the code located within, due to inheritance, OOP binding also requires a list of dependencies – since the full definition of a method may be in different places. Further, this requires more than listing that one library requires the services of another. In OOP, the libraries themselves may not be known at compile time, and vary from system to system.

The remote object technology was developed in parallel to support multi-tier programs with a user interface application running on a personal computer (PC) using services of a mainframe or minicomputer such as data storage and processing. For instance, a program on a PC would send messages to a minicomputer via remote procedure call (RPC) to retrieve relatively small samples from a relatively large dataset. In response, distributed object technology was developed.

Class

A class library contains classes that can be used to create objects. In Java, for example, classes are contained in JAR files and objects are created at runtime from the classes. However, in Smalltalk, a class library is the starting point for a system image that includes the entire state of the environment, classes and all instantiated objects. Most class libraries are stored in a package repository (such as Maven Central for Java). Client code explicitly specifies dependencies to external libraries in build configuration files (such as a Maven Pom in Java).

Remote

A remote library runs on another computer and its assets are accessed via remote procedure call (RPC) over a network. This distributed architecture allows for minimizing installation of the library and support for it on each consuming system and ensuring consistent versioning. A significant downside is that each library call entails significantly more overhead than for a local library.

Runtime

A runtime library provides access to the runtime environment that is available to a program – tailored to the host platform.

Language standard

Many modern programming languages specify a standard library that provides a base level of functionality for the language environment.

Code generation

A code generation library has a high-level API generating or transforming byte code for Java. They are used by aspect-oriented programming, some data access frameworks, and for testing to generate dynamic proxy objects. They also are used to intercept field access.

File naming

Unix-like

On most modern Unix-like systems, library files are stored in directories such as /lib, /usr/lib and /usr/local/lib. A filename typically starts with lib, and ends with .a for a static library (archive) or .so for a shared object (dynamically linked library). For example, libfoo.a and libfoo.so.

Often, symbolic link files are used to manage versioning of a library by providing a link file named without a version that links to a file named with a version. For example, libfoo.so.2 might be version 2 of library foo and a link file named libfoo.so provides a version independent name to that file that programs link to. The link file could be changed to a refer to a version 3 (libfoo.so.3) such that consuming programs will then use version 3 without having to change the program.

Files with the extension .la are libtool archives; they are not usable by the system.

macOS

The macOS system inherits static library conventions from BSD, with the library stored in a .a file. It uses either .so or .dylib for dynamic libraries. Most libraries in macOS, however, consist of "frameworks", placed inside special directories called "bundles" which wrap the library's required files and metadata. For example, a framework called Abc would be implemented in a bundle called Abc.framework, with Abc.framework/Abc being either the dynamically linked library file or a symlink to the dynamically linked library file in Abc.framework/Versions/Current/Abc.

Windows

Often, a Windows dynamic-link library (DLL) has the file extension .dll, although sometimes different extensions are used to indicate general content, e.g. .ocx for a OLE library.

A .lib file can be either a static library or contain the information needed to build an application that consumes the associated DLL. In the latter case, the associated DLL file must be present at runtime.

Inheritance (object-oriented programming)

From Wikipedia, the free encyclopedia

In object-oriented programming, inheritance is the mechanism of basing an object or class upon another object (prototype-based inheritance) or class (class-based inheritance), retaining similar implementation. It is also defined as deriving new classes (sub classes) from existing ones such as super class or base class and then forming them into a hierarchy of classes. In most class-based object-oriented languages like C++, an object created through inheritance, a "child object", acquires all the properties and behaviors of the "parent object", with the exception of: constructors, destructors, overloaded operators and friend functions of the base class. Inheritance allows programmers to create classes that are built upon existing classes, to specify a new implementation while maintaining the same behaviors (realizing an interface), to reuse code and to independently extend original software via public classes and interfaces. The relationships of objects or classes through inheritance give rise to a directed acyclic graph.

An inherited class is called a subclass of its parent class or super class. The term inheritance is loosely used for both class-based and prototype-based programming, but in narrow use the term is reserved for class-based programming (one class inherits from another), with the corresponding technique in prototype-based programming being instead called delegation (one object delegates to another). Class-modifying inheritance patterns can be pre-defined according to simple network interface parameters such that inter-language compatibility is preserved.

Inheritance should not be confused with subtyping. In some languages, generally statically-typed class-based OO languages, like C++, C#, Java, and Scala, inheritance and subtyping agree, whereas in others they differ. In general, subtyping establishes an is-a relationship, whereas inheritance only reuses implementation and establishes a syntactic relationship, not necessarily a semantic relationship (inheritance does not ensure behavioral subtyping). To distinguish these concepts, subtyping is sometimes referred to as interface inheritance (without acknowledging that the specialization of type variables also induces a subtyping relation), whereas inheritance as defined here is known as implementation inheritance or code inheritance. Still, inheritance is a commonly used mechanism for establishing subtype relationships.

Inheritance is contrasted with object composition, where one object contains another object (or objects of one class contain objects of another class); see composition over inheritance. In contrast to subtyping’s is-a relationship, composition implements a has-a relationship.

Mathematically speaking, inheritance in any system of classes induces a strict partial order on the set of classes in that system.

History

In 1966, Tony Hoare presented some remarks on records, and in particular, the idea of record subclasses, record types with common properties but discriminated by a variant tag and having fields private to the variant. Influenced by this, in 1967 Ole-Johan Dahl and Kristen Nygaard presented a design that allowed specifying objects that belonged to different classes but had common properties. The common properties were collected in a superclass, and each superclass could itself potentially have a superclass. The values of a subclass were thus compound objects, consisting of some number of prefix parts belonging to various superclasses, plus a main part belonging to the subclass. These parts were all concatenated together. The attributes of a compound object would be accessible by dot notation. This idea was first adopted in the Simula 67 programming language. The idea then spread to Smalltalk, C++, Java, Python, and many other languages.

Types

Single inheritance
Multiple inheritance

There are various types of inheritance, based on paradigm and specific language.

Single inheritance
where subclasses inherit the features of one superclass. A class acquires the properties of another class.
Multiple inheritance
where one class can have more than one superclass and inherit features from all parent classes.

"Multiple inheritance ... was widely supposed to be very difficult to implement efficiently. For example, in a summary of C++ in his book on Objective C, Brad Cox actually claimed that adding multiple inheritance to C++ was impossible. Thus, multiple inheritance seemed more of a challenge. Since I had considered multiple inheritance as early as 1982 and found a simple and efficient implementation technique in 1984, I couldn't resist the challenge. I suspect this to be the only case in which fashion affected the sequence of events."

Multilevel inheritance
where a subclass is inherited from another subclass. It is not uncommon that a class is derived from another derived class as shown in the figure "Multilevel inheritance".
Multilevel inheritance
The class A serves as a base class for the derived class B, which in turn serves as a base class for the derived class C. The class B is known as intermediate base class because it provides a link for the inheritance between A and C. The chain ABC is known as inheritance path.
A derived class with multilevel inheritance is declared as follows:
// C++ language implementation
class A { ... };      // Base class
class B : public A { ... };   // B derived from A
class C : public B { ... };   // C derived from B
This process can be extended to any number of levels.
Hierarchical inheritance
This is where one class serves as a superclass (base class) for more than one sub class. For example, a parent class, A, can have two subclasses B and C. Both B and C's parent class is A, but B and C are two separate subclasses.
Hybrid inheritance
Hybrid inheritance is when a mix of two or more of the above types of inheritance occurs. An example of this is when a class A has a subclass B which has two subclasses, C and D. This is a mixture of both multilevel inheritance and hierarchal inheritance.

Subclasses and superclasses

Subclasses, derived classes, heir classes, or child classes are modular derivative classes that inherit one or more language entities from one or more other classes (called superclass, base classes, or parent classes). The semantics of class inheritance vary from language to language, but commonly the subclass automatically inherits the instance variables and member functions of its superclasses.

In C++, the general form of defining a derived class is:

class SubClass: visibility SuperClass {
    // subclass members
};

The colon indicates that the subclass inherits from the superclass. The visibility modifier is optional and, if present, may be either private or public. The default visibility (if there is no modifier present) is private. Visibility specifies whether the features of the base class are privately derived or publicly derived.

Note that in some other languages like Java and C#, there is no visibility modifier for inheritance:

// No visibility modifier
// Equivalent to public SuperClass in C++
class SubClass extends SuperClass {
    // subclass members
}

Some languages also support the inheritance of other constructs. For example, in Eiffel, contracts that define the specification of a class are also inherited by heirs. The superclass establishes a common interface and foundational functionality, which specialized subclasses can inherit, modify, and supplement. The software inherited by a subclass is considered reused in the subclass. A reference to an instance of a class may actually be referring to one of its subclasses. The actual class of the object being referenced is impossible to predict at compile-time. A uniform interface is used to invoke the member functions of objects of a number of different classes. Subclasses may replace superclass functions with entirely new functions that must share the same method signature.

Non-subclassable classes

In some languages a class may be declared as non-subclassable by adding certain class modifiers to the class declaration. Examples include the final keyword in Java and C++11 onwards or the sealed keyword in C#. Such modifiers are added to the class declaration before the class keyword and the class identifier declaration. Such non-subclassable classes restrict reusability, particularly when developers only have access to precompiled binaries and not source code.

A non-subclassable class has no subclasses, so it can be easily deduced at compile time that references or pointers to objects of that class are actually referencing instances of that class and not instances of subclasses (they do not exist) or instances of superclasses (upcasting a reference type violates the type system). Because the exact type of the object being referenced is known before execution, early binding (also called static dispatch) can be used instead of late binding (also called dynamic dispatch), which requires one or more virtual method table lookups depending on whether multiple inheritance or only single inheritance are supported in the programming language that is being used.

Non-overridable methods

Just as classes may be non-subclassable, method declarations may contain method modifiers that prevent the method from being overridden (i.e. replaced with a new function with the same name and type signature in a subclass). A private method is un-overridable simply because it is not accessible by classes other than the class it is a member function of (this is not true for C++, though). A final method in Java, a sealed method in C# or a frozen feature in Eiffel cannot be overridden.

Virtual methods

If a superclass method is a virtual method, then invocations of the superclass method will be dynamically dispatched. Some languages require that method be specifically declared as virtual (e.g. C++), and in others, all methods are virtual (e.g. Java). An invocation of a non-virtual method will always be statically dispatched (i.e. the address of the function call is determined at compile-time). Static dispatch is faster than dynamic dispatch and allows optimizations such as inline expansion.

Visibility of inherited members

The following table shows which variables and functions get inherited dependent on the visibility given when deriving the class, using the terminology established by C++.

Base class visibility Derived class visibility

Private derivation Protected derivation Public derivation
  • Private →
  • Protected →
  • Public →
  • Not inherited
  • Private
  • Private
  • Not inherited
  • Protected
  • Protected
  • Not inherited
  • Protected
  • Public

Applications

Inheritance is used to co-relate two or more classes to each other.

Overriding

Illustration of method overriding

Many object-oriented programming languages permit a class or object to replace the implementation of an aspect—typically a behavior—that it has inherited. This process is called overriding. Overriding introduces a complication: which version of the behavior does an instance of the inherited class use—the one that is part of its own class, or the one from the parent (base) class? The answer varies between programming languages, and some languages provide the ability to indicate that a particular behavior is not to be overridden and should behave as defined by the base class. For instance, in C#, the base method or property can only be overridden in a subclass if it is marked with the virtual, abstract, or override modifier, while in programming languages such as Java, different methods can be called to override other methods. An alternative to overriding is hiding the inherited code.

Code reuse

Implementation inheritance is the mechanism whereby a subclass re-uses code in a base class. By default the subclass retains all of the operations of the base class, but the subclass may override some or all operations, replacing the base-class implementation with its own.

In the following example, subclasses SquareSumComputer and CubeSumComputer override the transform() method of the base class SumComputer. The base class comprises operations to compute the sum of the squares between two integers. The subclass re-uses all of the functionality of the base class with the exception of the operation that transforms a number into its square, replacing it with an operation that transforms a number into its square and cube respectively. The subclasses therefore compute the sum of the squares/cubes between two integers.

Below is an example of Java.

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

abstract class SumComputer {
    private int a;
    private int b;

    public SumComputer(int a, int b) {
        this.a = a;
        this.b = b;
    }

    // Abstract method to be implemented by subclasses
    public abstract int transform(int x);

    public List<Integer> inputs() {
        return IntStream.rangeClosed(a, b)
            .boxed
            .collect(Collectors.toList());
    }

    public int compute() {
        return inputs().stream().
            .mapToInt(this::transform)
            .sum();
    }
}

class SquareSumComputer extends SumComputer {
    public SquareSumComputer(int a, int b) {
        super(a, b);
    }

    @Override
    public int transform(int x) {
        return x * x;
    }
}

class CubeSumComputer extends SumComputer {
    public CubeSumComputer (int a, int b) {
        super(a, b);
    }

    @Override
    public int transform(int x) {
        return x * x * x;
    }
}

In most quarters, class inheritance for the sole purpose of code reuse has fallen out of favor. The primary concern is that implementation inheritance does not provide any assurance of polymorphic substitutability—an instance of the reusing class cannot necessarily be substituted for an instance of the inherited class. An alternative technique, explicit delegation, requires more programming effort, but avoids the substitutability issue. In C++ private inheritance can be used as a form of implementation inheritance without substitutability. Whereas public inheritance represents an "is-a" relationship and delegation represents a "has-a" relationship, private (and protected) inheritance can be thought of as an "is implemented in terms of" relationship.

Another frequent use of inheritance is to guarantee that classes maintain a certain common interface; that is, they implement the same methods. The parent class can be a combination of implemented operations and operations that are to be implemented in the child classes. Often, there is no interface change between the supertype and subtype- the child implements the behavior described instead of its parent class.

Inheritance vs subtyping

Inheritance is similar to but distinct from subtyping. Subtyping enables a given type to be substituted for another type or abstraction and is said to establish an is-a relationship between the subtype and some existing abstraction, either implicitly or explicitly, depending on language support. The relationship can be expressed explicitly via inheritance in languages that support inheritance as a subtyping mechanism. For example, the following C++ code establishes an explicit inheritance relationship between classes B and A, where B is both a subclass and a subtype of A and can be used as an A wherever a B is specified (via a reference, a pointer or the object itself).

class A { 
public:
    void methodOfA() const {
        // ...
    }
};

class B : public A { 
public:
    void methodOfB() const {
        // ...
    }
};

void functionOnA(const A& a) {
    a.methodOfA();
}

int main() {
   B b;
   functionOnA(b); // b can be substituted for an A.
}

In programming languages that do not support inheritance as a subtyping mechanism, the relationship between a base class and a derived class is only a relationship between implementations (a mechanism for code reuse), as compared to a relationship between types. Inheritance, even in programming languages that support inheritance as a subtyping mechanism, does not necessarily entail behavioral subtyping. It is entirely possible to derive a class whose object will behave incorrectly when used in a context where the parent class is expected; see the Liskov substitution principle. (Compare connotation/denotation.) In some OOP languages, the notions of code reuse and subtyping coincide because the only way to declare a subtype is to define a new class that inherits the implementation of another.

Design constraints

Using inheritance extensively in designing a program imposes certain constraints.

For example, consider a class Person that contains a person's name, date of birth, address and phone number. We can define a subclass of Person called Student that contains the person's grade point average and classes taken, and another subclass of Person called Employee that contains the person's job-title, employer, and salary.

In defining this inheritance hierarchy we have already defined certain restrictions, not all of which are desirable:

  • Singleness: Using single inheritance, a subclass can inherit from only one superclass. Continuing the example given above, a Person object can be either a Student or an Employee, but not both. Using multiple inheritance partially solves this problem, as one can then define a StudentEmployee class that inherits from both Student and Employee. However, in most implementations, it can still inherit from each superclass only once, and thus, does not support cases in which a student has two jobs or attends two institutions. The inheritance model available in Eiffel makes this possible through support for repeated inheritance.
  • Static: The inheritance hierarchy of an object is fixed at instantiation when the object's type is selected and does not change with time. For example, the inheritance graph does not allow a Student object to become an Employee object while retaining the state of its Person superclass. (This kind of behavior, however, can be achieved with the decorator pattern.) Some have criticized inheritance, contending that it locks developers into their original design standards.
  • Visibility: Whenever client code has access to an object, it generally has access to all the object's superclass data. Even if the superclass has not been declared public, the client can still cast the object to its superclass type. For example, there is no way to give a function a pointer to a Student's grade point average and transcript without also giving that function access to all of the personal data stored in the student's Person superclass. Many modern languages, including C++ and Java, provide a "protected" access modifier that allows subclasses to access the data, without allowing any code outside the chain of inheritance to access it.

The composite reuse principle is an alternative to inheritance. This technique supports polymorphism and code reuse by separating behaviors from the primary class hierarchy and including specific behavior classes as required in any business domain class. This approach avoids the static nature of a class hierarchy by allowing behavior modifications at run time and allows one class to implement behaviors buffet-style, instead of being restricted to the behaviors of its ancestor classes.

Issues and alternatives

Implementation inheritance has been controversial among programmers and theoreticians of object-oriented programming since at least the 1990s. Among the critics are the authors of Design Patterns, who advocate instead for interface inheritance, and favor composition over inheritance. For example, the decorator pattern (as mentioned above) has been proposed to overcome the static nature of inheritance between classes. As a more fundamental solution to the same problem, role-oriented programming introduces a distinct relationship, played-by, combining properties of inheritance and composition into a new concept.

According to Allen Holub, the main problem with implementation inheritance is that it introduces unnecessary coupling in the form of the "fragile base class problem": modifications to the base class implementation can cause inadvertent behavioral changes in subclasses. Using interfaces avoids this problem because no implementation is shared, only the API. Another way of stating this is that "inheritance breaks encapsulation". The problem surfaces clearly in open object-oriented systems such as frameworks, where client code is expected to inherit from system-supplied classes and then substituted for the system's classes in its algorithms.

Reportedly, Java inventor James Gosling has spoken against implementation inheritance, stating that he would not include it if he were to redesign Java. Language designs that decouple inheritance from subtyping (interface inheritance) appeared as early as 1990; a modern example of this is the Go programming language.

Complex inheritance, or inheritance used within an insufficiently mature design, may lead to the yo-yo problem. When inheritance was used as a primary approach to structure programs in the late 1990s, developers tended to break code into more layers of inheritance as the system functionality grew. If a development team combined multiple layers of inheritance with the single responsibility principle, this resulted in many very thin layers of code, with many layers consisting of only 1 or 2 lines of actual code. Too many layers make debugging a significant challenge, as it becomes hard to determine which layer needs to be debugged.

Another issue with inheritance is that subclasses must be defined in code, which means that program users cannot add new subclasses at runtime. Other design patterns (such as Entity–component–system) allow program users to define variations of an entity at runtime.

Interplanetary Internet

From Wikipedia, the free encyclopedia The speed of light, illustrated here by a beam of light traveling ...