In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a user,
of the data, specifically in terms of possible values, possible
operations on data of this type, and the behavior of these operations.
This mathematical model contrasts with data structures, which are concrete representations of data, and are the point of view of an implementer, not a user.
Formally, an ADT may be defined as a "class of objects whose
logical behavior is defined by a set of values and a set of operations"; this is analogous to an algebraic structure
in mathematics. What is meant by "behavior" varies by author, with the
two main types of formal specifications for behavior being axiomatic (algebraic) specification and an abstract model; these correspond to axiomatic semantics and operational semantics of an abstract machine, respectively. Some authors also include the computational complexity
("cost"), both in terms of time (for computing operations) and space
(for representing values). In practice many common data types are not
ADTs, as the abstraction is not perfect, and users must be aware of
issues like arithmetic overflow
that are due to the representation. For example, integers are often
stored as fixed-width values (32-bit or 64-bit binary numbers), and thus
experience integer overflow if the maximum value is exceeded.
ADTs are a theoretical concept, in computer science, used in the design and analysis of algorithms, data structures, and software systems, and do not correspond to specific features of computer languages—mainstream
computer languages do not directly support formally specified ADTs.
However, various language features correspond to certain aspects of
ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.
Examples
For example, integers
are an ADT, defined as the values ..., −2, −1, 0, 1, 2, ..., and by the
operations of addition, subtraction, multiplication, and division,
together with greater than, less than, etc., which behave according to
familiar mathematics (with care for integer division), independently of how the integers are represented by the computer.
Explicitly, "behavior" includes obeying various axioms (associativity
and commutativity of addition, etc.), and preconditions on operations
(cannot divide by zero). Typically integers are represented in a data
structure as binary numbers, most often as two's complement, but might be binary-coded decimal or in ones' complement, but the user is abstracted from the concrete choice of representation, and can simply use the data as data types.
An ADT consists not only of operations but also of values of the
underlying data and of constraints on the operations. An "interface"
typically refers only to the operations, and perhaps some of the
constraints on the operations, notably pre-conditions and
post-conditions, but not other constraints, such as relations between
the operations.
For example, an abstract stack, which is a last-in-first-out structure, could be defined by three operations: push, that inserts a data item onto the stack; pop, that removes a data item from it; and peek or top, that accesses a data item on top of the stack without removal. An abstract queue, which is a first-in-first-out structure, would also have three operations: enqueue, that inserts a data item into the queue; dequeue, that removes the first data item from it; and front,
that accesses and serves the first data item in the queue. There would
be no way of differentiating these two data types, unless a mathematical
constraint is introduced that for a stack specifies that each pop
always returns the most recently pushed item that has not been popped
yet. When analyzing the efficiency
of algorithms that use stacks, one may also specify that all operations
take the same time no matter how many data items have been pushed into
the stack, and that the stack uses a constant amount of storage for each
element.
Introduction
Abstract
data types are purely theoretical entities, used (among other things)
to simplify the description of abstract algorithms, to classify and
evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs.
The term abstract data type can also be regarded as a generalized approach of a number of algebraic structures, such as lattices, groups, and rings. The notion of abstract data types is related to the concept of data abstraction, important in object-oriented programming and design by contract methodologies for software development.
Defining an abstract data type
An
abstract data type is defined as a mathematical model of the data
objects that make up a data type as well as the functions that operate
on these objects.
There are no standard conventions for defining them. A broad division
may be drawn between "imperative" and "functional" definition styles.
Imperative-style definition
In the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable—meaning that it may be in different states
at different times. Some operations may change the state of the ADT;
therefore, the order in which operations are evaluated is important, and
the same operation on the same entities may have different effects if
executed at different times—just like the instructions of a computer, or
the commands and procedures of an imperative language. To underscore
this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms. (See The Art of Computer Programming by Donald Knuth for more details)
Abstract variable
Imperative-style definitions of ADT often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:
- store(V, x) where x is a value of unspecified nature;
- fetch(V), that yields a value,
with the constraint that
- fetch(V) always returns the value x used in the most recent store(V, x) operation on the same variable V.
As in so many programming languages, the operation store(V, x) is often written V ← x (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, V ← V + 1 is commonly understood to be a shorthand for store(V,fetch(V) + 1).
In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that
- if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.
More generally, ADT definitions often assume that any operation that
changes the state of one ADT instance has no effect on the state of any
other instance (including other instances of the same ADT) — unless the
ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R.
The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.
Note that this definition does not imply anything about the result of evaluating fetch(V) when V is un-initialized, that is, before performing any store operation on V.
An algorithm that does so is usually considered invalid, because its
effect is not defined. (However, there are some important algorithms
whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)
Instance creation
Some
algorithms need to create new instances of some ADT (such as new
variables, or new stacks). To describe such algorithms, one usually
includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to
- the result of create() is distinct from any instance in use by the algorithm.
This axiom may be strengthened to exclude also partial aliasing with
other instances. On the other hand, this axiom still allows
implementations of create() to yield a previously created instance that has become inaccessible to the program.
Example: abstract stack (imperative)
As another example, an imperative-style definition of an abstract stack could specify that the state of a stack S can be modified only by the operations
- push(S, x), where x is some value of unspecified nature;
- pop(S), that yields a value as a result,
with the constraint that
- For any value x and any abstract variable V, the sequence of operations { push(S, x); V ← pop(S) } is equivalent to V ← x.
Since the assignment V ← x, by definition, cannot change the state of S, this condition implies that V ← pop(S) restores S to the state it had before the push(S, x). From this condition and from the properties of abstract variables, it follows, for example, that the sequence
- { push(S, x); push(S, y); U ← pop(S); push(S, z); V ← pop(S); W ← pop(S) }
where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to
- { U ← y; V ← z; W ← x }
Here it is implicitly assumed that operations on a stack instance do
not modify the state of any other ADT instance, including other stacks;
that is,
- For any values x, y, and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }.
An abstract stack definition usually includes also a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to
- create() ≠ S for any prior stack S (a newly created stack is distinct from all previous stacks);
- empty(create()) (a newly created stack is empty);
- not empty(push(S, x)) (pushing something into a stack makes it non-empty).
Single-instance style
Sometimes
an ADT is defined as if only one instance of it existed during the
execution of the algorithm, and all operations were applied to that
instance, which is not explicitly notated. For example, the abstract
stack above could have been defined with operations push(x) and pop(), that operate on the
only existing stack. ADT definitions in this style can be easily
rewritten to admit multiple coexisting instances of the ADT, by adding
an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance.
On the other hand, some ADTs cannot be meaningfully defined
without assuming multiple instances. This is the case when a single
operation takes two distinct instances of the ADT as parameters. For an
example, consider augmenting the definition of the abstract stack with
an operation compare(S, T) that checks whether the stacks S and T contain the same items in the same order.
Functional-style definition
Another way to define an ADT, closer to the spirit of functional programming,
is to consider each state of the structure as a separate entity. In
this view, any operation that modifies the ADT is modeled as a mathematical function
that takes the old state as an argument, and returns the new state as
part of the result. Unlike the imperative operations, these functions
have no side effects.
Therefore, the order in which they are evaluated is immaterial, and the
same operation applied to the same arguments (including the same input
states) will always return the same results (and output states).
In the functional view, in particular, there is no way (or need)
to define an "abstract variable" with the semantics of imperative
variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.
Example: abstract stack (functional)
For example, a complete functional-style definition of an abstract stack could use the three operations:
- push: takes a stack state and an arbitrary value, returns a stack state;
- top: takes a stack state, returns a value;
- pop: takes a stack state, returns a stack state.
In a functional-style definition there is no need for a create
operation. Indeed, there is no notion of "stack instance". The stack
states can be thought of as being potential states of a single stack
structure, and two stack states that contain the same values in the same
order are considered to be identical states. This view actually mirrors
the behavior of some concrete implementations, such as linked lists with hash cons.
Instead of create(), a functional-style definition of an abstract stack may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or "()"; or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that
- push(Λ, x) ≠ Λ.
In a functional-style definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ.
Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s, x) = push(t, y) if and only if x = y and s = t.
As in some other branches of mathematics, it is customary to
assume also that the stack states are only those whose existence can be
proved from the axioms in a finite number of steps. In the abstract
stack example above, this rule means that every stack is a finite sequence of values, that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be poped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".
Whether to include complexity
Aside from the behavior in terms of axioms, it is also possible to include, in the definition of an ADT operation, their algorithmic complexity. Alexander Stepanov, designer of the C++ Standard Template Library, included complexity guarantees in the STL specification, arguing:
The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.
— Alexander Stepanov
Advantages of abstract data typing
Encapsulation
Abstraction
provides a promise that any implementation of the ADT has certain
properties and abilities; knowing these is all that is required to make
use of an ADT object. The user does not need any technical knowledge of
how the implementation works to use the ADT. In this way, the
implementation may be complex but will be encapsulated in a simple
interface when it is actually used.
Localization of change
Code
that uses an ADT object will not need to be edited if the
implementation of the ADT is changed. Since any changes to the
implementation must still comply with the interface, and since code
using an ADT object may only refer to properties and abilities specified
in the interface, changes may be made to the implementation without
requiring any changes in code where the ADT is used.
Flexibility
Different
implementations of the ADT, having all the same properties and
abilities, are equivalent and may be used somewhat interchangeably in
code that uses the ADT. This gives a great deal of flexibility when
using ADT objects in different situations. For example, different
implementations of the ADT may be more efficient in different
situations; it is possible to use each in the situation where they are
preferable, thus increasing overall efficiency.
Typical operations
Some operations that are often specified for ADTs (possibly under other names) are
- compare(s, t), that tests whether two instances' states are equivalent in some sense;
- hash(s), that computes some standard hash function from the instance's state;
- print(s) or show(s), that produces a human-readable representation of the instance's state.
In imperative-style ADT definitions, one often finds also
- create(), that yields a new instance of the ADT;
- initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
- copy(s, t), that puts instance s in a state equivalent to that of t;
- clone(t), that performs s ← create(), copy(s, t), and returns s;
- free(s) or destroy(s), that reclaims the memory and other resources used by s.
The free operation is not normally relevant or meaningful,
since ADTs are theoretical entities that do not "use memory". However,
it may be necessary when one needs to analyze the storage used by an
algorithm that uses the ADT. In that case one needs additional axioms
that specify how much memory each ADT instance uses, as a function of
its state, and how much of it is returned to the pool by free.