In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values, possible operations on data of this type, and the behavior of these operations. This mathematical model contrasts with data structures, which are concrete representations of data and are the point of view of an implementer, not a user.
 
Formally, an ADT may be defined as a "class of objects whose logical behavior is defined by a set of values and a set of operations"; this is analogous to an algebraic structure in mathematics. What is meant by "behavior" varies by author, with the two main types of formal specifications for behavior being axiomatic (algebraic) specification and an abstract model; these correspond to axiomatic semantics and operational semantics of an abstract machine, respectively. Some authors also include the computational complexity ("cost"), both in terms of time (for computing operations) and space (for representing values). In practice many common data types are not ADTs, as the abstraction is not perfect, and users must be aware of issues like arithmetic overflow that are due to the representation. For example, integers are often stored as fixed-width values (32-bit or 64-bit binary numbers), and thus experience integer overflow if the maximum value is exceeded.
 
ADTs are a theoretical concept, used in computer science in the design and analysis of algorithms, data structures, and software systems, and do not correspond to specific features of computer languages; mainstream computer languages do not directly support formally specified ADTs. However, various language features correspond to certain aspects of ADTs, and are easily confused with ADTs proper; these include abstract types, opaque data types, protocols, and design by contract. ADTs were first proposed by Barbara Liskov and Stephen N. Zilles in 1974, as part of the development of the CLU language.
 
Examples
For example, integers are an ADT, defined as the values ..., −2, −1, 0, 1, 2, ..., and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc., which behave according to familiar mathematics (with care for integer division), independently of how the integers are represented by the computer. Explicitly, "behavior" includes obeying various axioms (associativity and commutativity of addition, etc.) and preconditions on operations (cannot divide by zero). Typically integers are represented in a data structure as binary numbers, most often as two's complement, but might be binary-coded decimal or ones' complement; the user is abstracted from the concrete choice of representation and can simply use the data as data types.
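
The following Python sketch is an illustration only; the class names AbstractInteger and MachineInteger are invented for this example and come from no particular library. It separates the integer ADT's operations and the divide-by-zero precondition from one possible concrete representation:

```python
from abc import ABC, abstractmethod

class AbstractInteger(ABC):
    """Integer ADT: only the operations and their behaviour are specified."""
    @abstractmethod
    def add(self, other): ...
    @abstractmethod
    def multiply(self, other): ...
    @abstractmethod
    def divide(self, other):
        """Precondition: other must not represent zero."""
    @abstractmethod
    def less_than(self, other): ...

class MachineInteger(AbstractInteger):
    """One possible representation: a wrapper over Python's built-in int."""
    def __init__(self, value):
        self._value = value                      # hidden representation
    def add(self, other):
        return MachineInteger(self._value + other._value)
    def multiply(self, other):
        return MachineInteger(self._value * other._value)
    def divide(self, other):
        if other._value == 0:                    # enforce the precondition
            raise ValueError("precondition violated: division by zero")
        return MachineInteger(self._value // other._value)
    def less_than(self, other):
        return self._value < other._value

a, b = MachineInteger(7), MachineInteger(2)
assert a.add(b).less_than(MachineInteger(10))    # 7 + 2 < 10
```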
 
An ADT consists not only of operations but also of values of the underlying data and of constraints on the operations. An "interface" typically refers only to the operations, and perhaps some of the constraints on the operations, notably pre-conditions and post-conditions, but not other constraints, such as relations between the operations.

For example, an abstract stack, which is a last-in-first-out structure, could be defined by three operations: push, that inserts a data item onto the stack; pop, that removes a data item from it; and peek or top, that accesses a data item on top of the stack without removal. An abstract queue, which is a first-in-first-out structure, would also have three operations: enqueue, that inserts a data item into the queue; dequeue, that removes the first data item from it; and front, that accesses and serves the first data item in the queue. There would be no way of differentiating these two data types, unless a mathematical constraint is introduced that for a stack specifies that each pop always returns the most recently pushed item that has not been popped yet. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many data items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.
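
A minimal Python sketch may make the point concrete; the class names are invented here, lists and deques merely stand in for possible representations, and only the final assertions express the constraint that actually distinguishes the two types:

```python
from collections import deque

class ListStack:
    """Minimal stack: push/pop/peek with the LIFO constraint."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()       # most recently pushed, not yet popped
    def peek(self):
        return self._items[-1]

class ListQueue:
    """Minimal queue: enqueue/dequeue/front with the FIFO constraint."""
    def __init__(self):
        self._items = deque()
    def enqueue(self, x):
        self._items.append(x)
    def dequeue(self):
        return self._items.popleft()   # first item inserted, not yet removed
    def front(self):
        return self._items[0]

# The operations match up to renaming; only the constraints below differ.
s, q = ListStack(), ListQueue()
for v in (1, 2, 3):
    s.push(v)
    q.enqueue(v)
assert s.pop() == 3      # stack: last in, first out
assert q.dequeue() == 1  # queue: first in, first out
```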
 
Introduction
Abstract data types are purely theoretical entities, used (among other things) to simplify the description of abstract algorithms, to classify and evaluate data structures, and to formally describe the type systems of programming languages. However, an ADT may be implemented by specific data types or data structures, in many ways and in many programming languages; or described in a formal specification language. ADTs are often implemented as modules: the module's interface declares procedures that correspond to the ADT operations, sometimes with comments that describe the constraints. This information hiding strategy allows the implementation of the module to be changed without disturbing the client programs.
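
As a rough illustration of this module pattern (the names Counter, _CounterImpl, and make_counter are invented for this example), the interface below exposes only the ADT operations while the representation stays hidden:

```python
from typing import Protocol

class Counter(Protocol):
    """The module's interface: the only operations visible to client code."""
    def increment(self) -> None: ...
    def value(self) -> int: ...

class _CounterImpl:
    """Hidden implementation; it can be replaced without touching clients."""
    def __init__(self):
        self._count = 0               # concrete representation, not exposed
    def increment(self):
        self._count += 1
    def value(self):
        return self._count

def make_counter() -> Counter:
    """Factory exported by the module; clients never see _CounterImpl."""
    return _CounterImpl()

c = make_counter()
c.increment()
assert c.value() == 1                 # client code depends only on the interface
```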
 
Defining an abstract data type
An abstract data type is defined as a mathematical model of the data objects that make up a data type as well as the functions that operate on these objects. There are no standard conventions for defining them. A broad division may be drawn between "imperative" and "functional" definition styles.
Imperative-style definition
In the philosophy of imperative programming languages, an abstract data structure is conceived as an entity that is mutable, meaning that it may be in different states at different times. Some operations may change the state of the ADT; therefore, the order in which operations are evaluated is important, and the same operation on the same entities may have different effects if executed at different times, just like the instructions of a computer, or the commands and procedures of an imperative language. To underscore this view, it is customary to say that the operations are executed or applied, rather than evaluated. The imperative style is often used when describing abstract algorithms. (See The Art of Computer Programming by Donald Knuth for more details.)
 
Abstract variable
Imperative-style definitions of ADTs often depend on the concept of an abstract variable, which may be regarded as the simplest non-trivial ADT. An abstract variable V is a mutable entity that admits two operations:
- store(V, x) where x is a value of unspecified nature;
 
- fetch(V), that yields a value,
 
with the constraint that
- fetch(V) always returns the value x used in the most recent store(V, x) operation on the same variable V.
 
As in so many programming languages, the operation store(V, x) is often written V ← x (or some similar notation), and fetch(V) is implied whenever a variable V is used in a context where a value is required. Thus, for example, V ← V + 1 is commonly understood to be a shorthand for store(V,fetch(V) + 1).
In this definition, it is implicitly assumed that storing a value into a variable U has no effect on the state of a distinct variable V. To make this assumption explicit, one could add the constraint that
- if U and V are distinct variables, the sequence { store(U, x); store(V, y) } is equivalent to { store(V, y); store(U, x) }.
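
A minimal Python sketch of an abstract variable, assuming the invented class name AbstractVariable and using None to stand for the uninitialized state, might look as follows:

```python
class AbstractVariable:
    """Sketch of the simplest non-trivial ADT: a mutable cell with store/fetch."""
    def __init__(self):
        self._value = None        # uninitialized; fetching before storing is undefined
    def store(self, x):
        self._value = x
    def fetch(self):
        return self._value

V = AbstractVariable()
U = AbstractVariable()

V.store(1)
V.store(V.fetch() + 1)            # the shorthand V <- V + 1, desugared
assert V.fetch() == 2             # fetch returns the most recently stored value

U.store(10)                       # storing into a distinct variable U
assert V.fetch() == 2             # ... leaves the state of V unchanged
```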
 
More generally, ADT definitions often assume that any operation that changes the state of one ADT instance has no effect on the state of any other instance (including other instances of the same ADT), unless the ADT axioms imply that the two instances are connected (aliased) in that sense. For example, when extending the definition of abstract variable to include abstract records, the operation that selects a field from a record variable R must yield a variable V that is aliased to that part of R.
 
The definition of an abstract variable V may also restrict the stored values x to members of a specific set X, called the range or type of V. As in programming languages, such restrictions may simplify the description and analysis of algorithms, and improve their readability.
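
One possible way to model such a restriction is sketched below, with an invented TypedVariable class that simply rejects values outside the chosen set X:

```python
class TypedVariable:
    """Abstract variable whose stored values are restricted to a set X."""
    def __init__(self, allowed):
        self._allowed = frozenset(allowed)   # the range (type) of the variable
        self._value = None
    def store(self, x):
        if x not in self._allowed:
            raise ValueError(f"{x!r} is outside the variable's range")
        self._value = x
    def fetch(self):
        return self._value

flag = TypedVariable({True, False})          # a Boolean-typed variable
flag.store(True)
assert flag.fetch() is True
```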
 
Note that this definition does not imply anything about the result of evaluating fetch(V) when V is uninitialized, that is, before performing any store operation on V. An algorithm that does so is usually considered invalid, because its effect is not defined. (However, there are some important algorithms whose efficiency strongly depends on the assumption that such a fetch is legal, and returns some arbitrary value in the variable's range.)
Instance creation
Some algorithms need to create new instances of some ADT (such as new variables, or new stacks). To describe such algorithms, one usually includes in the ADT definition a create() operation that yields an instance of the ADT, usually with axioms equivalent to
- the result of create() is distinct from any instance in use by the algorithm.
 
This axiom may be strengthened to also exclude partial aliasing with other instances. On the other hand, this axiom still allows implementations of create() to yield a previously created instance that has become inaccessible to the program.
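
A toy Python sketch of this axiom (the Registry and create names are invented; a bare object() stands in for a fresh ADT instance) might read:

```python
class Registry:
    """Keeps track of the ADT instances currently in use by the algorithm."""
    def __init__(self):
        self.in_use = []

def create(registry):
    """Yields an instance distinct from every instance already in use."""
    instance = object()                      # a fresh, empty ADT instance
    assert all(instance is not other for other in registry.in_use)
    registry.in_use.append(instance)
    return instance

r = Registry()
a, b = create(r), create(r)
assert a is not b                            # the distinctness axiom holds
```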
Example: abstract stack (imperative)
As another example, an imperative-style definition of an abstract stack could specify that the state of a stack S can be modified only by the operations
- push(S, x), where x is some value of unspecified nature;
 
- pop(S), that yields a value as a result,
 
with the constraint that
- For any value x and any abstract variable V, the sequence of operations { push(S, x); V ← pop(S) } is equivalent to V ← x.
 
Since the assignment V ← x, by definition, cannot change the state of S, this condition implies that V ← pop(S) restores S to the state it had before the push(S, x). From this condition and from the properties of abstract variables, it follows, for example, that the sequence
- { push(S, x); push(S, y); U ← pop(S); push(S, z); V ← pop(S); W ← pop(S) }
 
where x, y, and z are any values, and U, V, W are pairwise distinct variables, is equivalent to
- { U ← y; V ← z; W ← x }
 
Here it is implicitly assumed that operations on a stack instance do not modify the state of any other ADT instance, including other stacks; that is,
- For any values x, y, and any distinct stacks S and T, the sequence { push(S, x); push(T, y) } is equivalent to { push(T, y); push(S, x) }.
 
An abstract stack definition usually also includes a Boolean-valued function empty(S) and a create() operation that returns a stack instance, with axioms equivalent to
 
- create() ≠ S for any prior stack S (a newly created stack is distinct from all previous stacks);
 
- empty(create()) (a newly created stack is empty);
 
- not empty(push(S, x)) (pushing something into a stack makes it non-empty).
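
The definition above can be exercised with a small Python sketch (an illustrative list-backed implementation, not part of the formal definition), checking the derived sequence property and the empty/create axioms:

```python
class Stack:
    """Imperative-style stack: operations mutate the instance's state."""
    def __init__(self):                      # plays the role of create()
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()
    def empty(self):
        return not self._items

S = Stack()
assert S.empty()                             # empty(create())

S.push("x")
assert not S.empty()                         # not empty(push(S, x))
S.push("y")
U = S.pop()                                  # U <- pop(S)
S.push("z")
V = S.pop()
W = S.pop()
assert (U, V, W) == ("y", "z", "x")          # the derived sequence property
```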
 
Single-instance style
Sometimes an ADT is defined as if only one instance of it existed during the execution of the algorithm, and all operations were applied to that instance, which is not explicitly notated. For example, the abstract stack above could have been defined with operations push(x) and pop(), that operate on the only existing stack. ADT definitions in this style can be easily rewritten to admit multiple coexisting instances of the ADT, by adding an explicit instance parameter (like S in the previous example) to every operation that uses or modifies the implicit instance.
On the other hand, some ADTs cannot be meaningfully defined without assuming multiple instances. This is the case when a single operation takes two distinct instances of the ADT as parameters. For example, consider augmenting the definition of the abstract stack with an operation compare(S, T) that checks whether the stacks S and T contain the same items in the same order.
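
A brief Python sketch of the two styles (with invented function names, and plain lists standing in for stack instances) shows the rewrite and why compare needs explicit instances:

```python
_the_stack = []                         # the single, implicit stack instance

def push(x):                            # single-instance style: no stack parameter
    _the_stack.append(x)

def pop():
    return _the_stack.pop()

push(1)
assert pop() == 1

# Rewritten with an explicit instance parameter, several stacks can coexist,
# and an operation on two instances, such as compare(S, T), becomes expressible.
def push_on(S, x):
    S.append(x)

def compare(S, T):
    """True if the stacks S and T contain the same items in the same order."""
    return list(S) == list(T)

A, B = [], []
push_on(A, 1)
push_on(B, 1)
assert compare(A, B)
```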
Functional-style definition
Another way to define an ADT, closer to the spirit of functional programming, is to consider each state of the structure as a separate entity. In this view, any operation that modifies the ADT is modeled as a mathematical function that takes the old state as an argument and returns the new state as part of the result. Unlike the imperative operations, these functions have no side effects. Therefore, the order in which they are evaluated is immaterial, and the same operation applied to the same arguments (including the same input states) will always return the same results (and output states).
 
In the functional view, in particular, there is no way (or need) to define an "abstract variable" with the semantics of imperative variables (namely, with fetch and store operations). Instead of storing values into variables, one passes them as arguments to functions.
Example: abstract stack (functional)
For example, a complete functional-style definition of an abstract stack could use the three operations:
- push: takes a stack state and an arbitrary value, returns a stack state;
 
- top: takes a stack state, returns a value;
 
- pop: takes a stack state, returns a stack state,

together with axioms that relate these operations, for any stack state s and value x:
- top(push(s, x)) = x (pushing x and then reading the top yields x);
- pop(push(s, x)) = s (pushing x and then popping restores the previous state).
 
In a functional-style definition there is no need for a create operation. Indeed, there is no notion of "stack instance". The stack states can be thought of as being potential states of a single stack structure, and two stack states that contain the same values in the same order are considered to be identical states. This view actually mirrors the behavior of some concrete implementations, such as linked lists with hash cons.
 
Instead of create(), a functional-style definition of an abstract stack may assume the existence of a special stack state, the empty stack, designated by a special symbol like Λ or "()"; or define a bottom() operation that takes no arguments and returns this special stack state. Note that the axioms imply that push(s, x) ≠ Λ for any stack state s and value x; that is, pushing a value never yields the empty stack. Consequently, in a functional-style definition of a stack one does not need an empty predicate: instead, one can test whether a stack is empty by testing whether it is equal to Λ.
Note that these axioms do not define the effect of top(s) or pop(s), unless s is a stack state returned by a push. Since push leaves the stack non-empty, those two operations are undefined (hence invalid) when s = Λ. On the other hand, the axioms (and the lack of side effects) imply that push(s, x) = push(t, y) if and only if x = y and s = t.
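
These properties can be mirrored in a small functional-style Python sketch, assuming nested tuples as stack states and the empty tuple as Λ (all names are illustrative):

```python
EMPTY = ()                                   # plays the role of the empty stack Λ

def push(s, x):
    """Pure function: returns a new stack state and leaves s untouched."""
    return (x, s)

def top(s):
    if s == EMPTY:
        raise ValueError("top is undefined on the empty stack")
    return s[0]

def pop(s):
    if s == EMPTY:
        raise ValueError("pop is undefined on the empty stack")
    return s[1]

s = push(push(EMPTY, "a"), "b")
assert top(s) == "b"                         # top(push(s, x)) = x
assert pop(s) == push(EMPTY, "a")            # pop(push(s, x)) = s
assert s != EMPTY                            # push never yields the empty stack
assert push(EMPTY, 1) == push(EMPTY, 1)      # equal states are identical states
```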
 
As in some other branches of mathematics, it is customary to assume also that the stack states are only those whose existence can be proved from the axioms in a finite number of steps. In the abstract stack example above, this rule means that every stack is a finite sequence of values that becomes the empty stack (Λ) after a finite number of pops. By themselves, the axioms above do not exclude the existence of infinite stacks (that can be popped forever, each time yielding a different state) or circular stacks (that return to the same state after a finite number of pops). In particular, they do not exclude states s such that pop(s) = s or push(s, x) = s for some x. However, since one cannot obtain such stack states with the given operations, they are assumed "not to exist".
Whether to include complexity
The reason for introducing the notion of abstract data types was to allow interchangeable software modules. You cannot have interchangeable modules unless these modules share similar complexity behavior. If I replace one module with another module with the same functional behavior but with different complexity tradeoffs, the user of this code will be unpleasantly surprised. I could tell him anything I like about data abstraction, and he still would not want to use the code. Complexity assertions have to be part of the interface.
— Alexander Stepanov
 
Advantages of abstract data typing
Encapsulation
Abstraction provides a promise that any implementation of the ADT has certain properties and abilities; knowing these is all that is required to make use of an ADT object. The user does not need any technical knowledge of how the implementation works to use the ADT. In this way, the implementation may be complex but will be encapsulated in a simple interface when it is actually used.
 
Localization of change
Code that uses an ADT object will not need to be edited if the implementation of the ADT is changed. Since any changes to the implementation must still comply with the interface, and since code using an ADT object may only refer to properties and abilities specified in the interface, changes may be made to the implementation without requiring any changes in code where the ADT is used.
Flexibility
Different implementations of the ADT, having all the same properties and abilities, are equivalent and may be used somewhat interchangeably in code that uses the ADT. This gives a great deal of flexibility when using ADT objects in different situations. For example, different implementations of the ADT may be more efficient in different situations; it is possible to use each in the situation where they are preferable, thus increasing overall efficiency.
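
For illustration, the Python sketch below (with invented names HashIntSet and ListIntSet) shows two interchangeable implementations of one small ADT with different efficiency trade-offs; the client function is written once against the interface:

```python
from typing import Iterable, Protocol

class IntSet(Protocol):
    """One interface; several interchangeable implementations."""
    def insert(self, x: int) -> None: ...
    def contains(self, x: int) -> bool: ...

class HashIntSet:
    """Backed by a hash table: fast membership tests."""
    def __init__(self):
        self._items = set()
    def insert(self, x):
        self._items.add(x)
    def contains(self, x):
        return x in self._items

class ListIntSet:
    """Backed by a plain list: slower lookups, but keeps insertion order."""
    def __init__(self):
        self._items = []
    def insert(self, x):
        if x not in self._items:
            self._items.append(x)
    def contains(self, x):
        return x in self._items

def count_hits(s: IntSet, queries: Iterable[int]) -> int:
    """Client code: written once, works with either implementation."""
    return sum(1 for q in queries if s.contains(q))

for impl in (HashIntSet(), ListIntSet()):
    impl.insert(3)
    assert count_hits(impl, [1, 2, 3]) == 1
```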
Typical operations
Some operations that are often specified for ADTs (possibly under other names) are
- compare(s, t), that tests whether two instances' states are equivalent in some sense;
 
- hash(s), that computes some standard hash function from the instance's state;
 
- print(s) or show(s), that produces a human-readable representation of the instance's state.
 
In imperative-style ADT definitions, one often finds also
- create(), that yields a new instance of the ADT;
 
- initialize(s), that prepares a newly created instance s for further operations, or resets it to some "initial state";
 
- copy(s, t), that puts instance s in a state equivalent to that of t;
 
- clone(t), that performs s ← create(), copy(s, t), and returns s;
 
- free(s) or destroy(s), that reclaims the memory and other resources used by s.
 
The free operation is not normally relevant or meaningful, since ADTs are theoretical entities that do not "use memory". However, it may be necessary when one needs to analyze the storage used by an algorithm that uses the ADT. In that case one needs additional axioms that specify how much memory each ADT instance uses, as a function of its state, and how much of it is returned to the pool by free.
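
The typical operations listed above can be sketched in Python as follows (a toy Instance class and free-standing functions, all invented for this example; free is omitted because Python reclaims memory automatically):

```python
import copy as copylib

class Instance:
    """Toy ADT instance used only to name the typical operations."""
    def __init__(self, state=None):
        self.state = state

def create():                         # yields a new instance of the ADT
    return Instance()

def initialize(s):                    # resets s to an "initial state"
    s.state = 0

def compare(s, t):                    # are the two states equivalent?
    return s.state == t.state

def state_hash(s):                    # a standard hash of the instance's state
    return hash(s.state)

def copy(s, t):                       # puts s in a state equivalent to t's
    s.state = copylib.deepcopy(t.state)

def clone(t):                         # create() followed by copy()
    s = create()
    copy(s, t)
    return s

def show(s):                          # human-readable representation of the state
    return f"Instance(state={s.state!r})"

t = create()
initialize(t)
s = clone(t)
assert compare(s, t) and state_hash(s) == state_hash(t)
assert show(s) == "Instance(state=0)"
```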