
Thursday, May 3, 2018

Computability theory

From Wikipedia, the free encyclopedia
Computability theory, also known as recursion theory, is a branch of mathematical logic, of computer science, and of the theory of computation that originated in the 1930s with the study of computable functions and Turing degrees. The field has since expanded to include the study of generalized computability and definability. In these areas, recursion theory overlaps with proof theory and effective descriptive set theory.

Basic questions addressed by recursion theory include:
  • What does it mean for a function on the natural numbers to be computable?
  • How can noncomputable functions be classified into a hierarchy based on their level of noncomputability?
Although there is considerable overlap in terms of knowledge and methods, mathematical recursion theorists study the theory of relative computability, reducibility notions, and degree structures; those in the computer science field focus on the theory of subrecursive hierarchies, formal methods, and formal languages.

Computable and uncomputable sets

Recursion theory originated in the 1930s, with work of Kurt Gödel, Alonzo Church, Rózsa Péter, Alan Turing, Stephen Kleene, and Emil Post.[1][2]

The fundamental results the researchers obtained established Turing computability as the correct formalization of the informal idea of effective calculation. These results led Stephen Kleene (1952) to coin the two names "Church's thesis" (Kleene 1952:300) and "Turing's Thesis" (Kleene 1952:376). Nowadays these are often considered as a single hypothesis, the Church–Turing thesis, which states that any function that is computable by an algorithm is a computable function. Although initially skeptical, by 1946 Gödel argued in favor of this thesis:
"Tarski has stressed in his lecture (and I think justly) the great importance of the concept of general recursiveness (or Turing's computability). It seems to me that this importance is largely due to the fact that with this concept one has for the first time succeeded in giving an absolute notion to an interesting epistemological notion, i.e., one not depending on the formalism chosen.*"(Gödel 1946 in Davis 1965:84).[3]
With a definition of effective calculation came the first proofs that there are problems in mathematics that cannot be effectively decided. Church (1936a, 1936b) and Turing (1936), inspired by techniques used by Gödel (1931) to prove his incompleteness theorems, independently demonstrated that the Entscheidungsproblem is not effectively decidable. This result showed that there is no algorithmic procedure that can correctly decide whether arbitrary mathematical propositions are true or false.

Many problems in mathematics have been shown to be undecidable after these initial examples were established. In 1947, Markov and Post published independent papers showing that the word problem for semigroups cannot be effectively decided. Extending this result, Pyotr Novikov and William Boone showed independently in the 1950s that the word problem for groups is not effectively solvable: there is no effective procedure that, given a word in a finitely presented group, will decide whether the element represented by the word is the identity element of the group. In 1970, Yuri Matiyasevich proved (using results of Julia Robinson) Matiyasevich's theorem, which implies that Hilbert's tenth problem has no effective solution; this problem asked whether there is an effective procedure to decide whether a Diophantine equation over the integers has a solution in the integers. The list of undecidable problems gives additional examples of problems with no computable solution.

The study of which mathematical constructions can be effectively performed is sometimes called recursive mathematics; the Handbook of Recursive Mathematics (Ershov et al. 1998) covers many of the known results in this field.

Turing computability

The main form of computability studied in recursion theory was introduced by Turing (1936). A set of natural numbers is said to be a computable set (also called a decidable, recursive, or Turing computable set) if there is a Turing machine that, given a number n, halts with output 1 if n is in the set and halts with output 0 if n is not in the set. A function f from the natural numbers to themselves is a recursive or (Turing) computable function if there is a Turing machine that, on input n, halts and returns output f(n). The use of Turing machines here is not necessary; there are many other models of computation with the same computing power as Turing machines, for example the μ-recursive functions obtained from primitive recursion and the μ operator.
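
As a concrete, informal illustration of these definitions, the following Python sketch implements the unbounded μ operator and uses it to decide membership in a simple computable set. The example set (the perfect squares) is my own choice, not one taken from the article.

    # A minimal sketch: the unbounded mu operator and the characteristic
    # function of a computable set, written as ordinary Python functions.

    def mu(p):
        """Unbounded search: return the least n with p(n) == 0.
        Loops forever if no such n exists -- the source of partiality
        in the mu-recursive functions."""
        n = 0
        while p(n) != 0:
            n += 1
        return n

    def is_perfect_square(n):
        """Characteristic function of a computable set: halts on every
        input with output 1 (member) or 0 (non-member)."""
        k = mu(lambda k: 0 if k * k >= n else 1)  # least k with k*k >= n
        return 1 if k * k == n else 0

    print([is_perfect_square(n) for n in range(10)])  # [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]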

The terminology for recursive functions and sets is not completely standardized. The definition in terms of μ-recursive functions as well as a different definition of rekursiv functions by Gödel led to the traditional name recursive for sets and functions computable by a Turing machine. The word decidable stems from the German word Entscheidungsproblem which was used in the original papers of Turing and others. In contemporary use, the term "computable function" has various definitions: according to Cutland (1980), it is a partial recursive function (which can be undefined for some inputs), while according to Soare (1987) it is a total recursive (equivalently, general recursive) function. This article follows the second of these conventions. Soare (1996) gives additional comments about the terminology.

Not every set of natural numbers is computable. The halting problem, which is the set of (descriptions of) Turing machines that halt on input 0, is a well-known example of a noncomputable set. The existence of many noncomputable sets follows from the facts that there are only countably many Turing machines, and thus only countably many computable sets, but there are uncountably many sets of natural numbers.

Although the halting problem is not computable, it is possible to simulate program execution and produce an infinite list of the programs that do halt. Thus the halting problem is an example of a recursively enumerable set, which is a set that can be enumerated by a Turing machine (other terms for recursively enumerable include computably enumerable and semidecidable). Equivalently, a set is recursively enumerable if and only if it is the range of some computable function. The recursively enumerable sets, although not decidable in general, have been studied in detail in recursion theory.
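
The enumeration described here is usually implemented by dovetailing. The sketch below is my own illustration, not part of the article, and assumes a hypothetical bounded interpreter run_for(e, s) that reports whether program number e halts on input 0 within s steps; only the dovetailing pattern itself is the point.

    # Dovetailing: listing the halting programs without deciding the halting
    # problem. `run_for` is an assumed bounded interpreter, not a real API.

    def enumerate_halting(run_for):
        """Yield every index e whose program halts on input 0. Stage s runs
        programs 0..s for s steps each, so each halting program eventually
        appears, while non-halting programs are simply never listed."""
        listed = set()
        s = 0
        while True:
            for e in range(s + 1):
                if e not in listed and run_for(e, s):
                    listed.add(e)
                    yield e
            s += 1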

Areas of research

Beginning with the theory of recursive sets and functions described above, the field of recursion theory has grown to include the study of many closely related topics. These are not independent areas of research: each of these areas draws ideas and results from the others, and most recursion theorists are familiar with the majority of them.

Relative computability and the Turing degrees

Recursion theory in mathematical logic has traditionally focused on relative computability, a generalization of Turing computability defined using oracle Turing machines, introduced by Turing (1939). An oracle Turing machine is a hypothetical device which, in addition to performing the actions of a regular Turing machine, is able to ask questions of an oracle, which is a particular set of natural numbers. The oracle machine may only ask questions of the form "Is n in the oracle set?". Each question will be immediately answered correctly, even if the oracle set is not computable. Thus an oracle machine with a noncomputable oracle will be able to compute sets that a Turing machine without an oracle cannot.

Informally, a set of natural numbers A is Turing reducible to a set B if there is an oracle machine that correctly tells whether numbers are in A when run with B as the oracle set (in this case, the set A is also said to be (relatively) computable from B and recursive in B). If a set A is Turing reducible to a set B and B is Turing reducible to A then the sets are said to have the same Turing degree (also called degree of unsolvability). The Turing degree of a set gives a precise measure of how uncomputable the set is.
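
In programming terms, an oracle machine can be pictured as a procedure that is allowed to call a membership test for the oracle set. The toy Python sketch below is my own example: it computes the set A = {n : n is the sum of two elements of B} relative to an arbitrary oracle B, with the particular choice of A and B serving only as an illustration of the query pattern.

    # A Turing reduction realised as a function that queries its oracle.

    def reduce_to(oracle):
        """Return a decision procedure for A = {n : n = b1 + b2 with b1, b2 in B},
        using membership queries "is k in B?" answered by `oracle`."""
        def in_A(n):
            return any(oracle(i) and oracle(n - i) for i in range(n + 1))
        return in_A

    # Example oracle: a finite stand-in for B (here, some small primes).
    B = {2, 3, 5, 7, 11, 13}
    in_A = reduce_to(lambda k: k in B)
    print(in_A(10), in_A(11))  # True (3 + 7), False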

The natural examples of sets that are not computable, including many different sets that encode variants of the halting problem, have two properties in common:
  1. They are recursively enumerable, and
  2. Each can be translated into any other via a many-one reduction. That is, given such sets A and B, there is a total computable function f such that A = {x : f(x) ∈ B}. These sets are said to be many-one equivalent (or m-equivalent).
Many-one reductions are "stronger" than Turing reductions: if a set A is many-one reducible to a set B, then A is Turing reducible to B, but the converse does not always hold. Although the natural examples of noncomputable sets are all many-one equivalent, it is possible to construct recursively enumerable sets A and B such that A is Turing reducible to B but not many-one reducible to B. It can be shown that every recursively enumerable set is many-one reducible to the halting problem, and thus the halting problem is the most complicated recursively enumerable set with respect to many-one reducibility and with respect to Turing reducibility. Post (1944) asked whether every recursively enumerable set is either computable or Turing equivalent to the halting problem, that is, whether there is no recursively enumerable set with a Turing degree intermediate between those two.
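
A many-one reduction is thus nothing more than a single total computable translation applied to the input before one membership query. The following toy Python example, my own and not from the article, reduces the set of even numbers to the set of multiples of ten via f(x) = 5x.

    # A many-one reduction: x in A  iff  f(x) in B, for one total computable f.

    def f(x):
        return 5 * x            # total and computable

    def in_B(y):                # B = multiples of 10
        return y % 10 == 0

    def in_A(x):                # A = even numbers, decided via the reduction
        return in_B(f(x))       # x is even  iff  5x is a multiple of 10

    print([x for x in range(10) if in_A(x)])  # [0, 2, 4, 6, 8]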

As intermediate results, Post defined natural types of recursively enumerable sets like the simple, hypersimple and hyperhypersimple sets. Post showed that these sets are strictly between the computable sets and the halting problem with respect to many-one reducibility. Post also showed that some of them are strictly intermediate under other reducibility notions stronger than Turing reducibility. But Post left open the main problem of the existence of recursively enumerable sets of intermediate Turing degree; this problem became known as Post's problem. After ten years, Kleene and Post showed in 1954 that there are intermediate Turing degrees between those of the computable sets and the halting problem, but they failed to show that any of these degrees contains a recursively enumerable set. Very soon after this, Friedberg and Muchnik independently solved Post's problem by establishing the existence of recursively enumerable sets of intermediate degree. This groundbreaking result opened a wide study of the Turing degrees of the recursively enumerable sets which turned out to possess a very complicated and non-trivial structure.

There are uncountably many sets that are not recursively enumerable, and the investigation of the Turing degrees of all sets is as central in recursion theory as the investigation of the recursively enumerable Turing degrees. Many degrees with special properties were constructed: hyperimmune-free degrees where every function computable relative to that degree is majorized by a (unrelativized) computable function; high degrees relative to which one can compute a function f which dominates every computable function g in the sense that there is a constant c depending on g such that g(x) < f(x) for all x > c; random degrees containing algorithmically random sets; 1-generic degrees of 1-generic sets; and the degrees below the halting problem of limit-recursive sets.

The study of arbitrary (not necessarily recursively enumerable) Turing degrees involves the study of the Turing jump. Given a set A, the Turing jump of A is a set of natural numbers encoding a solution to the halting problem for oracle Turing machines running with oracle A. The Turing jump of any set is always of higher Turing degree than the original set, and a theorem of Friedberg shows that any set that computes the halting problem can be obtained as the Turing jump of another set. Post's theorem establishes a close relationship between the Turing jump operation and the arithmetical hierarchy, which is a classification of certain subsets of the natural numbers based on their definability in arithmetic.

Much recent research on Turing degrees has focused on the overall structure of the set of Turing degrees and the set of Turing degrees containing recursively enumerable sets. A deep theorem of Shore and Slaman (1999) states that the function mapping a degree x to the degree of its Turing jump is definable in the partial order of the Turing degrees. A recent survey by Ambos-Spies and Fejer (2006) gives an overview of this research and its historical progression.

Other reducibilities

An ongoing area of research in recursion theory studies reducibility relations other than Turing reducibility. Post (1944) introduced several strong reducibilities, so named because they imply truth-table reducibility. A Turing machine implementing a strong reducibility will compute a total function regardless of which oracle it is presented with. Weak reducibilities are those where a reduction process may not terminate for all oracles; Turing reducibility is one example.
The strong reducibilities include:
One-one reducibility
A is one-one reducible (or 1-reducible) to B if there is a total computable injective function f such that each n is in A if and only if f(n) is in B.
Many-one reducibility
This is essentially one-one reducibility without the constraint that f be injective. A is many-one reducible (or m-reducible) to B if there is a total computable function f such that each n is in A if and only if f(n) is in B.
Truth-table reducibility
A is truth-table reducible to B if A is Turing reducible to B via an oracle Turing machine that computes a total function regardless of the oracle it is given. Because of compactness of Cantor space, this is equivalent to saying that the reduction presents a single list of questions (depending only on the input) to the oracle simultaneously, and then having seen their answers is able to produce an output without asking additional questions regardless of the oracle's answer to the initial queries. Many variants of truth-table reducibility have also been studied.
Further reducibilities (positive, disjunctive, conjunctive, linear and their weak and bounded versions) are discussed in the article Reduction (recursion theory).

The major research on strong reducibilities has been to compare their theories, both for the class of all recursively enumerable sets and for the class of all subsets of the natural numbers. Furthermore, the relations between the reducibilities have been studied. For example, it is known that every Turing degree is either a truth-table degree or the union of infinitely many truth-table degrees.

Reducibilities weaker than Turing reducibility (that is, reducibilities that are implied by Turing reducibility) have also been studied. The most well known are arithmetical reducibility and hyperarithmetical reducibility. These reducibilities are closely connected to definability over the standard model of arithmetic.

Rice's theorem and the arithmetical hierarchy

Rice showed that for every nontrivial class C (which contains some but not all r.e. sets) the index set E = {e : the e-th r.e. set We is in C} has the property that either the halting problem or its complement is many-one reducible to E, that is, can be mapped using a many-one reduction to E (see Rice's theorem for more detail). But many of these index sets are even more complicated than the halting problem. These types of sets can be classified using the arithmetical hierarchy. For example, the index set FIN of the class of all finite sets is on the level Σ2, the index set REC of the class of all recursive sets is on the level Σ3, the index set COFIN of all cofinite sets is also on the level Σ3, and the index set COMP of the class of all Turing-complete sets is on the level Σ4. These hierarchy levels are defined inductively: Σn+1 contains exactly the sets which are recursively enumerable relative to Σn, and Σ1 contains the recursively enumerable sets. The index sets given here are even complete for their levels, that is, all the sets in these levels can be many-one reduced to the given index sets.
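
The reduction behind Rice's theorem can be sketched in code. In the Python sketch below, which is my own illustration, closures stand in for program indices and the chosen semantic property is "computes the identity function"; the construction shows why deciding any such nontrivial semantic property would also decide the halting problem.

    # A hedged sketch of the gadget construction used to prove Rice's theorem.

    def build_gadget(program, x):
        """Return a new 'program' whose semantics depend on whether
        `program` halts on input `x`: if it halts, the gadget computes the
        identity function; otherwise the gadget is nowhere defined."""
        def gadget(y):
            program(x)          # runs forever if program does not halt on x
            return y            # otherwise behave like the identity function
        return gadget

    # Any decider for "does this program compute the identity function?"
    # applied to build_gadget(program, x) would decide whether program halts
    # on x -- which is impossible, so no such decider exists.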

Reverse mathematics

The program of reverse mathematics asks which set-existence axioms are necessary to prove particular theorems of mathematics in subsystems of second-order arithmetic. This study was initiated by Harvey Friedman and was studied in detail by Stephen Simpson and others; Simpson (1999) gives a detailed discussion of the program. The set-existence axioms in question correspond informally to axioms saying that the powerset of the natural numbers is closed under various reducibility notions. The weakest such axiom studied in reverse mathematics is recursive comprehension, which states that the powerset of the naturals is closed under Turing reducibility.

Numberings

A numbering is an enumeration of functions; it has two parameters, e and x, and outputs the value of the e-th function in the numbering on the input x. Numberings can be partial recursive, although some of their members are total recursive, that is, computable functions. Admissible numberings are those into which all others can be translated. A Friedberg numbering (named after its discoverer) is a one-one numbering of all partial-recursive functions; it is necessarily not an admissible numbering. Later research dealt also with numberings of other classes, such as classes of recursively enumerable sets. Goncharov discovered, for example, a class of recursively enumerable sets for which the numberings fall into exactly two classes with respect to recursive isomorphisms.

The priority method

Post's problem was solved with a method called the priority method; a proof using this method is called a priority argument. This method is primarily used to construct recursively enumerable sets with particular properties. To use this method, the desired properties of the set to be constructed are broken up into an infinite list of goals, known as requirements, so that satisfying all the requirements will cause the set constructed to have the desired properties. Each requirement is assigned a natural number representing its priority; 0 is assigned the most important priority, 1 the second most important, and so on. The set is then constructed in stages, each stage attempting to satisfy one or more of the requirements by either adding numbers to the set or banning numbers from the set so that the final set will satisfy the requirement. It may happen that satisfying one requirement will cause another to become unsatisfied; the priority order is used to decide what to do in such an event.

Priority arguments have been employed to solve many problems in recursion theory, and have been classified into a hierarchy based on their complexity (Soare 1987). Because complex priority arguments can be technical and difficult to follow, it has traditionally been considered desirable to prove results without priority arguments, or to see if results proved with priority arguments can also be proved without them. For example, Kummer published a paper on a proof for the existence of Friedberg numberings without using the priority method.

The lattice of recursively enumerable sets

When Post defined the notion of a simple set as an r.e. set with an infinite complement not containing any infinite r.e. set, he started to study the structure of the recursively enumerable sets under inclusion. This lattice became a well-studied structure. Recursive sets can be defined in this structure by the basic result that a set is recursive if and only if the set and its complement are both recursively enumerable. Infinite r.e. sets always have infinite recursive subsets; on the other hand, simple sets exist but never have a coinfinite recursive superset. Post (1944) already introduced hypersimple and hyperhypersimple sets; later, maximal sets were constructed, which are r.e. sets such that every r.e. superset is either a finite variant of the given maximal set or co-finite. Post's original motivation in the study of this lattice was to find a structural notion such that every set which satisfies this property is neither in the Turing degree of the recursive sets nor in the Turing degree of the halting problem. Post did not find such a property and the solution to his problem applied priority methods instead; Harrington and Soare (1991) eventually found such a property.

Automorphism problems

Another important question is the existence of automorphisms in recursion-theoretic structures. One of these structures is the lattice of recursively enumerable sets under inclusion modulo finite difference; in this structure, A is below B if and only if the set difference A − B is finite. Maximal sets (as defined in the previous paragraph) have the property that they cannot be automorphic to non-maximal sets, that is, if there is an automorphism of the recursively enumerable sets under the structure just mentioned, then every maximal set is mapped to another maximal set. Soare (1974) showed that the converse also holds, that is, every two maximal sets are automorphic. So the maximal sets form an orbit, that is, every automorphism preserves maximality and any two maximal sets are transformed into each other by some automorphism. Harrington gave a further example of an automorphic property: that of the creative sets, the sets which are many-one equivalent to the halting problem.

Besides the lattice of recursively enumerable sets, automorphisms are also studied for the structure of the Turing degrees of all sets as well as for the structure of the Turing degrees of r.e. sets. In both cases, Cooper claims to have constructed nontrivial automorphisms which map some degrees to other degrees; this construction has, however, not been verified, and some colleagues believe that it contains errors and that the question of whether there is a nontrivial automorphism of the Turing degrees remains one of the main unsolved questions in this area (Slaman and Woodin 1986, Ambos-Spies and Fejer 2006).

Kolmogorov complexity

The field of Kolmogorov complexity and algorithmic randomness was developed during the 1960s and 1970s by Chaitin, Kolmogorov, Levin, Martin-Löf and Solomonoff (the names are given here in alphabetical order; much of the research was independent, and the unity of the concept of randomness was not understood at the time). The main idea is to consider a universal Turing machine U and to measure the complexity of a number (or string) x as the length of the shortest input p such that U(p) outputs x. This approach revolutionized earlier ways to determine when an infinite sequence (equivalently, characteristic function of a subset of the natural numbers) is random or not by invoking a notion of randomness for finite objects. Kolmogorov complexity became not only a subject of independent study but is also applied to other subjects as a tool for obtaining proofs. There are still many open problems in this area. For that reason, a recent research conference in this area was held in January 2007[4] and a list of open problems[5] is maintained by Joseph Miller and Andre Nies.
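
Kolmogorov complexity itself is not computable, but any fixed compressor yields a computable upper bound on it, which is how the idea is commonly put to work in practice. The short Python sketch below uses zlib merely as a stand-in for a fixed description method; it is my own illustration, not part of the theory described above.

    # An upper bound on description length via a fixed compressor.

    import os
    import zlib

    def complexity_upper_bound(s: bytes) -> int:
        """Length in bytes of a zlib description of s; the fixed decompressor
        only shifts the bound by an additive constant."""
        return len(zlib.compress(s, 9))

    regular = b"ab" * 500              # highly regular, compresses well
    random_ish = os.urandom(1000)      # incompressible with high probability
    print(complexity_upper_bound(regular), complexity_upper_bound(random_ish))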

Frequency computation

This branch of recursion theory analyzed the following question: for fixed m and n with 0 < m < n, for which sets A is it possible to compute, for any n different inputs x1, x2, ..., xn, a tuple of n numbers y1, y2, ..., yn such that at least m of the equations A(xk) = yk are true? Such sets are known as (m, n)-recursive sets. The first major result in this branch of recursion theory is Trakhtenbrot's result that a set is computable if it is (m, n)-recursive for some m, n with 2m > n. On the other hand, Jockusch's semirecursive sets (which were already known informally before Jockusch introduced them in 1968) are examples of a set which is (m, n)-recursive if and only if 2m < n + 1. There are uncountably many of these sets and also some recursively enumerable but noncomputable sets of this type. Later, Degtev established a hierarchy of recursively enumerable sets that are (1, n + 1)-recursive but not (1, n)-recursive. After a long phase of research by Russian scientists, this subject became repopularized in the west by Beigel's thesis on bounded queries, which linked frequency computation to the above-mentioned bounded reducibilities and other related notions. One of the major results was Kummer's cardinality theorem, which states that a set A is computable if and only if there is an n such that some algorithm enumerates, for each tuple of n different numbers, up to n many possible choices of the cardinality of this set of n numbers intersected with A; these choices must contain the true cardinality but leave out at least one false one.

Inductive inference

This is the recursion-theoretic branch of learning theory. It is based on Gold's model of learning in the limit from 1967, and more and more models of learning have been developed since then. The general scenario is the following: given a class S of computable functions, is there a learner (that is, a recursive functional) which outputs a hypothesis for any input of the form (f(0), f(1), ..., f(n))? A learner M learns a function f if almost all its hypotheses are the same index e of f with respect to a previously agreed-on acceptable numbering of all computable functions; M learns S if M learns every f in S. Basic results are that all recursively enumerable classes of functions are learnable while the class REC of all computable functions is not learnable. Many related models have been considered, and the learning of classes of recursively enumerable sets from positive data is also a topic studied from Gold's pioneering paper in 1967 onwards.
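
For a class given as an explicit list of total programs, the classical strategy of "identification by enumeration" already works: after each data point the learner outputs the first hypothesis consistent with everything seen so far. The Python sketch below is my own simplified illustration of that strategy under the assumption that the class is a finite list of total functions.

    # Gold-style learning in the limit by enumeration (simplified sketch).

    def learner(hypothesis_space, data_points):
        """hypothesis_space: a list of total functions; data_points: an
        iterable of pairs (x, f(x)). Yields one hypothesis index after each
        data point; on any f in the class the output eventually stabilises
        on a correct index."""
        seen = []
        for x, y in data_points:
            seen.append((x, y))
            for e, h in enumerate(hypothesis_space):
                if all(h(a) == b for a, b in seen):
                    yield e
                    break

    S = [lambda n: n, lambda n: n * n, lambda n: 2 * n]   # a tiny class
    target = S[2]
    stream = ((n, target(n)) for n in range(5))
    print(list(learner(S, stream)))   # [0, 2, 2, 2, 2] -- converges to index 2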

Generalizations of Turing computability

Recursion theory includes the study of generalized notions of computability such as arithmetical reducibility, hyperarithmetical reducibility and α-recursion theory, as described by Sacks (1990). These generalized notions include reducibilities that cannot be executed by Turing machines but are nevertheless natural generalizations of Turing reducibility. These studies include approaches to investigate the analytical hierarchy, which differs from the arithmetical hierarchy by permitting quantification over sets of natural numbers in addition to quantification over individual numbers. These areas are linked to the theories of well-orderings and trees; for example, the set of all indices of recursive (nonbinary) trees without infinite branches is complete for level Π¹₁ of the analytical hierarchy. Both Turing reducibility and hyperarithmetical reducibility are important in the field of effective descriptive set theory. The even more general notion of degrees of constructibility is studied in set theory.

Continuous computability theory

Computability theory for digital computation is well developed. Computability theory is less well developed for analog computation that occurs in analog computers, analog signal processing, analog electronics, neural networks and continuous-time control theory, modelled by differential equations and continuous dynamical systems (Orponen 1997; Moore 1996).

Relationships between definability, proof and computability

There are close relationships between the Turing degree of a set of natural numbers and the difficulty (in terms of the arithmetical hierarchy) of defining that set using a first-order formula. One such relationship is made precise by Post's theorem. A weaker relationship was demonstrated by Kurt Gödel in the proofs of his completeness theorem and incompleteness theorems. Gödel's proofs show that the set of logical consequences of an effective first-order theory is a recursively enumerable set, and that if the theory is strong enough this set will be uncomputable. Similarly, Tarski's indefinability theorem can be interpreted both in terms of definability and in terms of computability.

Recursion theory is also linked to second order arithmetic, a formal theory of natural numbers and sets of natural numbers. The fact that certain sets are computable or relatively computable often implies that these sets can be defined in weak subsystems of second order arithmetic. The program of reverse mathematics uses these subsystems to measure the noncomputability inherent in well known mathematical theorems. Simpson (1999) discusses many aspects of second-order arithmetic and reverse mathematics.

The field of proof theory includes the study of second-order arithmetic and Peano arithmetic, as well as formal theories of the natural numbers weaker than Peano arithmetic. One method of classifying the strength of these weak systems is by characterizing which computable functions the system can prove to be total (see Fairtlough and Wainer (1998)). For example, in primitive recursive arithmetic any computable function that is provably total is actually primitive recursive, while Peano arithmetic proves that functions like the Ackermann function, which are not primitive recursive, are total. Not every total computable function is provably total in Peano arithmetic, however; an example of such a function is provided by Goodstein's theorem.
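
For concreteness, here is the Ackermann function mentioned above, written directly from its standard recursive definition: it is total and computable but grows too fast to be primitive recursive, and even small arguments already produce deep recursion.

    # The Ackermann function: total, computable, not primitive recursive.

    def ackermann(m, n):
        if m == 0:
            return n + 1
        if n == 0:
            return ackermann(m - 1, 1)
        return ackermann(m - 1, ackermann(m, n - 1))

    print(ackermann(2, 3), ackermann(3, 3))   # 9, 61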

Name of the subject

The field of mathematical logic dealing with computability and its generalizations has been called "recursion theory" since its early days. Robert I. Soare, a prominent researcher in the field, has proposed (Soare 1996) that the field should be called "computability theory" instead. He argues that Turing's terminology using the word "computable" is more natural and more widely understood than the terminology using the word "recursive" introduced by Kleene. Many contemporary researchers have begun to use this alternate terminology.[6] These researchers also use terminology such as partial computable function and computably enumerable (c.e.) set instead of partial recursive function and recursively enumerable (r.e.) set. Not all researchers have been convinced, however, as explained by Fortnow[7] and Simpson.[8] Some commentators argue that both the names recursion theory and computability theory fail to convey the fact that most of the objects studied in recursion theory are not computable.[9]

Rogers (1967) has suggested that a key property of recursion theory is that its results and structures should be invariant under computable bijections on the natural numbers (this suggestion draws on the ideas of the Erlangen program in geometry). The idea is that a computable bijection merely renames numbers in a set, rather than indicating any structure in the set, much as a rotation of the Euclidean plane does not change any geometric aspect of lines drawn on it. Since any two infinite computable sets are linked by a computable bijection, this proposal identifies all the infinite computable sets (the finite computable sets are viewed as trivial). According to Rogers, the sets of interest in recursion theory are the noncomputable sets, partitioned into equivalence classes by computable bijections of the natural numbers.

Professional organizations

The main professional organization for recursion theory is the Association for Symbolic Logic, which holds several research conferences each year. The interdisciplinary Association Computability in Europe (CiE) also organizes a series of annual conferences.

Distributed computing

From Wikipedia, the free encyclopedia

Distributed computing is a field of computer science that studies distributed systems. A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.[1] The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.[1] Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.

A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.[2] There are many alternatives for the message passing mechanism, including pure HTTP, RPC-like connectors and message queues.[3]

A goal and challenge pursued by some computer scientists and practitioners in distributed systems is location transparency; however, this goal has fallen out of favour in industry, as distributed systems are different from conventional non-distributed systems, and the differences, such as network partitions, partial system failures, and partial upgrades, cannot simply be "papered over" by attempts at "transparency" (see CAP theorem).[citation needed]

Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers,[4] which communicate with each other by message passing.[5]

Introduction

The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area.[6] The terms are nowadays used in a much wider sense, even referring to autonomous processes that run on the same physical computer and interact with each other by message passing.[5]

While there is no single definition of a distributed system,[7] the following defining properties are commonly used:
  • There are several autonomous computational entities (computers or nodes), each of which has its own local memory.[8]
  • The entities communicate with each other by message passing.[9]
A distributed system may have a common goal, such as solving a large computational problem;[10] the user then perceives the collection of autonomous processors as a unit. Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.[11]

Other typical properties of distributed systems include the following:
  • The system has to tolerate failures in individual computers.[12]
  • The structure of the system (network topology, network latency, number of computers) is not known in advance, the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.[13]
  • Each computer has only a limited, incomplete view of the system. Each computer may know only one part of the input.[14]

Parallel and distributed computing


Figure: (a) and (b) depict a distributed system; (c) depicts a parallel system.

Distributed systems are groups of networked computers, which have the same goal for their work. The terms "concurrent computing", "parallel computing", and "distributed computing" have a lot of overlap, and no clear distinction exists between them.[15] The same system may be characterized both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel.[16] Parallel computing may be seen as a particular tightly coupled form of distributed computing,[17] and distributed computing may be seen as a loosely coupled form of parallel computing.[7] Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:
  • In parallel computing, all processors may have access to a shared memory to exchange information between processors.[18]
  • In distributed computing, each processor has its own private memory (distributed memory). Information is exchanged by passing messages between the processors.[19]
The figure on the right illustrates the difference between distributed and parallel systems. Figure (a) is a schematic view of a typical distributed system; the system is represented as a network topology in which each node is a computer and each line connecting the nodes is a communication link. Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links. Figure (c) shows a parallel system in which each processor has a direct access to a shared memory.
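
The contrast can be made concrete with Python's standard library: threads that update one shared memory location versus a separate process whose only channel back to the parent is a message queue. This is only an illustrative sketch of the two communication styles, not a description of any particular system from the article.

    # Shared memory (threads) versus message passing (processes), sketched.

    import multiprocessing
    import threading

    counter = [0]                      # one memory cell shared by both threads
    lock = threading.Lock()

    def shared_worker():
        for _ in range(1000):
            with lock:                 # coordinate through shared memory
                counter[0] += 1

    def mp_worker(queue):
        queue.put(sum(range(1000)))    # private memory; result sent as a message

    if __name__ == "__main__":
        threads = [threading.Thread(target=shared_worker) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("shared-memory total:", counter[0])      # 2000

        queue = multiprocessing.Queue()
        process = multiprocessing.Process(target=mp_worker, args=(queue,))
        process.start()
        print("message received:", queue.get())        # 499500
        process.join()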

The situation is further complicated by the traditional uses of the terms parallel and distributed algorithm that do not quite match the above definitions of parallel and distributed systems (see below for more detailed discussion). Nevertheless, as a rule of thumb, high-performance parallel computation in a shared-memory multiprocessor uses parallel algorithms while the coordination of a large-scale distributed system uses distributed algorithms.[citation needed]

History

The use of concurrent processes that communicate by message-passing has its roots in operating system architectures studied in the 1960s.[20] The first widespread distributed systems were local-area networks such as Ethernet, which was invented in the 1970s.[21]

ARPANET, the predecessor of the Internet, was introduced in the late 1960s, and ARPANET e-mail was invented in the early 1970s. E-mail became the most successful application of ARPANET,[22] and it is probably the earliest example of a large-scale distributed application. In addition to ARPANET, and its successor, the Internet, other early worldwide computer networks included Usenet and FidoNet from the 1980s, both of which were used to support distributed discussion systems.[citation needed]

The study of distributed computing became its own branch of computer science in the late 1970s and early 1980s. The first conference in the field, Symposium on Principles of Distributed Computing (PODC), dates back to 1982, and its European counterpart International Symposium on Distributed Computing (DISC) was first held in 1985.[citation needed]

Architectures

Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely coupled devices and cables. At a higher level, it is necessary to interconnect processes running on those CPUs with some sort of communication system.[citation needed]

Distributed programming typically falls into one of several basic architectures: client–server, three-tier, n-tier, or peer-to-peer; or into the broader categories of loose coupling and tight coupling.[23]
  • Client–server: architectures where smart clients contact the server for data then format and display it to the users. Input at the client is committed back to the server when it represents a permanent change.
  • Three-tier: architectures that move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are three-tier.
  • n-tier: architectures that refer typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
  • Peer-to-peer: architectures where there are no special machines that provide a service or manage the network resources.[24]:227 Instead all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.[25]
Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database.[26]

Applications

Reasons for using distributed systems and distributed computing may include:
  1. The very nature of an application may require the use of a communication network that connects several computers: for example, data produced in one physical location and required in another location.
  2. There are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons. For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster of several low-end computers, in comparison with a single high-end computer. A distributed system can provide more reliability than a non-distributed system, as there is no single point of failure. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.[27]

Examples

Examples of distributed systems and applications of distributed computing include the following:[28]

Theoretical foundations

Models

Many tasks that we would like to automate by using a computer are of question–answer type: we would like to ask a question and the computer should produce an answer. In theoretical computer science, such tasks are called computational problems. Formally, a computational problem consists of instances together with a solution for each instance. Instances are questions that we can ask, and solutions are desired answers to these questions.

Theoretical computer science seeks to understand which computational problems can be solved by using a computer (computability theory) and how efficiently (computational complexity theory). Traditionally, it is said that a problem can be solved by using a computer if we can design an algorithm that produces a correct solution for any given instance. Such an algorithm can be implemented as a computer program that runs on a general-purpose computer: the program reads a problem instance from input, performs some computation, and produces the solution as output. Formalisms such as random access machines or universal Turing machines can be used as abstract models of a sequential general-purpose computer executing such an algorithm.[citation needed]

The field of concurrent and distributed computing studies similar questions in the case of either multiple computers, or a computer that executes a network of interacting processes: which computational problems can be solved in such a network and how efficiently? However, it is not at all obvious what is meant by "solving a problem" in the case of a concurrent or distributed system: for example, what is the task of the algorithm designer, and what is the concurrent or distributed equivalent of a sequential general-purpose computer?[citation needed]

The discussion below focuses on the case of multiple computers, although many of the issues are the same for concurrent processes running on a single computer.

Three viewpoints are commonly used:
Parallel algorithms in shared-memory model
  • All processors have access to a shared memory. The algorithm designer chooses the program executed by each processor.
  • One theoretical model is that of the parallel random access machine (PRAM).[29] However, the classical PRAM model assumes synchronous access to the shared memory.
  • Shared-memory programs can be extended to distributed systems if the underlying operating system encapsulates the communication between nodes and virtually unifies the memory across all individual systems.
  • A model that is closer to the behavior of real-world multiprocessor machines and takes into account the use of machine instructions, such as Compare-and-swap (CAS), is that of asynchronous shared memory. There is a wide body of work on this model, a summary of which can be found in the literature.[30][31]
Parallel algorithms in message-passing model
  • The algorithm designer chooses the structure of the network, as well as the program executed by each computer.
  • Models such as Boolean circuits and sorting networks are used.[32] A Boolean circuit can be seen as a computer network: each gate is a computer that runs an extremely simple computer program. Similarly, a sorting network can be seen as a computer network: each comparator is a computer.
Distributed algorithms in message-passing model
  • The algorithm designer only chooses the computer program. All computers run the same program. The system must work correctly regardless of the structure of the network.
  • A commonly used model is a graph with one finite-state machine per node.
In the case of distributed algorithms, computational problems are typically related to graphs. Often the graph that describes the structure of the computer network is the problem instance. This is illustrated in the following example.[citation needed]

An example

Consider the computational problem of finding a coloring of a given graph G. Different fields might take the following approaches:
Centralized algorithms[citation needed]
  • The graph G is encoded as a string, and the string is given as input to a computer. The computer program finds a coloring of the graph, encodes the coloring as a string, and outputs the result.
Parallel algorithms
  • Again, the graph G is encoded as a string. However, multiple computers can access the same string in parallel. Each computer might focus on one part of the graph and produce a coloring for that part.
  • The main focus is on high-performance computation that exploits the processing power of multiple computers in parallel.
Distributed algorithms
  • The graph G is the structure of the computer network. There is one computer for each node of G and one communication link for each edge of G. Initially, each computer only knows about its immediate neighbors in the graph G; the computers must exchange messages with each other to discover more about the structure of G. Each computer must produce its own color as output.
  • The main focus is on coordinating the operation of an arbitrary distributed system.[citation needed]
While the field of parallel algorithms has a different focus than the field of distributed algorithms, there is a lot of interaction between the two fields. For example, the Cole–Vishkin algorithm for graph coloring[33] was originally presented as a parallel algorithm, but the same technique can also be used directly as a distributed algorithm.
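
The following toy Python simulation sketches what a round-based distributed colouring can look like; it is my own greedy variant, not the Cole–Vishkin algorithm. In every synchronous round, each uncoloured node that holds the largest identifier among its uncoloured neighbours picks the smallest colour unused by its neighbours, based only on information from the previous round.

    # Round-by-round simulation of a simple distributed greedy colouring.

    def distributed_greedy_colouring(adj):
        """adj: dict mapping node id -> set of neighbour ids."""
        colour = {}
        while len(colour) < len(adj):
            decisions = {}                      # computed from last round's state
            for v in adj:
                if v in colour:
                    continue
                if all(v > u for u in adj[v] if u not in colour):
                    taken = {colour[u] for u in adj[v] if u in colour}
                    decisions[v] = min(c for c in range(len(adj)) if c not in taken)
            colour.update(decisions)            # all decisions applied "simultaneously"
        return colour

    ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
    print(distributed_greedy_colouring(ring))   # a proper colouring of the 5-cycle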

Moreover, a parallel algorithm can be implemented either in a parallel system (using shared memory) or in a distributed system (using message passing).[34] The traditional boundary between parallel and distributed algorithms (choose a suitable network vs. run in any given network) does not lie in the same place as the boundary between parallel and distributed systems (shared memory vs. message passing).

Complexity measures

In parallel algorithms, yet another resource in addition to time and space is the number of computers. Indeed, often there is a trade-off between the running time and the number of computers: the problem can be solved faster if there are more computers running in parallel (see speedup). If a decision problem can be solved in polylogarithmic time by using a polynomial number of processors, then the problem is said to be in the class NC.[35] The class NC can be defined equally well by using the PRAM formalism or Boolean circuits—PRAM machines can simulate Boolean circuits efficiently and vice versa.[36]

In the analysis of distributed algorithms, more attention is usually paid to communication operations than to computational steps. Perhaps the simplest model of distributed computing is a synchronous system where all nodes operate in lockstep. During each communication round, all nodes in parallel (1) receive the latest messages from their neighbours, (2) perform arbitrary local computation, and (3) send new messages to their neighbours. In such systems, a central complexity measure is the number of synchronous communication rounds required to complete the task.[37]

This complexity measure is closely related to the diameter of the network. Let D be the diameter of the network. On the one hand, any computable problem can be solved trivially in a synchronous distributed system in approximately 2D communication rounds: simply gather all information in one location (D rounds), solve the problem, and inform each node about the solution (D rounds).
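
A small simulation makes the role of the diameter visible. In the Python sketch below, which is my own illustration of the synchronous model described above, every node forwards everything it knows to its neighbours in each round; after D rounds every node knows every input, which is exactly what the trivial gather-solve-broadcast strategy exploits.

    # Synchronous flooding: information spreads one hop per round.

    def rounds_until_full_knowledge(adj, inputs):
        """adj: node -> set of neighbours; inputs: node -> local input.
        Returns how many synchronous rounds it takes until every node
        knows every (node, input) pair."""
        knowledge = {v: {(v, inputs[v])} for v in adj}
        rounds = 0
        while any(len(k) < len(adj) for k in knowledge.values()):
            outgoing = {v: set(knowledge[v]) for v in adj}   # this round's messages
            for v in adj:
                for u in adj[v]:
                    knowledge[u] |= outgoing[v]
            rounds += 1
        return rounds

    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}            # diameter D = 3
    print(rounds_until_full_knowledge(path, {v: 10 * v for v in path}))  # 3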

On the other hand, if the running time of the algorithm is much smaller than D communication rounds, then the nodes in the network must produce their output without having the possibility to obtain information about distant parts of the network. In other words, the nodes must make globally consistent decisions based on information that is available in their local neighbourhood. Many distributed algorithms are known with the running time much smaller than D rounds, and understanding which problems can be solved by such algorithms is one of the central research questions of the field.[38]

Another commonly used measure is the total number of bits transmitted in the network (cf. communication complexity).[citation needed]

Other problems

Traditional computational problems take the perspective that we ask a question, a computer (or a distributed system) processes the question for a while, and then produces an answer and stops. However, there are also problems where we do not want the system to ever stop. Examples of such problems include the dining philosophers problem and other similar mutual exclusion problems. In these problems, the distributed system is supposed to continuously coordinate the use of shared resources so that no conflicts or deadlocks occur.

There are also fundamental challenges that are unique to distributed computing. The first example is challenges that are related to fault-tolerance. Examples of related problems include consensus problems,[39] Byzantine fault tolerance,[40] and self-stabilisation.[41]

Much research is also focused on understanding the asynchronous nature of distributed systems.
Election

Coordinator election (or leader election) is the process of designating a single process as the organizer of some task distributed among several computers (nodes). Before the task is begun, all network nodes are either unaware which node will serve as the "coordinator" (or leader) of the task, or unable to communicate with the current coordinator. After a coordinator election algorithm has been run, however, each node throughout the network recognizes a particular, unique node as the task coordinator.[citation needed]

The network nodes communicate among themselves in order to decide which of them will get into the "coordinator" state. For that, they need some method in order to break the symmetry among them. For example, if each node has unique and comparable identities, then the nodes can compare their identities, and decide that the node with the highest identity is the coordinator.[citation needed]
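
One standard way to realise this idea in the synchronous model is max-id flooding: every node repeatedly forwards the largest identifier it has seen, and after diameter-many rounds all nodes agree on the maximum. The Python sketch below is my own simulation of that scheme, not a specific algorithm cited in the article.

    # Leader election by flooding the maximum identifier (simulation).

    def elect_coordinator(adj, diameter):
        """adj: node id -> set of neighbour ids; identifiers are assumed
        unique and comparable. Returns node -> elected coordinator id."""
        best = {v: v for v in adj}                 # each node knows its own id
        for _ in range(diameter):
            previous = dict(best)                  # state at the start of the round
            for v in adj:
                for u in adj[v]:                   # v sends previous[v] to u
                    best[u] = max(best[u], previous[v])
        return best

    ring = {1: {5, 3}, 3: {1, 9}, 9: {3, 7}, 7: {9, 5}, 5: {7, 1}}
    print(elect_coordinator(ring, diameter=2))     # every node elects 9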

The definition of this problem is often attributed to LeLann, who formalized it as a method to create a new token in a token ring network in which the token has been lost.[45]

Coordinator election algorithms are designed to be economical in terms of total bytes transmitted, and time. The algorithm suggested by Gallager, Humblet, and Spira [46] for general undirected graphs has had a strong impact on the design of distributed algorithms in general, and won the Dijkstra Prize for an influential paper in distributed computing.

Many other algorithms were suggested for different kinds of network graphs, such as undirected rings, unidirectional rings, complete graphs, grids, directed Euler graphs, and others. A general method that decouples the issue of the graph family from the design of the coordinator election algorithm was suggested by Korach, Kutten, and Moran.[47]

In order to perform coordination, distributed systems employ the concept of coordinators. The coordinator election problem is to choose a process from among a group of processes on different processors in a distributed system to act as the central coordinator. Several central coordinator election algorithms exist.[48]

Properties of distributed systems

So far the focus has been on designing a distributed system that solves a given problem. A complementary research problem is studying the properties of a given distributed system.[49][50]

The halting problem is an analogous example from the field of centralised computation: we are given a computer program and the task is to decide whether it halts or runs forever. The halting problem is undecidable in the general case, and naturally understanding the behaviour of a computer network is at least as hard as understanding the behaviour of one computer.[citation needed]

However, there are many interesting special cases that are decidable. In particular, it is possible to reason about the behaviour of a network of finite-state machines. One example is telling whether a given network of interacting (asynchronous and non-deterministic) finite-state machines can reach a deadlock. This problem is PSPACE-complete,[51] i.e., it is decidable, but it is not likely that there is an efficient (centralised, parallel or distributed) algorithm that solves the problem in the case of large networks.
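
As a concrete and deliberately tiny example of this kind of analysis, the Python sketch below searches the joint state space of two interacting finite-state machines for a reachable state with no outgoing transition; the two-lock scenario encoded in the transition table is my own toy example. For n machines the joint state space grows exponentially, in line with the PSPACE-completeness of the general problem.

    # Deadlock detection by breadth-first search over a joint state space.

    from collections import deque

    def find_deadlock(initial, transitions):
        """initial: joint state tuple; transitions: joint state -> list of
        successor joint states. Returns a reachable deadlocked state or None."""
        seen, queue = {initial}, deque([initial])
        while queue:
            state = queue.popleft()
            successors = transitions.get(state, [])
            if not successors:
                return state                     # reachable state with no moves
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return None

    # Two processes that each grab one of two locks, then wait for the other.
    transitions = {
        ("idle", "idle"):   [("has_A", "idle"), ("idle", "has_B")],
        ("has_A", "idle"):  [("has_A", "has_B"), ("done", "idle")],
        ("idle", "has_B"):  [("has_A", "has_B"), ("idle", "done")],
        ("has_A", "has_B"): [],                  # each waits forever for the other
        ("done", "idle"):   [("done", "has_B")],
        ("idle", "done"):   [("has_A", "done")],
        ("done", "has_B"):  [("done", "done")],
        ("has_A", "done"):  [("done", "done")],
        ("done", "done"):   [("done", "done")],
    }
    print(find_deadlock(("idle", "idle"), transitions))   # ('has_A', 'has_B')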

e-Science

From Wikipedia, the free encyclopedia

E-Science or eScience is computationally intensive science that is carried out in highly distributed network environments, or science that uses immense data sets that require grid computing; the term sometimes includes technologies that enable distributed collaboration, such as the Access Grid. The term was created by John Taylor, the Director General of the United Kingdom's Office of Science and Technology in 1999 and was used to describe a large funding initiative starting in November 2000. E-science has been more broadly interpreted since then, as "the application of computer technology to the undertaking of modern scientific investigation, including the preparation, experimentation, data collection, results dissemination, and long-term storage and accessibility of all materials generated through the scientific process. These may include data modeling and analysis, electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints, and print and/or electronic publications."[1] In 2014, IEEE eScience Conference Series condensed the definition to "eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle" in one of the working definitions used by the organizers.[2] E-science encompasses "what is often referred to as big data [which] has revolutionized science... [such as] the Large Hadron Collider (LHC) at CERN... [that] generates around 780 terabytes per year... highly data intensive modern fields of science...that generate large amounts of E-science data include: computational biology, bioinformatics, genomics"[1] and the human digital footprint for the social sciences.[3]

Turing award winner Jim Gray imagined "data-intensive science" or "e-science" as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.[4][5]

E-Science revolutionizes both fundamental legs of the scientific method: empirical research, especially through digital big data; and scientific theory, especially through computer simulation model building.[6][7] These ideas were reflected by The White House's Office of Science and Technology Policy in February 2013, which slated many of the aforementioned e-Science output products for preservation and access requirements under the memorandum's directive.[8] E-sciences include particle physics, earth sciences and social simulations.

Characteristics and examples

Most of the research activities in e-Science have focused on the development of new computational tools and infrastructures to support scientific discovery. Due to the complexity of the software and the backend infrastructural requirements, e-Science projects usually involve large teams managed and developed by research laboratories, large universities or governments. Currently there is a large focus on e-Science in the United Kingdom, where the UK e-Science programme provides significant funding. In Europe the development of computing capabilities to support the CERN Large Hadron Collider has led to the development of e-Science and Grid infrastructures which are also used by other disciplines.

Consortiums

Example e-Science infrastructures include the Worldwide LHC Computing Grid, a federation with various partners including the European Grid Infrastructure, the Open Science Grid and the Nordic DataGrid Facility.

To support e-Science applications, Open Science Grid combines interfaces to more than 100 nationwide clusters, 50 interfaces to geographically distributed storage caches, and 8 campus grids (Purdue, Wisconsin-Madison, Clemson, Nebraska-Lincoln, FermiGrid at FNAL, SUNY-Buffalo, and Oklahoma in the United States; and UNESP in Brazil). Areas of science benefiting from Open Science Grid include:
  • Astrophysics, Gravitational Physics, High-energy Physics, Neutrino Physics, and Nuclear Physics.
  • Molecular Dynamics, Materials Science and Engineering, Computer Science and Engineering, and Nanotechnology.
  • Structural Biology, Computational Biology, Genomics, Proteomics, and Medicine.
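At its core, the pattern such grid infrastructures scale up is the farming out of many independent analysis tasks to distributed workers. The following Python sketch illustrates only that pattern, on a single machine with the standard library; it is not the grid middleware itself, and analyze_event_file and the file list are hypothetical placeholders rather than part of any real grid API.

    # Minimal sketch of the "many independent tasks" pattern that grid
    # infrastructures such as the Worldwide LHC Computing Grid scale up
    # across thousands of machines. Runs locally; the analysis function
    # and input names are hypothetical placeholders.
    from concurrent.futures import ProcessPoolExecutor

    def analyze_event_file(path):
        # Placeholder for a CPU-intensive, self-contained analysis step;
        # a real grid job would read the file and run the actual physics analysis.
        return path, sum(ord(c) for c in path) % 997  # dummy "result"

    if __name__ == "__main__":
        event_files = ["run001.dat", "run002.dat", "run003.dat"]  # hypothetical inputs
        with ProcessPoolExecutor() as pool:
            # Each task is independent, so the work parallelizes trivially,
            # which is exactly what makes it suitable for grid execution.
            for path, result in pool.map(analyze_event_file, event_files):
                print(f"{path}: result {result}")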

UK programme

After his appointment as Director General of the Research Councils in 1999, John Taylor, with the support of the Science Minister David Sainsbury and the Chancellor of the Exchequer Gordon Brown, bid to HM Treasury to fund a programme of e-infrastructure development for science. The programme was intended to provide the foundation for UK science and industry to be a world leader in the knowledge economy, the goal that motivated the Lisbon Strategy for sustainable economic growth to which the UK government committed in March 2000.

In November 2000 John Taylor announced £98 million for a national UK e-Science programme. An additional £20 million contribution was planned from UK industry, as matching funds for projects in which it participated. From this budget of £120 million over three years, £75 million was to be spent on grid application pilots in all areas of science, administered by the Research Council responsible for each area, while £35 million was to be administered by the EPSRC as a Core Programme to develop "industrial strength" Grid middleware. Phase 2 of the programme, for 2004–2006, was supported by a further £96 million for application projects and £27 million for the EPSRC core programme. Phase 3 of the programme, for 2007–2009, was supported by a further £14 million for the EPSRC core programme and a further sum for applications. Additional funding for UK e-Science activities was provided by the European Union, by the university funding councils' SRIF programme for hardware, and by Jisc for networking and other infrastructure.

The UK e-Science programme comprised a wide range of resources, centres and people including the National e-Science Centre (NeSC) which is managed by the Universities of Glasgow and Edinburgh, with facilities in both cities.[9] Tony Hey led the core programme from 2001 to 2005.[10]

Within the UK, regional e-Science centres support their local universities and projects. There are also various centres of excellence and research centres.

In addition to centres, the grid application pilot projects were funded by the Research Council responsible for each area of UK science funding.

The EPSRC funded 11 pilot e-Science projects in three phases (at about £3 million each in the first phase):
  • First phase (2001–2005): CombEchem, DAME, Discovery Net, GEODISE, myGrid and RealityGrid.
  • Second phase (2004–2008): GOLD and Integrative Biology.
  • Third phase (2005–2010): PMSEG (MESSAGE), CARMEN and NanoCMOS.
The PPARC/STFC funded two projects: GridPP (phase 1 for £17 million, phase 2 for £5.9 million, phase 3 for £30 million, and a fourth phase running from 2011 to 2014) and AstroGrid (£14 million over three phases).

The remaining £23 million of phase one funding was divided between the application projects funded by BBSRC, MRC and NERC:
  • BBSRC: Biomolecular Grid, Proteome Annotation Pipeline, High-Throughput Structural Biology, Global Biodiversity
  • MRC: Biology of Ageing, Sequence and Structure Data, Molecular Genetics, Cancer Management, Clinical e-Science Framework, Neuroinformatics Modeling Tools
  • NERC: Climateprediction.net, Oceanographic Grid, Molecular Environmental Grid, NERC DataGrid
The UK e-Science programme was reviewed on its completion in 2009 by an international panel led by Daniel E. Atkins, director of the Office of Cyberinfrastructure at the US NSF. The report concluded that the programme had developed a skilled pool of expertise and some services, and had fostered cooperation between academia and industry, but that these achievements were made at the level of individual projects rather than by generating infrastructure or transforming disciplines to adopt e-Science as a normal way of working, and that they were not self-sustaining without further investment.

United States

United States-based initiatives, where the term cyberinfrastructure is typically used to describe e-Science projects, are primarily funded by the National Science Foundation's Office of Cyberinfrastructure (NSF OCI)[11] and the Department of Energy (in particular the Office of Science).

The Netherlands

Dutch eScience research is coordinated by the Netherlands eScience Center in Amsterdam, an initiative founded by NWO and SURF.

Europe

Plan-Europe (PLAN-E) is a Platform of National e-Science/Data Research Centers in Europe, established at a founding meeting on 29–30 October 2014 in Amsterdam, the Netherlands, and based on agreed Terms of Reference. PLAN-E has a core group of active members and convenes twice a year. More information can be found on the PLAN-E website.

Sweden

Two academic research projects have been carried out in Sweden by two different groups of universities, to help researchers share and access scientific computing resources and knowledge:
  • Swedish e-Science Research Center (SeRC): Kungliga Tekniska högskolan (KTH), Stockholms universitet (SU), Karolinska institutet (KI) and Linköpings universitet (LiU) [12]
  • eSSENCE, The e-Science Collaboration (eSSENCE): Uppsala University, Lund University and Umeå University [13]

Vs. traditional science

Traditional science is representative of two distinct philosophical traditions within the history of science, but e-Science, it is argued, requires a paradigm shift and the addition of a third branch of the sciences. "The idea of open data is not a new one; indeed, when studying the history and philosophy of science, Robert Boyle is credited with stressing the concepts of skepticism, transparency, and reproducibility for independent verification in scholarly publishing in the 1660s. The scientific method later was divided into two major branches, deductive and empirical approaches. Today, a theoretical revision in the scientific method should include a new branch, Victoria Stodden advocate[s], that of the computational approach, where like the other two methods, all of the computational steps by which scientists draw conclusions are revealed. This is because within the last 20 years, people have been grappling with how to handle changes in high performance computing and simulation."[1]

E-science thus aims to combine both the empirical and the theoretical traditions:[3] computer simulations can create artificial data, and real-time big data can be used to calibrate theoretical simulation models.[7] Conceptually, e-Science revolves around developing new methods to support scientists in conducting scientific research, with the aim of making new discoveries by analyzing vast amounts of data accessible over the internet using vast computational resources. Discoveries of value cannot, however, be made simply by providing computational tools, a cyberinfrastructure, or a pre-defined set of steps that produces a result; there must be an original, creative aspect to the activity that by its nature cannot be automated. This has motivated research into the properties that e-Science platforms should provide in order to support a new paradigm of doing science, and into new rules for preserving computational results and making them available in a form that is reproducible in traceable, logical steps, as an intrinsic requirement for maintaining modern scientific integrity that allows an extension of "Boyle's tradition in the computational age".[1]
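As a concrete illustration of "revealing all of the computational steps", the Python sketch below records the input checksum, parameters, random seed and software version alongside a result, so the computation can be re-run and audited step by step. It is a minimal sketch under stated assumptions: the function, file names and record fields are illustrative choices, not a standard or any particular project's format.

    # Minimal sketch (not a standard) of making a computational result
    # reproducible and traceable: fix the random seed, checksum the input,
    # and store parameters and environment details next to the output.
    import hashlib, json, platform, random

    def run_analysis(input_path, parameters, seed=42):
        random.seed(seed)                      # fixed seed: deterministic re-execution
        with open(input_path, "rb") as f:
            input_digest = hashlib.sha256(f.read()).hexdigest()
        # Hypothetical computation standing in for the real analysis.
        result = sum(parameters.values()) + random.random()
        record = {
            "input_sha256": input_digest,      # ties the result to the exact input data
            "parameters": parameters,
            "seed": seed,
            "python_version": platform.python_version(),
            "result": result,
        }
        with open("analysis_record.json", "w") as f:
            json.dump(record, f, indent=2)     # the traceable record published with the result
        return result

Re-running run_analysis on the same input with the same parameters and seed should reproduce both the result and the record, which is the property the computational branch of the method asks for.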

Modelling e-Science processes

One view[14] argues that, since a modern discovery process instance serves a similar purpose to a mathematical proof, it should have similar properties: results should be deterministically reproducible when re-executed, and intermediate results should be viewable to aid examination and comprehension. In this case, simply modelling the provenance of data is not sufficient; one also has to model the provenance of the hypotheses and results generated from analyzing the data, so as to provide evidence supporting new discoveries. Scientific workflows have thus been proposed and developed to help scientists track the evolution of their data, intermediate results and final results, as a means to document and track the evolution of discoveries within a piece of scientific research.
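The Python sketch below shows one way such provenance of data, hypotheses and results could be modelled, with each artefact recording what it was derived from. It is an illustrative assumption, not the model proposed in the cited work; the class name, fields and example artefacts are invented for the example.

    # Illustrative provenance model (an assumption, not the cited model):
    # every artefact - data set, intermediate result or hypothesis - records
    # the artefacts and the processing step it was derived from, so a
    # discovery can be traced back to its evidence.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Artefact:
        name: str
        kind: str                      # "data", "intermediate", "hypothesis" or "result"
        derived_from: List["Artefact"] = field(default_factory=list)
        step: str = ""                 # workflow step that produced this artefact

        def lineage(self, indent=0):
            # Print the chain of evidence leading to this artefact.
            print("  " * indent + f"{self.kind}: {self.name}  ({self.step})")
            for parent in self.derived_from:
                parent.lineage(indent + 1)

    raw = Artefact("survey_2024.csv", "data", step="instrument export")
    cleaned = Artefact("survey_clean.csv", "intermediate", [raw], "outlier removal")
    hypothesis = Artefact("H1: response scales with dose", "hypothesis", [cleaned], "exploratory analysis")
    result = Artefact("H1 supported", "result", [cleaned, hypothesis], "regression fit")
    result.lineage()   # walks from the result back to the raw data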

Science 2.0

Other views include Science 2.0, in which e-Science is considered a shift from the publication of final results by well-defined collaborative groups towards a more open approach that includes the public sharing of raw data, preliminary experimental results, and related information. To facilitate this shift, the Science 2.0 view focuses on providing tools that simplify communication, cooperation and collaboration between interested parties. Such an approach has the potential to speed up the process of scientific discovery, overcome problems associated with academic publishing and peer review, and remove time and cost barriers that limit the generation of new knowledge.

Representation of a Lie group

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Representation_of_a_Lie_group...