Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on the more abstract and mathematical aspects of computing and includes the theory of computation.
It is difficult to circumscribe the theoretical areas precisely. The ACM's Special Interest Group on Algorithms and Computation Theory (SIGACT) provides the following description:
TCS covers a wide variety of topics including algorithms, data structures, computational complexity, parallel and distributed computation, probabilistic computation, quantum computation, automata theory, information theory, cryptography, program semantics and verification, machine learning, computational biology, computational economics, computational geometry, and computational number theory and algebra. Work in this field is often distinguished by its emphasis on mathematical technique and rigor.
History
While logical inference and mathematical proof had existed previously, in 1931 Kurt Gödel proved with his incompleteness theorem that there are fundamental limitations on what statements can be proved or disproved.
These developments have led to the modern study of logic and computability, and indeed the field of theoretical computer science as a whole. Information theory was added to the field with a 1948 mathematical theory of communication by Claude Shannon. In the same decade, Donald Hebb introduced a mathematical model of learning in the brain. With mounting biological data supporting this hypothesis with some modification, the fields of neural networks and parallel distributed processing were established. In 1971, Stephen Cook and, working independently, Leonid Levin, proved that there exist practically relevant problems that are NP-complete – a landmark result in computational complexity theory.
With the development of quantum mechanics at the beginning of the 20th century came the concept that mathematical operations could be performed on an entire particle wavefunction. In other words, one could compute functions on multiple states simultaneously. This led to the concept of a quantum computer in the latter half of the 20th century, an idea that took off in the 1990s when Peter Shor showed that such methods could be used to factor large numbers in polynomial time, which, if implemented, would render most modern public-key cryptography systems insecure.
Modern theoretical computer science research is based on these
basic developments, but includes many other mathematical and
interdisciplinary problems that have been posed.
Topics
Algorithms
An algorithm is a step-by-step procedure for calculations. Algorithms are used for calculation, data processing, and automated reasoning.
An algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Starting from an initial state and initial input (perhaps empty), the instructions describe a computation that, when executed, proceeds through a finite number of well-defined successive states, eventually producing "output" and terminating at a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate random input.
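To make this concrete, here is a minimal Python sketch of Euclid's algorithm for the greatest common divisor, a classic example of a finite list of well-defined instructions that proceeds through successive states and terminates with an output; the function and test values are illustrative choices, not drawn from the text above.

    def gcd(a, b):
        """Euclid's algorithm: repeatedly replace (a, b) by (b, a mod b).

        Each iteration is one well-defined state transition; the loop
        terminates because the second component strictly decreases.
        """
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(252, 105))  # -> 21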
Data structures
A data structure is a particular way of organizing data in a computer so that it can be used efficiently.
Different kinds of data structures are suited to different kinds
of applications, and some are highly specialized to specific tasks. For
example, databases use B-tree indexes for retrieving small percentages of the data, and compilers and databases use dynamic hash tables as look-up tables.
Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms. Some formal design methods and programming languages
emphasize data structures, rather than algorithms, as the key
organizing factor in software design. Storing and retrieving can be
carried out on data stored in both main memory and in secondary memory.
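As a small illustration of the hash-table look-up mentioned above, the following Python sketch contrasts a linear scan of a list with a dictionary (a dynamic hash table); the records and names are illustrative assumptions.

    # A list requires a linear scan to find a key; a dict (hash table)
    # locates it in expected constant time.
    records = [("alice", 3), ("bob", 7), ("carol", 1)]

    def lookup_list(key):
        for k, v in records:          # O(n) scan
            if k == key:
                return v
        return None

    index = dict(records)             # build a hash-table index once

    def lookup_hash(key):
        return index.get(key)         # expected O(1) look-up

    print(lookup_list("bob"), lookup_hash("bob"))  # 7 7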
Computational complexity theory
Computational complexity theory is a branch of the theory of computation that focuses on classifying computational problems according to their inherent difficulty, and relating those classes
to each other. A computational problem is understood to be a task that
is in principle amenable to being solved by a computer, which is
equivalent to stating that the problem may be solved by mechanical
application of mathematical steps, such as an algorithm.
A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used. The theory formalizes this intuition, by introducing mathematical models of computation to study these problems and quantifying the amount of resources needed to solve them, such as time and storage. Other complexity measures are also used, such as the amount of communication (used in communication complexity), the number of gates in a circuit (used in circuit complexity) and the number of processors (used in parallel computing). One of the roles of computational complexity theory is to determine the practical limits on what computers can and cannot do.
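As a small, hedged illustration of quantifying a resource such as time, the sketch below counts the worst-case comparisons used by linear and binary search on inputs of growing size; the counting scheme is an illustrative stand-in for a formal cost model, not one defined above.

    def linear_search_steps(n):
        # worst case: the key is compared with all n elements
        return n

    def binary_search_steps(n):
        # worst case: the search interval is halved until it is empty
        steps = 0
        while n > 0:
            n //= 2
            steps += 1
        return steps

    for n in (10, 1_000, 1_000_000):
        print(n, linear_search_steps(n), binary_search_steps(n))
    # time grows linearly for one algorithm and logarithmically for the other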
Distributed computation
Distributed computing studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.
The components interact with each other in order to achieve a common
goal. Three significant characteristics of distributed systems are:
concurrency of components, lack of a global clock, and independent
failure of components. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications.
A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs. There are many alternatives for the message passing mechanism, including RPC-like connectors and message queues. An important goal and challenge of distributed systems is location transparency.
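To make the message-passing idea concrete, here is a minimal Python sketch in which two components coordinate only through message queues on one machine; a real distributed system would carry such messages over a network (for example via RPC or a message broker), and all names here are illustrative.

    import multiprocessing as mp

    def worker(inbox, outbox):
        # a component that coordinates only by passing messages
        task = inbox.get()
        outbox.put(("result", task * task))

    if __name__ == "__main__":
        inbox, outbox = mp.Queue(), mp.Queue()
        p = mp.Process(target=worker, args=(inbox, outbox))
        p.start()
        inbox.put(7)               # send a message to the component
        print(outbox.get())        # receive its reply: ('result', 49)
        p.join()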
Parallel computation
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved "in parallel". There are several different forms of parallel computing: bit-level, instruction level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing, but interest in it has grown lately due to the physical constraints preventing frequency scaling. As power consumption (and consequently heat generation) by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.
Parallel computer programs are more difficult to write than sequential ones, because concurrency introduces several new classes of potential software bugs, of which race conditions are the most common. Communication and synchronization between the different subtasks are typically some of the greatest obstacles to getting good parallel program performance.
The maximum possible speed-up of a single program as a result of parallelization is given by Amdahl's law.
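Amdahl's law bounds that speed-up by the fraction of the program that must run sequentially; a small sketch, with an assumed serial fraction chosen purely for illustration:

    def amdahl_speedup(serial_fraction, processors):
        # S(p) = 1 / (s + (1 - s) / p), where s is the serial fraction
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

    # assume 10% of the work is inherently sequential
    for p in (2, 8, 64, 1024):
        print(p, round(amdahl_speedup(0.10, p), 2))
    # the speed-up approaches 1 / 0.10 = 10 no matter how many processors are added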
Very-large-scale integration
Very-large-scale integration (VLSI) is the process of creating an integrated circuit (IC) by combining thousands of transistors into a single chip. VLSI began in the 1970s when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. Before the introduction of VLSI technology most ICs had a limited set of functions they could perform. An electronic circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI allows IC makers to add all of these circuits into one chip.
Machine learning
Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.
Machine learning can be considered a subfield of computer science and statistics. It has strong ties to artificial intelligence and optimization,
which deliver methods, theory and application domains to the field.
Machine learning is employed in a range of computing tasks where
designing and programming explicit, rule-based algorithms is infeasible. Example applications include spam filtering, optical character recognition (OCR), search engines and computer vision. Machine learning is sometimes conflated with data mining, although that focuses more on exploratory data analysis. Machine learning and pattern recognition "can be viewed as two facets of
the same field."
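As a minimal illustration of building a model from data rather than following explicit rules, the sketch below predicts labels with a one-nearest-neighbour rule; the features, labels and function names are illustrative assumptions, not a method described above.

    def nearest_neighbour(train, query):
        # train: list of (feature_vector, label); the "model" is simply the data
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return min(train, key=lambda item: dist(item[0], query))[1]

    train = [((1.0, 1.0), "spam"), ((0.9, 1.2), "spam"),
             ((5.0, 4.8), "ham"),  ((5.2, 5.1), "ham")]
    print(nearest_neighbour(train, (1.1, 0.9)))  # -> 'spam'
    print(nearest_neighbour(train, (4.9, 5.0)))  # -> 'ham'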
Computational biology
Computational biology
involves the development and application of data-analytical and
theoretical methods, mathematical modeling and computational simulation
techniques to the study of biological, behavioral, and social systems. The field is broadly defined and includes foundations in computer science, applied mathematics, animation, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, ecology, evolution, anatomy, neuroscience, and visualization.
Computational biology is different from biological computation, which is a subfield of computer science and computer engineering using bioengineering and biology to build computers, but is similar to bioinformatics, which is an interdisciplinary science using computers to store and process biological data.
Computational geometry
Computational geometry is a branch of computer science devoted to the study of algorithms that can be stated in terms of geometry.
Some purely geometrical problems arise out of the study of
computational geometric algorithms, and such problems are also
considered to be part of computational geometry. While modern
computational geometry is a recent development, it is one of the oldest
fields of computing with history stretching back to antiquity. An
ancient precursor is the Sanskrit treatise Shulba Sutras,
or "Rules of the Chord", that is a book of algorithms written in 800
BCE. The book prescribes step-by-step procedures for constructing
geometric objects like altars using a peg and chord.
The main impetus for the development of computational geometry as a discipline was progress in computer graphics and computer-aided design and manufacturing (CAD/CAM), but many problems in computational geometry are classical in nature, and may come from mathematical visualization.
Other important applications of computational geometry include robotics (motion planning and visibility problems), geographic information systems (GIS) (geometrical location and search, route planning), integrated circuit design (IC geometry design and verification), computer-aided engineering (CAE) (mesh generation), computer vision (3D reconstruction).
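As one concrete example of an algorithm stated in terms of geometry, here is a minimal Python sketch of the cross-product orientation test and a convex hull computed with Andrew's monotone chain; the input points are illustrative.

    def cross(o, a, b):
        # > 0 if o->a->b turns counter-clockwise, < 0 if clockwise, 0 if collinear
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def convex_hull(points):
        pts = sorted(set(points))
        if len(pts) <= 2:
            return pts
        lower, upper = [], []
        for p in pts:                       # build the lower hull
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):             # build the upper hull
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        return lower[:-1] + upper[:-1]

    print(convex_hull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))
    # -> [(0, 0), (2, 0), (2, 2), (0, 2)]; the interior point is discarded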
Information theory
Information theory is a branch of applied mathematics, electrical engineering, and computer science involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and communicating data. Since its inception it has broadened to find applications in many other areas, including statistical inference, natural language processing, cryptography, neurobiology, the evolution and function of molecular codes, model selection in statistics, thermal physics, quantum computing, linguistics, plagiarism detection, pattern recognition, anomaly detection and other forms of data analysis.
Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP files), lossy data compression (e.g. MP3s and JPEGs), and channel coding (e.g. for Digital Subscriber Line (DSL)). The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet, the study of linguistics and of human perception, the understanding of black holes, and numerous other fields. Important sub-fields of information theory are source coding, channel coding, algorithmic complexity theory, algorithmic information theory, information-theoretic security, and measures of information.
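A small sketch of the basic quantity involved, the Shannon entropy of a discrete distribution measured in bits; the distributions below are illustrative.

    from math import log2

    def entropy(probabilities):
        # H = -sum(p * log2(p)); terms with p = 0 contribute nothing
        return -sum(p * log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))     # fair coin: 1.0 bit
    print(entropy([0.9, 0.1]))     # biased coin: about 0.47 bits
    print(entropy([0.25] * 4))     # uniform over 4 symbols: 2.0 bits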
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties (called adversaries). More generally, it is about constructing and analyzing protocols that overcome the influence of adversaries and that are related to various aspects in information security such as data confidentiality, data integrity, authentication, and non-repudiation. Modern cryptography intersects the disciplines of mathematics, computer science, and electrical engineering. Applications of cryptography include ATM cards, computer passwords, and electronic commerce.
Modern cryptography is heavily based on mathematical theory and
computer science practice; cryptographic algorithms are designed around computational hardness assumptions,
making such algorithms hard to break in practice by any adversary. It
is theoretically possible to break such a system, but it is infeasible
to do so by any known practical means. These schemes are therefore
termed computationally secure; theoretical advances, e.g., improvements
in integer factorization algorithms, and faster computing technology require these solutions to be continually adapted. There exist information-theoretically secure schemes that provably cannot be broken even with unlimited computing power—an example is the one-time pad—but these schemes are more difficult to implement than the best theoretically breakable but computationally secure mechanisms.
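Since the one-time pad is mentioned above as an information-theoretically secure scheme, here is a minimal Python sketch of it; the key must be truly random, as long as the message, and never reused, and the message below is purely illustrative.

    import secrets

    def otp_encrypt(message: bytes, key: bytes) -> bytes:
        # XOR each message byte with the corresponding key byte
        assert len(key) == len(message)
        return bytes(m ^ k for m, k in zip(message, key))

    message = b"attack at dawn"
    key = secrets.token_bytes(len(message))   # one-time, uniformly random key
    ciphertext = otp_encrypt(message, key)
    print(otp_encrypt(ciphertext, key))       # decryption re-applies the same XOR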
Quantum computation
A quantum computer is a computation system that makes direct use of quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data. Quantum computers are different from digital computers based on transistors. Whereas digital computers require data to be encoded into binary digits (bits), each of which is always in one of two definite states (0 or 1), quantum computation uses qubits (quantum bits), which can be in superpositions of states. A theoretical model is the quantum Turing machine, also known as the universal quantum computer. Quantum computers share theoretical similarities with non-deterministic and probabilistic computers;
one example is the ability to be in more than one state simultaneously.
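To make the superposition idea concrete, here is a minimal classical simulation of a single qubit in the standard textbook model: the state is a pair of amplitudes, a Hadamard gate sends |0⟩ to an equal superposition, and measurement samples an outcome with probability given by the squared amplitude. This is an illustrative sketch, not a description of any physical device.

    from math import sqrt
    import random

    # state of one qubit: amplitudes (a, b) for |0> and |1>, with |a|^2 + |b|^2 = 1
    zero = (1.0, 0.0)

    def hadamard(state):
        a, b = state
        return ((a + b) / sqrt(2), (a - b) / sqrt(2))

    def measure(state):
        # probability of observing 0 is |a|^2
        a, _ = state
        return 0 if random.random() < abs(a) ** 2 else 1

    plus = hadamard(zero)                      # equal superposition of |0> and |1>
    samples = [measure(plus) for _ in range(10000)]
    print(sum(samples) / len(samples))         # roughly 0.5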
The field of quantum computing was first introduced by Yuri Manin in 1980 and Richard Feynman in 1982. A quantum computer with spins as quantum bits was also formulated for use as a quantum space–time in 1968.
As of 2014, quantum computing is still in its infancy but
experiments have been carried out in which quantum computational
operations were executed on a very small number of qubits.
Both practical and theoretical research continues, and many national
governments and military funding agencies support quantum computing
research to develop quantum computers for both civilian and national security purposes, such as cryptanalysis.
Information-based complexity
Information-based complexity (IBC) studies optimal algorithms and
computational complexity for continuous problems. IBC has studied
continuous problems such as path integration, partial differential equations,
systems of ordinary differential equations, nonlinear equations,
integral equations, fixed points, and very-high-dimensional integration.
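As a small, hedged illustration of one such continuous problem, the sketch below estimates a very-high-dimensional integral over the unit cube by Monte Carlo sampling; the integrand and sample count are illustrative assumptions, and IBC's concern is how the error depends on the number of such function evaluations.

    import random

    def monte_carlo_integral(f, dimension, samples):
        # average f over random points in the unit cube [0, 1]^dimension
        total = 0.0
        for _ in range(samples):
            x = [random.random() for _ in range(dimension)]
            total += f(x)
        return total / samples

    # integrand: the mean of the coordinates; its exact integral is 0.5 in any dimension
    f = lambda x: sum(x) / len(x)
    print(monte_carlo_integral(f, dimension=100, samples=10000))  # close to 0.5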
Computational number theory
Computational number theory, also known as algorithmic number theory, is the study of algorithms for performing number theoretic computations. The best known problem in the field is integer factorization.
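A minimal sketch of the integer factorization problem mentioned above, using trial division; practical algorithms such as the number field sieve are far more sophisticated, and this naive method is only illustrative.

    def factorize(n):
        # trial division: repeatedly divide out the smallest remaining factor
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    print(factorize(8051))   # -> [83, 97]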
Symbolic computation
Computer algebra, also called symbolic computation or algebraic computation, is a scientific area that refers to the study and development of algorithms and software for manipulating mathematical expressions and other mathematical objects. Although, properly speaking, computer algebra should be a subfield of scientific computing, they are generally considered distinct fields, because scientific computing is usually based on numerical computation with approximate floating-point numbers, while symbolic computation emphasizes exact computation with expressions containing variables that have no given value and are thus manipulated as symbols (hence the name symbolic computation).
Software applications that perform symbolic calculations are called computer algebra systems, with the term system alluding to the complexity of the main applications, which include, at least, a method to represent mathematical data in a computer, a user programming language (usually different from the language used for the implementation), a dedicated memory manager, a user interface for the input/output of mathematical expressions, and a large set of routines to perform usual operations such as simplification of expressions, differentiation using the chain rule, polynomial factorization, and indefinite integration.
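Such routines can be illustrated with SymPy, a Python computer algebra library; assuming SymPy is installed, a short sketch of exact symbolic manipulation looks like this.

    import sympy as sp

    x = sp.symbols('x')                       # a variable with no given value

    expr = (x**2 - 1) / (x - 1)
    print(sp.simplify(expr))                  # simplification: x + 1
    print(sp.diff(sp.sin(x**2), x))           # chain rule: 2*x*cos(x**2)
    print(sp.factor(x**3 - x))                # factorization: x*(x - 1)*(x + 1)
    print(sp.integrate(x**2, x))              # indefinite integration: x**3/3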
Program semantics
In programming language theory, semantics is the field concerned with the rigorous mathematical study of the meaning of programming languages. It does so by evaluating the meaning of syntactically legal strings defined by a specific programming language, showing the computation involved; applying the evaluation to a syntactically illegal string yields no computation. Semantics describes the processes a computer follows when executing a program in that specific language. This can be done by describing the relationship between the input and output of a program, or by explaining how the program will execute on a certain platform, hence creating a model of computation.
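A toy illustration of assigning meaning to programs: an evaluator for a tiny expression language, where the Python function itself plays the role of an operational-style semantics. The language and its representation are illustrative assumptions, not taken from the text above.

    # expressions: numbers, ("add", e1, e2), ("mul", e1, e2)
    def evaluate(expr):
        # the meaning of an expression is the number it denotes
        if isinstance(expr, (int, float)):
            return expr
        op, left, right = expr
        if op == "add":
            return evaluate(left) + evaluate(right)
        if op == "mul":
            return evaluate(left) * evaluate(right)
        raise ValueError("syntactically illegal expression")

    print(evaluate(("add", 2, ("mul", 3, 4))))   # the meaning of 2 + 3*4 is 14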
Formal methods
Formal methods are a particular kind of mathematically based techniques for the specification, development and verification of software and hardware systems.
The use of formal methods for software and hardware design is motivated
by the expectation that, as in other engineering disciplines,
performing appropriate mathematical analysis can contribute to the
reliability and robustness of a design.
Formal methods are best described as the application of a fairly broad variety of theoretical computer science fundamentals, in particular logic calculi, formal languages, automata theory, and program semantics, but also type systems and algebraic data types, to problems in software and hardware specification and verification.
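As a very small taste of specification and verification, the sketch below states a pre-/post-condition pair for a function as executable assertions and checks it exhaustively on a finite domain; genuine formal methods would prove such properties for all inputs with a theorem prover or model checker, so this is only an illustrative approximation.

    def integer_sqrt(n):
        # specification: for n >= 0, return r with r*r <= n < (r+1)*(r+1)
        assert n >= 0                          # precondition
        r = 0
        while (r + 1) * (r + 1) <= n:
            r += 1
        assert r * r <= n < (r + 1) * (r + 1)  # postcondition
        return r

    # exhaustive check of the specification on a small finite domain
    for n in range(10000):
        integer_sqrt(n)
    print("specification holds for n in [0, 10000)")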
Automata theory
Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science, under discrete mathematics. The word automata comes from the Greek word αὐτόματα, meaning "self-acting".
Automata theory is the study of self-operating virtual machines, used to develop a logical understanding of how inputs are processed into outputs, with or without intermediate stages of computation.
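A minimal sketch of one such abstract machine, a deterministic finite automaton that accepts binary strings containing an even number of 1s; the encoding as a dictionary of transitions is an illustrative choice.

    # states: "even", "odd"; accepting state: "even"
    transitions = {
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd",   ("odd", "1"): "even",
    }

    def accepts(word):
        state = "even"                      # start state
        for symbol in word:
            state = transitions[(state, symbol)]
        return state == "even"              # accept iff we end in an accepting state

    print(accepts("1011"))   # three 1s -> False
    print(accepts("1001"))   # two 1s   -> True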
Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error-correction and more recently also for network coding. Codes are studied by various scientific disciplines—such as information theory, electrical engineering, mathematics, and computer science—for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction (or detection) of errors in the transmitted data.
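The simplest error-correcting code, a three-fold repetition code, shows how added redundancy lets a receiver correct a single flipped bit per block; a minimal Python sketch with an illustrative message follows.

    def encode(bits):
        # add redundancy: repeat every bit three times
        return [b for bit in bits for b in (bit, bit, bit)]

    def decode(received):
        # correct single errors in each block by majority vote
        return [1 if sum(received[i:i + 3]) >= 2 else 0
                for i in range(0, len(received), 3)]

    codeword = encode([1, 0, 1])
    codeword[4] ^= 1                 # the channel flips one bit
    print(decode(codeword))          # -> [1, 0, 1], the error is corrected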
Computational learning theory
Theoretical results in machine learning mainly deal with a type of
inductive learning called supervised learning. In supervised
learning, an algorithm is given samples that are labeled in some
useful way. For example, the samples might be descriptions of
mushrooms, and the labels could be whether or not the mushrooms are
edible. The algorithm takes these previously labeled samples and
uses them to induce a classifier. This classifier is a function that
assigns labels to samples including the samples that have never been
previously seen by the algorithm. The goal of the supervised learning
algorithm is to optimize some measure of performance such as
minimizing the number of mistakes made on new samples.
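A toy version of this supervised-learning setting: labelled training samples are used to induce a simple one-dimensional threshold classifier, whose performance is then measured by the number of mistakes on samples it has not seen. All numbers and labels below are made up for illustration.

    def learn_threshold(samples):
        # samples: list of (feature, label); pick the threshold with fewest training mistakes
        candidates = [f for f, _ in samples]
        def mistakes(t):
            return sum((f >= t) != label for f, label in samples)
        return min(candidates, key=mistakes)

    train = [(0.2, False), (0.4, False), (0.6, True), (0.9, True)]
    test  = [(0.1, False), (0.5, False), (0.7, True), (0.8, True)]

    t = learn_threshold(train)
    errors = sum((f >= t) != label for f, label in test)
    print(t, errors)   # learned threshold and mistakes made on new samples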
Organizations
- European Association for Theoretical Computer Science
- SIGACT
- Simons Institute for the Theory of Computing
Journals and newsletters
- Discrete Mathematics and Theoretical Computer Science
- Information and Computation
- Theory of Computing (open access journal)
- Formal Aspects of Computing
- Journal of the ACM
- SIAM Journal on Computing (SICOMP)
- SIGACT News
- Theoretical Computer Science
- Theory of Computing Systems
- International Journal of Foundations of Computer Science
- Chicago Journal of Theoretical Computer Science (open access journal)
- Foundations and Trends in Theoretical Computer Science
- Journal of Automata, Languages and Combinatorics
- Acta Informatica
- Fundamenta Informaticae
- ACM Transactions on Computation Theory
- Computational Complexity
- Journal of Complexity
- ACM Transactions on Algorithms
- Information Processing Letters
- Open Computer Science (open access journal)
Conferences
- Annual ACM Symposium on Theory of Computing (STOC)
- Annual IEEE Symposium on Foundations of Computer Science (FOCS)
- Innovations in Theoretical Computer Science (ITCS)
- Mathematical Foundations of Computer Science (MFCS)
- International Computer Science Symposium in Russia (CSR)
- ACM–SIAM Symposium on Discrete Algorithms (SODA)
- IEEE Symposium on Logic in Computer Science (LICS)
- Computational Complexity Conference (CCC)
- International Colloquium on Automata, Languages and Programming (ICALP)
- Annual Symposium on Computational Geometry (SoCG)
- ACM Symposium on Principles of Distributed Computing (PODC)
- ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)
- Annual Conference on Learning Theory (COLT)
- Symposium on Theoretical Aspects of Computer Science (STACS)
- European Symposium on Algorithms (ESA)
- Workshop on Approximation Algorithms for Combinatorial Optimization Problems (APPROX)
- Workshop on Randomization and Computation (RANDOM)
- International Symposium on Algorithms and Computation (ISAAC)
- International Symposium on Fundamentals of Computation Theory (FCT)
- International Workshop on Graph-Theoretic Concepts in Computer Science (WG)