A Medley of Potpourri

Monday, May 29, 2023

Quantum finite automaton

From Wikipedia, the free encyclopedia

(Redirected from Quantum finite automata)

In quantum computing, quantum finite automata (QFA) or quantum state machines are a quantum analog of probabilistic automata or a Markov decision process. They provide a mathematical abstraction of real-world quantum computers. Several types of automata may be defined, including measure-once and measure-many automata. Quantum finite automata can also be understood as the quantization of subshifts of finite type, or as a quantization of Markov chains. QFAs are, in turn, special cases of geometric finite automata or topological finite automata.

The automata work by receiving a finite-length string $σ = (σ_{0}, σ_{1}, \dots, σ_{k})$ of letters $σ_{i}$ from a finite alphabet $Σ$ , and assigning to each such string a probability $\Pr (σ)$ indicating the probability of the automaton being in an accept state; that is, indicating whether the automaton accepted or rejected the string.

The languages accepted by QFAs are not the regular languages of deterministic finite automata, nor are they the stochastic languages of probabilistic finite automata. Study of these quantum languages remains an active area of research.

Informal description

There is a simple, intuitive way of understanding quantum finite automata. One begins with a graph-theoretic interpretation of deterministic finite automata (DFA). A DFA can be represented as a directed graph, with states as nodes in the graph, and arrows representing state transitions. Each arrow is labelled with a possible input symbol, so that, given a specific state and an input symbol, the arrow points at the next state. One way of representing such a graph is by means of a set of adjacency matrices, with one matrix for each input symbol. In this case, the list of possible DFA states is written as a column vector. For a given input symbol, the adjacency matrix indicates how any given state (row in the state vector) will transition to the next state; a state transition is given by matrix multiplication.

One needs a distinct adjacency matrix for each possible input symbol, since each input symbol can result in a different transition. The entries in the adjacency matrix must be zero's and one's. For any given column in the matrix, only one entry can be non-zero: this is the entry that indicates the next (unique) state transition. Similarly, the state of the system is a column vector, in which only one entry is non-zero: this entry corresponds to the current state of the system. Let $Σ$ denote the set of input symbols. For a given input symbol $α \in Σ$ , write $U_{α}$ as the adjacency matrix that describes the evolution of the DFA to its next state. The set ${U_{α} | α \in Σ}$ then completely describes the state transition function of the DFA. Let Q represent the set of possible states of the DFA. If there are N states in Q, then each matrix $U_{α}$ is N by N-dimensional. The initial state $q_{0} \in Q$ corresponds to a column vector with a one in the q₀'th row. A general state q is then a column vector with a one in the q'th row. By abuse of notation, let q₀ and q also denote these two vectors. Then, after reading input symbols $α β γ \dots$ from the input tape, the state of the DFA will be given by $q = \dots U_{γ} U_{β} U_{α} q_{0} .$ The state transitions are given by ordinary matrix multiplication (that is, multiply q₀ by $U_{α}$ , etc.); the order of application is 'reversed' only because we follow the standard notation of linear algebra.

The above description of a DFA, in terms of linear operators and vectors, almost begs for generalization, by replacing the state-vector q by some general vector, and the matrices ${U_{α}}$ by some general operators. This is essentially what a QFA does: it replaces q by a probability amplitude, and the ${U_{α}}$ by unitary matrices. Other, similar generalizations also become obvious: the vector q can be some distribution on a manifold; the set of transition matrices become automorphisms of the manifold; this defines a topological finite automaton. Similarly, the matrices could be taken as automorphisms of a homogeneous space; this defines a geometric finite automaton.

Before moving on to the formal description of a QFA, there are two noteworthy generalizations that should be mentioned and understood. The first is the non-deterministic finite automaton (NFA). In this case, the vector q is replaced by a vector which can have more than one entry that is non-zero. Such a vector then represents an element of the power set of Q; its just an indicator function on Q. Likewise, the state transition matrices ${U_{α}}$ are defined in such a way that a given column can have several non-zero entries in it. Equivalently, the multiply-add operations performed during component-wise matrix multiplication should be replaced by Boolean and-or operations, that is, so that one is working with a ring of characteristic 2.

A well-known theorem states that, for each DFA, there is an equivalent NFA, and vice versa. This implies that the set of languages that can be recognized by DFA's and NFA's are the same; these are the regular languages. In the generalization to QFAs, the set of recognized languages will be different. Describing that set is one of the outstanding research problems in QFA theory.

Another generalization that should be immediately apparent is to use a stochastic matrix for the transition matrices, and a probability vector for the state; this gives a probabilistic finite automaton. The entries in the state vector must be real numbers, positive, and sum to one, in order for the state vector to be interpreted as a probability. The transition matrices must preserve this property: this is why they must be stochastic. Each state vector should be imagined as specifying a point in a simplex; thus, this is a topological automaton, with the simplex being the manifold, and the stochastic matrices being linear automorphisms of the simplex onto itself. Since each transition is (essentially) independent of the previous (if we disregard the distinction between accepted and rejected languages), the PFA essentially becomes a kind of Markov chain.

By contrast, in a QFA, the manifold is complex projective space $C P^{N}$ , and the transition matrices are unitary matrices. Each point in $C P^{N}$ corresponds to a quantum-mechanical probability amplitude or pure state; the unitary matrices can be thought of as governing the time evolution of the system (viz in the Schrödinger picture). The generalization from pure states to mixed states should be straightforward: A mixed state is simply a measure-theoretic probability distribution on $C P^{N}$ .

A worthy point to contemplate is the distributions that result on the manifold during the input of a language. In order for an automaton to be 'efficient' in recognizing a language, that distribution should be 'as uniform as possible'. This need for uniformity is the underlying principle behind maximum entropy methods: these simply guarantee crisp, compact operation of the automaton. Put in other words, the machine learning methods used to train hidden Markov models generalize to QFAs as well: the Viterbi algorithm and the forward-backward algorithm generalize readily to the QFA.

Although the study of QFA was popularized in the work of Kondacs and Watrous in 1997 and later by Moore and Crutchfeld, they were described as early as 1971, by Ion Baianu.

Measure-once automata

Measure-once automata were introduced by Cris Moore and James P. Crutchfield. They may be defined formally as follows.

As with an ordinary finite automaton, the quantum automaton is considered to have $N$ possible internal states, represented in this case by an $N$ -state qubit $| ψ ⟩$ . More precisely, the $N$ -state qubit $| ψ ⟩ \in C P^{N}$ is an element of $N$ -dimensional complex projective space, carrying an inner product $‖ \cdot ‖$ that is the Fubini–Study metric.

The state transitions, transition matrices or de Bruijn graphs are represented by a collection of $N \times N$ unitary matrices $U_{α}$ , with one unitary matrix for each letter $α \in Σ$ . That is, given an input letter $α$ , the unitary matrix describes the transition of the automaton from its current state $| ψ ⟩$ to its next state $| ψ^{'} ⟩$ :

| ψ^{'} ⟩ = U_{α} | ψ ⟩

Thus, the triple $(C P^{N}, Σ, {U_{α} | α \in Σ})$ form a quantum semiautomaton.

The accept state of the automaton is given by an $N \times N$ projection matrix $P$ , so that, given a $N$ -dimensional quantum state $| ψ ⟩$ , the probability of $| ψ ⟩$ being in the accept state is

⟨ ψ | P | ψ ⟩ = ‖ P | ψ ⟩ ‖^{2}

The probability of the state machine accepting a given finite input string $σ = (σ_{0}, σ_{1}, \dots, σ_{k})$ is given by

\Pr (σ) = ‖ P U_{σ_{k}} \dots U_{σ_{1}} U_{σ_{0}} | ψ ⟩ ‖^{2}

Here, the vector $| ψ ⟩$ is understood to represent the initial state of the automaton, that is, the state the automaton was in before it started accepting the string input. The empty string $\emptyset$ is understood to be just the unit matrix, so that

\Pr (\emptyset) = ‖ P | ψ ⟩ ‖^{2}

is just the probability of the initial state being an accepted state.

Because the left-action of $U_{α}$ on $| ψ ⟩$ reverses the order of the letters in the string $σ$ , it is not uncommon for QFAs to be defined using a right action on the Hermitian transpose states, simply in order to keep the order of the letters the same.

A regular language is accepted with probability $p$ by a quantum finite automaton, if, for all sentences $σ$ in the language, (and a given, fixed initial state $| ψ ⟩$ ), one has $p < \Pr (σ)$ .

Example

Consider the classical deterministic finite automaton given by the state transition table

**State Transition Table**
Input State	1	0
S₁	S₁	S₂
S₂	S₂	S₁

State Diagram

The quantum state is a vector, in bra–ket notation

| ψ ⟩ = a_{1} | S_{1} ⟩ + a_{2} | S_{2} ⟩ = [\begin{matrix} a_{1} \\ a_{2} \end{matrix}]

with the complex numbers $a_{1}, a_{2}$ normalized so that

[\begin{matrix} a_{1}^{*} a_{2}^{*} \end{matrix}] [\begin{matrix} a_{1} \\ a_{2} \end{matrix}] = a_{1}^{*} a_{1} + a_{2}^{*} a_{2} = 1

The unitary transition matrices are

U_{0} = [\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}]

and

U_{1} = [\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}]

Taking $S_{1}$ to be the accept state, the projection matrix is

P = [\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}]

As should be readily apparent, if the initial state is the pure state $| S_{1} ⟩$ or $| S_{2} ⟩$ , then the result of running the machine will be exactly identical to the classical deterministic finite state machine. In particular, there is a language accepted by this automaton with probability one, for these initial states, and it is identical to the regular language for the classical DFA, and is given by the regular expression:

(1^{*} (01^{*} 0)^{*})^{*}

The non-classical behaviour occurs if both $a_{1}$ and $a_{2}$ are non-zero. More subtle behaviour occurs when the matrices $U_{0}$ and $U_{1}$ are not so simple; see, for example, the de Rham curve as an example of a quantum finite state machine acting on the set of all possible finite binary strings.

Measure-many automata

Measure-many automata were introduced by Kondacs and Watrous in 1997. The general framework resembles that of the measure-once automaton, except that instead of there being one projection, at the end, there is a projection, or quantum measurement, performed after each letter is read. A formal definition follows.

The Hilbert space $H_{Q}$ is decomposed into three orthogonal subspaces

H_{Q} = H_{accept} \oplus H_{reject} \oplus H_{non-halting}

In the literature, these orthogonal subspaces are usually formulated in terms of the set $Q$ of orthogonal basis vectors for the Hilbert space $H_{Q}$ . This set of basis vectors is divided up into subsets $Q_{acc} \subset Q$ and $Q_{rej} \subset Q$ , such that

H_{accept} = span {| q ⟩ : | q ⟩ \in Q_{acc}}

is the linear span of the basis vectors in the accept set. The reject space is defined analogously, and the remaining space is designated the non-halting subspace. There are three projection matrices, $P_{acc}$ , $P_{rej}$ and $P_{non}$ , each projecting to the respective subspace:

P_{acc} : H_{Q} \to H_{accept}

and so on. The parsing of the input string proceeds as follows. Consider the automaton to be in a state $| ψ ⟩$ . After reading an input letter $α$ , the automaton will be in the state

| ψ^{'} ⟩ = U_{α} | ψ ⟩

At this point, a measurement is performed on the state $| ψ^{'} ⟩$ , using the projection operators $P$ , at which time its wave-function collapses into one of the three subspaces $H_{accept}$ or $H_{reject}$ or $H_{non-halting}$ . The probability of collapse is given by

\Pr_{acc} (σ) = ‖ P_{acc} | ψ^{'} ⟩ ‖^{2}

for the "accept" subspace, and analogously for the other two spaces.

If the wave function has collapsed to either the "accept" or "reject" subspaces, then further processing halts. Otherwise, processing continues, with the next letter read from the input, and applied to what must be an eigenstate of $P_{non}$ . Processing continues until the whole string is read, or the machine halts. Often, additional symbols $κ$ and $ are adjoined to the alphabet, to act as the left and right end-markers for the string.

In the literature, the measure-many automaton is often denoted by the tuple $(Q; Σ; δ; q_{0}; Q_{acc}; Q_{rej})$ . Here, $Q$ , $Σ$ , $Q_{acc}$ and $Q_{rej}$ are as defined above. The initial state is denoted by $| ψ ⟩ = | q_{0} ⟩$ . The unitary transformations are denoted by the map $δ$ ,

δ : Q \times Σ \times Q \to C

so that

U_{α} | q_{1} ⟩ = \sum_{q_{2} \in Q} δ (q_{1}, α, q_{2}) | q_{2} ⟩

Relation to quantum computing

As of 2019, most quantum computers are implementations of measure-once quantum finite automata, and the software systems for programming them expose the state-preparation of $| ψ ⟩$ , measurement $P$ and a choice of unitary transformations $U_{α}$ , such the controlled NOT gate, the Hadamard transform and other quantum logic gates, directly to the programmer.

The primary difference between real-world quantum computers and the theoretical framework presented above is that the initial state preparation cannot ever result in a point-like pure state, nor can the unitary operators be precisely applied. Thus, the initial state must be taken as a mixed state

ρ = \int p (x) | ψ_{x} ⟩ d x

for some probability distribution $p (x)$ characterizing the ability of the machinery to prepare an initial state close to the desired initial pure state $| ψ ⟩$ . This state is not stable, but suffers from some amount of quantum decoherence over time. Precise measurements are also not possible, and one instead uses positive operator-valued measures to describe the measurement process. Finally, each unitary transformation is not a single, sharply defined quantum logic gate, but rather a mixture

U_{α, (ρ)} = \int p_{α} (x) U_{α, x} d x

for some probability distribution $p_{α} (x)$ describing how well the machinery can effect the desired transformation $U_{α}$ .

As a result of these effects, the actual time evolution of the state cannot be taken as an infinite-precision pure point, operated on by a sequence of arbitrarily sharp transformations, but rather as an ergodic process, or more accurately, a mixing process that only concatenates transformations onto a state, but also smears the state over time.

There is no quantum analog to the push-down automaton or stack machine. This is due to the no-cloning theorem: there is no way to make a copy of the current state of the machine, push it onto a stack for later reference, and then return to it.

Geometric generalizations

The above constructions indicate how the concept of a quantum finite automaton can be generalized to arbitrary topological spaces. For example, one may take some (N-dimensional) Riemann symmetric space to take the place of $C P^{N}$ . In place of the unitary matrices, one uses the isometries of the Riemannian manifold, or, more generally, some set of open functions appropriate for the given topological space. The initial state may be taken to be a point in the space. The set of accept states can be taken to be some arbitrary subset of the topological space. One then says that a formal language is accepted by this topological automaton if the point, after iteration by the homeomorphisms, intersects the accept set. But, of course, this is nothing more than the standard definition of an M-automaton. The behaviour of topological automata is studied in the field of topological dynamics.

The quantum automaton differs from the topological automaton in that, instead of having a binary result (is the iterated point in, or not in, the final set?), one has a probability. The quantum probability is the (square of) the initial state projected onto some final state P; that is $P r = | ⟨ P | ψ ⟩ |^{2}$ . But this probability amplitude is just a very simple function of the distance between the point $| P ⟩$ and the point $| ψ ⟩$ in $C P^{N}$ , under the distance metric given by the Fubini–Study metric. To recap, the quantum probability of a language being accepted can be interpreted as a metric, with the probability of accept being unity, if the metric distance between the initial and final states is zero, and otherwise the probability of accept is less than one, if the metric distance is non-zero. Thus, it follows that the quantum finite automaton is just a special case of a geometric automaton or a metric automaton, where $C P^{N}$ is generalized to some metric space, and the probability measure is replaced by a simple function of the metric on that space.

Superstring theory

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Superstring_theory

Superstring theory is an attempt to explain all of the particles and fundamental forces of nature in one theory by modeling them as vibrations of tiny supersymmetric strings.

'Superstring theory' is a shorthand for supersymmetric string theory because unlike bosonic string theory, it is the version of string theory that accounts for both fermions and bosons and incorporates supersymmetry to model gravity.

Since the second superstring revolution, the five superstring theories are regarded as different limits of a single theory tentatively called M-theory.

Background

The deepest problem in theoretical physics is harmonizing the theory of general relativity, which describes gravitation and applies to large-scale structures (stars, galaxies, super clusters), with quantum mechanics, which describes the other three fundamental forces acting on the atomic scale.

The development of a quantum field theory of a force invariably results in infinite possibilities. Physicists developed the technique of renormalization to eliminate these infinities; this technique works for three of the four fundamental forces—electromagnetic, strong nuclear and weak nuclear forces—but not for gravity. Development of quantum theory of gravity therefore requires different means than those used for the other forces.

According to the theory, the fundamental constituents of reality are strings of the Planck length (about 10⁻³³ cm) that vibrate at resonant frequencies. Every string, in theory, has a unique resonance, or harmonic. Different harmonics determine different fundamental particles. The tension in a string is on the order of the Planck force (10⁴⁴ newtons). The graviton (the proposed messenger particle of the gravitational force), for example, is predicted by the theory to be a string with wave amplitude zero.

History

Investigating how a string theory may include fermions in its spectrum led to the invention of supersymmetry (in the West) in 1971, a mathematical transformation between bosons and fermions. String theories that include fermionic vibrations are now known as "superstring theories".

Since its beginnings in the seventies and through the combined efforts of many different researchers, superstring theory has developed into a broad and varied subject with connections to quantum gravity, particle and condensed matter physics, cosmology, and pure mathematics.

Lack of experimental evidence

Superstring theory is based on supersymmetry. No supersymmetric particles have been discovered and recent research at the Large Hadron Collider (LHC) and Tevatron has excluded some of the ranges. For instance, the mass constraint of the Minimal Supersymmetric Standard Model squarks has been up to 1.1 TeV, and gluinos up to 500 GeV. No report on suggesting large extra dimensions has been delivered from LHC. There have been no principles so far to limit the number of vacua in the concept of a landscape of vacua.

Some particle physicists became disappointed by the lack of experimental verification of supersymmetry, and some have already discarded it; Jon Butterworth at University College London said that we had no sign of supersymmetry, even in higher energy region, excluding the superpartners of the top quark up to a few TeV. Ben Allanach at the University of Cambridge states that if we do not discover any new particles in the next trial at the LHC, then we can say it is unlikely to discover supersymmetry at CERN in the foreseeable future.

Extra dimensions

Our physical space is observed to have three large spatial dimensions and, along with time, is a boundless 4-dimensional continuum known as spacetime. However, nothing prevents a theory from including more than 4 dimensions. In the case of string theory, consistency requires spacetime to have 10 dimensions (3D regular space + 1 time + 6D hyperspace). The fact that we see only 3 dimensions of space can be explained by one of two mechanisms: either the extra dimensions are compactified on a very small scale, or else our world may live on a 3-dimensional submanifold corresponding to a brane, on which all known particles besides gravity would be restricted.

If the extra dimensions are compactified, then the extra 6 dimensions must be in the form of a Calabi–Yau manifold. Within the more complete framework of M-theory, they would have to take form of a G2 manifold. A particular exact symmetry of string/M-theory called T-duality (which exchanges momentum modes for winding number and sends compact dimensions of radius R to radius 1/R), has led to the discovery of equivalences between different Calabi–Yau manifolds called mirror symmetry.

Superstring theory is not the first theory to propose extra spatial dimensions. It can be seen as building upon the Kaluza–Klein theory, which proposed a 4+1 dimensional (5D) theory of gravity. When compactified on a circle, the gravity in the extra dimension precisely describes electromagnetism from the perspective of the 3 remaining large space dimensions. Thus the original Kaluza–Klein theory is a prototype for the unification of gauge and gravity interactions, at least at the classical level, however it is known to be insufficient to describe nature for a variety of reasons (missing weak and strong forces, lack of parity violation, etc.) A more complex compact geometry is needed to reproduce the known gauge forces. Also, to obtain a consistent, fundamental, quantum theory requires the upgrade to string theory, not just the extra dimensions.

Number of superstring theories

Theoretical physicists were troubled by the existence of five separate superstring theories. A possible solution for this dilemma was suggested at the beginning of what is called the second superstring revolution in the 1990s, which suggests that the five string theories might be different limits of a single underlying theory, called M-theory. This remains a conjecture.

String theories
Type	Spacetime dimensions	SUSY generators	chiral	open strings	heterotic compactification	gauge group	tachyon
Bosonic (closed)	26	N = 0	no	no	no	none	yes
Bosonic (open)	26	N = 0	no	yes	no	U(1)	yes
I	10	N = (1,0)	yes	yes	no	SO(32)	no
IIA	10	N = (1,1)	no	no	no	U(1)	no
IIB	10	N = (2,0)	yes	no	no	none	no
HO	10	N = (1,0)	yes	no	yes	SO(32)	no
HE	10	N = (1,0)	yes	no	yes	E₈ × E₈	no
M-theory	11	N = 1	no	no	no	none	no

The five consistent superstring theories are:

The type I string has one supersymmetry in the ten-dimensional sense (16 supercharges). This theory is special in the sense that it is based on unoriented open and closed strings, while the rest are based on oriented closed strings.
The type II string theories have two supersymmetries in the ten-dimensional sense (32 supercharges). There are actually two kinds of type II strings called type IIA and type IIB. They differ mainly in the fact that the IIA theory is non-chiral (parity conserving) while the IIB theory is chiral (parity violating).
The heterotic string theories are based on a peculiar hybrid of a type I superstring and a bosonic string. There are two kinds of heterotic strings differing in their ten-dimensional gauge groups: the heterotic E₈×E₈ string and the heterotic SO(32) string. (The name heterotic SO(32) is slightly inaccurate since among the SO(32) Lie groups, string theory singles out a quotient Spin(32)/Z₂ that is not equivalent to SO(32).)

Chiral gauge theories can be inconsistent due to anomalies. This happens when certain one-loop Feynman diagrams cause a quantum mechanical breakdown of the gauge symmetry. The anomalies were canceled out via the Green–Schwarz mechanism.

Even though there are only five superstring theories, making detailed predictions for real experiments requires information about exactly what physical configuration the theory is in. This considerably complicates efforts to test string theory because there is an astronomically high number—10⁵⁰⁰ or more—of configurations that meet some of the basic requirements to be consistent with our world. Along with the extreme remoteness of the Planck scale, this is the other major reason it is hard to test superstring theory.

Another approach to the number of superstring theories refers to the mathematical structure called composition algebra. In the findings of abstract algebra there are just seven composition algebras over the field of real numbers. In 1990 physicists R. Foot and G.C. Joshi in Australia stated that "the seven classical superstring theories are in one-to-one correspondence to the seven composition algebras".

Integrating general relativity and quantum mechanics

General relativity typically deals with situations involving large mass objects in fairly large regions of spacetime whereas quantum mechanics is generally reserved for scenarios at the atomic scale (small spacetime regions). The two are very rarely used together, and the most common case that combines them is in the study of black holes. Having peak density, or the maximum amount of matter possible in a space, and very small area, the two must be used in synchrony to predict conditions in such places. Yet, when used together, the equations fall apart, spitting out impossible answers, such as imaginary distances and less than one dimension.

The major problem with their incongruence is that, at Planck scale (a fundamental small unit of length) lengths, general relativity predicts a smooth, flowing surface, while quantum mechanics predicts a random, warped surface, which are nowhere near compatible. Superstring theory resolves this issue, replacing the classical idea of point particles with strings. These strings have an average diameter of the Planck length, with extremely small variances, which completely ignores the quantum mechanical predictions of Planck-scale length dimensional warping. Also, these surfaces can be mapped as branes. These branes can be viewed as objects with a morphism between them. In this case, the morphism will be the state of a string that stretches between brane A and brane B.

Singularities are avoided because the observed consequences of "Big Crunches" never reach zero size. In fact, should the universe begin a "big crunch" sort of process, string theory dictates that the universe could never be smaller than the size of one string, at which point it would actually begin expanding.

Mathematics

D-branes

D-branes are membrane-like objects in 10D string theory. They can be thought of as occurring as a result of a Kaluza–Klein compactification of 11D M-theory that contains membranes. Because compactification of a geometric theory produces extra vector fields the D-branes can be included in the action by adding an extra U(1) vector field to the string action.

\partial_{z} \to \partial_{z} + i A_{z} (z, \bar{z})

In type I open string theory, the ends of open strings are always attached to D-brane surfaces. A string theory with more gauge fields such as SU(2) gauge fields would then correspond to the compactification of some higher-dimensional theory above 11 dimensions, which is not thought to be possible to date. Furthermore, the tachyons attached to the D-branes show the instability of those D-branes with respect to the annihilation. The tachyon total energy is (or reflects) the total energy of the D-branes.

Why five superstring theories?

For a 10 dimensional supersymmetric theory we are allowed a 32-component Majorana spinor. This can be decomposed into a pair of 16-component Majorana-Weyl (chiral) spinors. There are then various ways to construct an invariant depending on whether these two spinors have the same or opposite chiralities:

Superstring model	Invariant
Heterotic	$\partial_{z} X^{μ} - i \bar{θ_{L}} Γ^{μ} \partial_{z} θ_{L}$
IIA	$\partial_{z} X^{μ} - i \bar{θ_{L}} Γ^{μ} \partial_{z} θ_{L} - i \bar{θ_{R}} Γ^{μ} \partial_{z} θ_{R}$
IIB	$\partial_{z} X^{μ} - i \bar{θ_{L}^{1}} Γ^{μ} \partial_{z} θ_{L}^{1} - i \bar{θ_{L}^{2}} Γ^{μ} \partial_{z} θ_{L}^{2}$

The heterotic superstrings come in two types SO(32) and E₈×E₈ as indicated above and the type I superstrings include open strings.

Beyond superstring theory

It is conceivable that the five superstring theories are approximated to a theory in higher dimensions possibly involving membranes. Because the action for this involves quartic terms and higher so is not Gaussian, the functional integrals are very difficult to solve and so this has confounded the top theoretical physicists. Edward Witten has popularised the concept of a theory in 11 dimensions, called M-theory, involving membranes interpolating from the known symmetries of superstring theory. It may turn out that there exist membrane models or other non-membrane models in higher dimensions—which may become acceptable when we find new unknown symmetries of nature, such as noncommutative geometry. It is thought, however, that 16 is probably the maximum since SO(16) is a maximal subgroup of E8, the largest exceptional Lie group, and also is more than large enough to contain the Standard Model. Quartic integrals of the non-functional kind are easier to solve so there is hope for the future. This is the series solution, which is always convergent when a is non-zero and negative:

\int_{- \infty}^{\infty} \exp (a x^{4} + b x^{3} + c x^{2} + d x + f) d x = e^{f} \sum_{n, m, p = 0}^{\infty} \frac{b^{4 n}}{(4 n)!} \frac{c^{2 m}}{(2 m)!} \frac{d^{4 p}}{(4 p)!} \frac{Γ (3 n + m + p + \frac{1}{4})}{a^{3 n + m + p + \frac{1}{4}}}

In the case of membranes the series would correspond to sums of various membrane interactions that are not seen in string theory.

Compactification

Investigating theories of higher dimensions often involves looking at the 10 dimensional superstring theory and interpreting some of the more obscure results in terms of compactified dimensions. For example, D-branes are seen as compactified membranes from 11D M-theory. Theories of higher dimensions such as 12D F-theory and beyond produce other effects, such as gauge terms higher than U(1). The components of the extra vector fields (A) in the D-brane actions can be thought of as extra coordinates (X) in disguise. However, the known symmetries including supersymmetry currently restrict the spinors to 32-components—which limits the number of dimensions to 11 (or 12 if you include two time dimensions.) Some physicists (e.g., John Baez et al.) have speculated that the exceptional Lie groups E₆, E₇ and E₈ having maximum orthogonal subgroups SO(10), SO(12) and SO(16) may be related to theories in 10, 12 and 16 dimensions; 10 dimensions corresponding to string theory and the 12 and 16 dimensional theories being yet undiscovered but would be theories based on 3-branes and 7-branes respectively. However, this is a minority view within the string community. Since E₇ is in some sense F₄ quaternified and E₈ is F₄ octonified, the 12 and 16 dimensional theories, if they did exist, may involve the noncommutative geometry based on the quaternions and octonions respectively. From the above discussion, it can be seen that physicists have many ideas for extending superstring theory beyond the current 10 dimensional theory, but so far all have been unsuccessful.

Kac–Moody algebras

Since strings can have an infinite number of modes, the symmetry used to describe string theory is based on infinite dimensional Lie algebras. Some Kac–Moody algebras that have been considered as symmetries for M-theory have been E₁₀ and E₁₁ and their supersymmetric extensions.