Deus ex machina in classical theatre: Euripides' Medea, performed in 2009 in Syracuse, Italy
Deus ex machina (/ˌdeɪəsɛksˈmækɪnə,-ˈmɑːk-/ DAY-əs ex-MA(H)K-in-ə, Latin: [ˈdɛ.ʊs ɛks ˈmaːkʰɪnaː]; plural: dei ex machina; English ‘god from the machine’) is a plot device whereby a seemingly unsolvable problem in a story is suddenly and abruptly resolved by an unexpected and unlikely occurrence. Its function can be to resolve an otherwise irresolvable plot situation, to surprise the audience, to bring the tale to a happy ending, or to act as a comedic device.
Origin of the expression
Deus ex machina is a Latin calque from Greek ἀπὸ μηχανῆς θεός (apò mēkhanês theós), meaning 'god from the machine'.
The term was coined from the conventions of ancient Greek theater,
where actors who were playing gods were brought onto the stage using a
machine. The machine could be either a crane (mechane) used to lower actors from above or a riser which brought them up through a trapdoor. Aeschylus
introduced the idea, and it was used often to resolve the conflict and
conclude the drama. The device is associated mostly with Greek tragedy,
although it also appeared in comedies.
Ancient examples
Aeschylus used the device in his Eumenides, but it became an established stage machine with Euripides. More than half of Euripides' extant tragedies employ a deus ex machina in their resolution, and some critics claim that Euripides invented it, not Aeschylus. A frequently cited example is Euripides' Medea, in which the deus ex machina is a dragon-drawn chariot sent by the sun god, used to convey his granddaughter Medea away from her husband Jason to the safety of Athens. In Alcestis, the heroine agrees to give up her own life to spare the life of her husband Admetus. At the end, Heracles shows up and seizes Alcestis from Death, restoring her to life and to Admetus.
Aristophanes' play Thesmophoriazusae
parodies Euripides' frequent use of the crane by making Euripides
himself a character in the play and bringing him on stage by way of the mechane.
The device produced an immediate emotional response from Greek
audiences. They would have a feeling of wonder and astonishment at the
appearance of the gods, which would often add to the moral effect of the
drama.
Modern theatrical examples
Shakespeare uses the device in As You Like It, Pericles, Prince of Tyre, and Cymbeline. John Gay uses it in The Beggar's Opera
where a character breaks the action and rewrites the ending as a
reprieve from hanging for MacHeath. During the politically turbulent
17th and 18th centuries, the deus ex machina was sometimes used to make a controversial thesis more palatable to the powers of the day. For example, in the final scene of Molière's Tartuffe, the heroes are saved from a terrible fate by an agent of the compassionate, all-seeing King Louis XIV — the same king who held Molière's career and livelihood in his hands.
Plot device
Aristotle was the first to use a Greek term equivalent to the Latin phrase deus ex machina to describe the technique as a device to resolve the plot of tragedies.
It is generally deemed undesirable in writing and often implies a lack of creativity on the part of the author. The reasons are that it damages the story's internal logic and is often so implausible that it strains the audience's suspension of disbelief, allowing the author to conclude the story with an unearned, improbable ending.
Examples
The Martians in H. G. Wells's The War of the Worlds have destroyed everything in their path and apparently triumphed over humanity, but they are suddenly killed by bacteria. In the novel Lord of the Flies, a passing naval officer rescues the stranded children. William Golding called that a "gimmick"; other critics view it as a deus ex machina.
The abrupt ending conveys the terrible fate that would have afflicted
the children if the officer had not arrived at that moment.
J. R. R. Tolkien referred to the Great Eagles that appear in several places in The Hobbit and The Lord of the Rings as "a dangerous 'machine'". This was in a letter refusing permission to a film adapter to have the Fellowship of the Ring
transported by eagles rather than traveling on foot. He felt that the
eagles had already been overused as a plot device and they have
elsewhere been critiqued as a deus ex machina. Charles Dickens used the device in Oliver Twist
when Rose Maylie turns out to be the long-lost sister of Agnes, and
therefore Oliver's aunt; she marries her long-time sweetheart Harry,
allowing Oliver to live happily with his saviour Mr. Brownlow.
In the video game Enderal, the story centers on a machine (referred to as "The Beacon") that the game explicitly identifies as a deus ex machina, albeit perhaps ironically, as its nature is revealed later in the story to be much different from the understanding shared by the player and the other characters in the game.
Criticism
The deus ex machina
device is often criticized as inartistic, too convenient, and overly
simplistic. However, champions of the device say that it opens up
ideological and artistic possibilities.
Ancient criticism
Antiphanes was one of the device's earliest critics. He believed that the use of the deus ex machina was a sign that the playwright was unable to properly manage the complications of his plot.
when they don't know what to say
and have completely given up on the play
just like a finger they lift the machine
and the spectators are satisfied.
— Antiphanes
Another critical reference to the device can be found in Plato's dialogue Cratylus, 425d, though it is made in the context of an argument unrelated to drama.
Aristotle criticized the device in his Poetics, where he argued that the resolution of a plot must arise internally, following from previous action of the play:
In the characters, too, exactly as
in the structure of the incidents, [the poet] ought always to seek what
is either necessary or probable, so that it is either necessary or
probable that a person of such-and-such a sort say or do things of the
same sort, and it is either necessary or probable that this [incident]
happen after that one. It is obvious that the solutions of plots, too,
should come about as a result of the plot itself, and not from a
contrivance, as in the Medea and in the passage about sailing home in the Iliad.
A contrivance must be used for matters outside the drama — either
previous events, which are beyond human knowledge, or later ones that
need to be foretold or announced. For we grant that the gods can see
everything. There should be nothing improbable in the incidents;
otherwise, it should be outside the tragedy, e.g., that in Sophocles' Oedipus.
Aristotle praised Euripides, however, for generally ending his plays
with bad fortune, which he viewed as correct in tragedy, and somewhat
excused the intervention of a deity by suggesting that "astonishment"
should be sought in tragic drama:
Irrationalities should be referred
to what people say: That is one solution, and also sometimes that it is
not irrational, since it is probable that improbable things will happen.
Such a device was referred to by Horace in his Ars Poetica
(lines 191–2), where he instructs poets that they should never resort
to a "god from the machine" to resolve their plots "unless a difficulty
worthy of a god's unraveling should happen" [nec deus intersit, nisi dignus uindice nodus inciderit; nec quarta loqui persona laboret].
Modern criticism
Following Aristotle, Renaissance critics continued to view the deus ex machina as an inept plot device, although it continued to be employed by Renaissance dramatists.
Toward the end of the 19th century, Friedrich Nietzsche criticized Euripides for making tragedy an optimistic genre
by use of the device, and was highly skeptical of the "Greek
cheerfulness", prompting what he viewed as the plays' "blissful delight
in life". The deus ex machina as Nietzsche saw it was symptomatic of Socratic culture, which valued knowledge over Dionysiac music and ultimately caused the death of tragedy:
But the new non-Dionysiac spirit is most clearly apparent in the endings
of the new dramas. At the end of the old tragedies there was a sense of
metaphysical conciliation without which it is impossible to imagine our
taking delight in tragedy; perhaps the conciliatory tones from another
world echo most purely in Oedipus at Colonus.
Now, once tragedy had lost the genius of music, tragedy in the
strictest sense was dead: for where was that metaphysical consolation
now to be found? Hence an earthly resolution for tragic dissonance was
sought; the hero, having been adequately tormented by fate, won his
well-earned reward in a stately marriage and tokens of divine honour.
The hero had become a gladiator, granted freedom once he had been
satisfactorily flayed and scarred. Metaphysical consolation had been
ousted by the deus ex machina.
— Friedrich Nietzsche
Nietzsche argued that the deus ex machina creates a false sense of consolation that ought not to be sought in phenomena. His denigration of the plot device has prevailed in critical opinion.
In Arthur Woollgar Verrall's publication Euripides the Rationalist
(1895), he surveyed and recorded other late 19th-century responses to
the device. He recorded that some of the critical responses to the term
referred to it as 'burlesque', 'coup de théâtre', and 'catastrophe'.
Verrall notes that critics respond dismissively to authors who deploy the device in their writings. He concludes that critics treat the deus ex machina as evidence that the author has spoiled his work, preventing anyone from attaching any importance to it.
However, other scholars have looked at Euripides' use of deus ex machina
and described its use as an integral part of the plot designed for a
specific purpose. Often, Euripides' plays would begin with gods, so it
is argued that it would be natural for the gods to finish the action.
The conflict throughout Euripides' plays would be caused by the meddling of the gods, so it would make sense to both the playwright and the audience of the time that the gods would resolve all the conflict that they began. Half of Euripides' eighteen extant plays end with the use of deus ex machina; it was therefore not simply a device to relieve the playwright of the
embarrassment of a confusing plot ending. This device enabled him to
bring about a natural and more dignified dramatic and tragic ending.
Other champions of the device believe that it can be a
spectacular agent of subversion. It can be used to undercut generic
conventions and challenge cultural assumptions and the privileged role
of tragedy as a literary/theatrical model.
Some 20th-century revisionist criticism suggests that deus ex machina
cannot be viewed in these simplified terms, and contends that the
device allows mortals to "probe" their relationship with the divine. Rush Rehm in particular cites examples of Greek tragedy in which the deus ex machina
complicates the lives and attitudes of characters confronted by the
deity, while simultaneously bringing the drama home to its audience. Sometimes, the unlikeliness of the deus ex machina plot device is employed deliberately. For example, comic effect is created in a scene in Monty Python's Life of Brian when Brian, who lives in Judea at the time of Christ, is saved from a high fall by a passing alien space ship.
Constituent amino-acids can be analyzed to predict secondary, tertiary and quaternary protein structure.
Protein structure prediction is the inference of the three-dimensional structure of a protein from its amino acid sequence—that is, the prediction of its folding and its secondary and tertiary structure from its primary structure. Structure prediction is fundamentally different from the inverse problem of protein design. Protein structure prediction is one of the most important goals pursued by bioinformatics and theoretical chemistry; it is highly important in medicine (for example, in drug design) and biotechnology (for example, in the design of novel enzymes). Every two years, the performance of current methods is assessed in the CASP
experiment (Critical Assessment of Techniques for Protein Structure
Prediction). A continuous evaluation of protein structure prediction web
servers is performed by the community project CAMEO3D.
Protein structure and terminology
Proteins
are chains of amino acids joined together by peptide bonds. Many conformations of this chain are possible due to the rotation of the chain about each Cα atom. It is these conformational changes that are responsible for differences in the three-dimensional structure of proteins. Each amino acid in the chain is polar, i.e. it has separated positive and negative charged regions with a free carbonyl group, which can act as a hydrogen bond acceptor, and an NH group, which can act as a hydrogen bond donor. These groups can therefore interact in the protein structure. The 20 amino acids can be classified according to the chemistry of the side chain, which also plays an important structural role. Glycine
takes on a special position, as it has the smallest side chain, only
one hydrogen atom, and therefore can increase the local flexibility in
the protein structure. Cysteine on the other hand can react with another cysteine residue and thereby form a cross link stabilizing the whole structure.
The protein structure can be considered as a sequence of
secondary structure elements, such as α helices and β sheets, which
together constitute the overall three-dimensional configuration of the
protein chain. In these secondary structures regular patterns of H bonds
are formed between neighboring amino acids, and the amino acids have
similar Φ and Ψ angles.
Bond angles for ψ and ω
The formation of these structures neutralizes the polar groups on
each amino acid. The secondary structures are tightly packed in the
protein core in a hydrophobic environment. Each amino acid side group
has a limited volume to occupy and a limited number of possible
interactions with other nearby side chains, a situation that must be
taken into account in molecular modeling and alignments.
α Helix
The α helix is the most abundant type of secondary structure in
proteins. The α helix has 3.6 amino acids per turn with an H bond formed
between every fourth residue; the average length is 10 amino acids (3
turns) or 10 Å but varies from 5 to 40 (1.5 to 11 turns). The alignment
of the H bonds creates a dipole moment for the helix with a resulting
partial positive charge at the amino end of the helix. Because this
region has free NH2 groups, it will interact with
negatively charged groups such as phosphates. The most common location
of α helices is at the surface of protein cores, where they provide an
interface with the aqueous environment. The inner-facing side of the
helix tends to have hydrophobic amino acids and the outer-facing side
hydrophilic amino acids. Thus, every third or fourth amino acid along the
chain will tend to be hydrophobic, a pattern that can be quite readily
detected. In the leucine zipper motif, a repeating pattern of leucines
on the facing sides of two adjacent helices is highly predictive of the
motif. A helical-wheel plot can be used to show this repeated pattern.
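This periodicity can be quantified with a hydrophobic-moment calculation in the spirit of a helical-wheel analysis. The sketch below is a minimal illustration, assuming ideal helix geometry of 100° of rotation per residue and a toy binary hydrophobicity scale (a real analysis would use a published scale such as Kyte-Doolittle); the example sequences are made up.

```python
import math

# Toy hydrophobicity scale (1 = hydrophobic, 0 = polar); a real analysis
# would use a published scale such as Kyte-Doolittle.
HYDROPHOBIC = set("AVLIMFWC")

def helical_moment(seq):
    """Magnitude of the hydrophobic moment per residue, assuming an
    ideal alpha helix (100 degrees of rotation per residue)."""
    x = y = 0.0
    for i, aa in enumerate(seq):
        h = 1.0 if aa in HYDROPHOBIC else 0.0
        angle = math.radians(100.0 * i)
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y) / len(seq)

# A sequence with hydrophobics every 3-4 residues scores higher than a
# uniformly hydrophobic one, whose contributions largely cancel out.
print(helical_moment("LEALKEALKELLEALKEA"))  # amphipathic-like pattern
print(helical_moment("LLLLLLLLLLLLLLLLLL"))  # uniformly hydrophobic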
Other α helices buried in the protein core or in cellular
membranes have a higher and more regular distribution of
hydrophobic amino acids, and are highly predictive of such structures.
Helices exposed on the surface have a lower proportion of hydrophobic
amino acids. Amino acid content can be predictive of an α-helical
region. Regions richer in alanine (A), glutamic acid (E), leucine (L), and methionine (M) and poorer in proline (P), glycine (G), tyrosine (Y), and serine
(S) tend to form an α helix. Proline destabilizes or breaks an α
helix but can be present in longer helices, forming a bend.
An alpha-helix with hydrogen bonds (yellow dots)
β sheet
β sheets are formed by H bonds between an average of 5–10 consecutive
amino acids in one portion of the chain with another 5–10 farther down
the chain. The interacting regions may be adjacent, with a short loop in
between, or far apart, with other structures in between. Every chain may run in the same direction to form a parallel sheet, every other chain may run in the reverse chemical direction to form an antiparallel sheet, or the chains may be parallel and antiparallel to form a mixed sheet. The pattern of H bonding is different in the parallel and antiparallel configurations. Each amino acid in the interior strands of the
sheet forms two H bonds with neighboring amino acids, whereas each amino
acid on the outside strands forms only one bond with an interior
strand. Looking across the sheet at right angles to the strands, more
distant strands are rotated slightly counterclockwise to form a
left-handed twist. The Cα atoms alternate above and below the sheet in a
pleated structure, and the R side groups of the amino acids alternate
above and below the pleats. The Φ and Ψ angles of the amino acids in
sheets vary considerably in one region of the Ramachandran plot.
It is more difficult to predict the location of β sheets than of α
helices. The situation improves somewhat when the amino acid variation
in multiple sequence alignments is taken into account.
Loop
Loops are
regions of a protein chain that are 1) between α helices and β
sheets, 2) of various lengths and three-dimensional configurations, and
3) on the surface of the structure.
Hairpin loops that represent a complete turn in the polypeptide
chain joining two antiparallel β strands may be as short as two amino
acids in length. Loops interact with the surrounding aqueous environment
and other proteins. Because amino acids in loops are not constrained by
space and environment as are amino acids in the core region, and do not
have an effect on the arrangement of secondary structures in the core,
more substitutions, insertions, and deletions may occur. Thus, in a
sequence alignment, the presence of these features may be an indication
of a loop. The positions of introns in genomic DNA sometimes correspond to the locations of loops in the encoded protein.
Loops also tend to have charged and polar amino acids and are
frequently a component of active sites. A detailed examination of loop
structures has shown that they fall into distinct families.
Coils
A region of secondary structure that is not an α helix, a β sheet, or a recognizable turn is commonly referred to as a coil.
Protein classification
Proteins
may be classified according to both structural and sequence similarity.
For structural classification, the sizes and spatial arrangements of
secondary structures described in the above paragraph are compared in
known three-dimensional structures. Classification based on sequence
similarity was historically the first to be used. Initially, similarity was assessed using alignments of whole sequences. Later, proteins
were classified on the basis of the occurrence of conserved amino acid
patterns. Databases
that classify proteins by one or more of these schemes are available.
In considering protein classification schemes, it is important to keep
several observations in mind. First, two entirely different protein
sequences from different evolutionary origins may fold into a similar
structure. Conversely, the sequence of an ancient gene for a given
structure may have diverged considerably in different species while at
the same time maintaining the same basic structural features.
Recognizing any remaining sequence similarity in such cases may be a
very difficult task. Second, two proteins that share a significant
degree of sequence similarity either with each other or with a third
sequence also share an evolutionary origin and should likewise share some structural features. However, gene duplication and genetic
rearrangements during evolution may give rise to new gene copies, which
can then evolve into proteins with new function and structure.
Terms used for classifying protein structures and sequences
The
more commonly used terms for evolutionary and structural relationships
among proteins are listed below. Many additional terms are used for
various kinds of structural features found in proteins. Descriptions of
such terms may be found at the CATH Web site, the Structural Classification of Proteins (SCOP) Web site, and a Glaxo-Wellcome tutorial on the Swiss bioinformatics Expasy Web site.
Active site
a localized combination of amino acid side groups within the
tertiary (three-dimensional) or quaternary (protein subunit) structure
that can interact with a chemically specific substrate and that provides
the protein with biological activity. Proteins of very different amino
acid sequences may fold into a structure that produces the same active
site.
Architecture
is the relative orientations of secondary structures in a
three-dimensional structure without regard to whether or not they share a
similar loop structure.
Fold (topology)
a type of architecture that also has a conserved loop structure.
Blocks
is a conserved amino acid sequence pattern in a family of proteins.
The pattern includes a series of possible matches at each position in
the represented sequences, but there are not any inserted or deleted
positions in the pattern or in the sequences. By way of contrast,
sequence profiles are a type of scoring matrix that represents a similar
set of patterns that includes insertions and deletions.
Class
a term used to classify protein domains according to their secondary structural content and organization. Four classes
were originally recognized by Levitt and Chothia (1976), and several
others have been added in the SCOP database. Three classes are given in
the CATH database: mainly-α, mainly-β, and α–β, with the α–β class
including both alternating α/β and α+β structures.
Core
the portion of a folded protein molecule that comprises the
hydrophobic interior of α-helices and β-sheets. The compact structure
brings together side groups of amino acids into close enough proximity
so that they can interact. When comparing protein structures, as in the
SCOP database, core is the region common to most of the structures that
share a common fold or that are in the same superfamily. In structure
prediction, core is sometimes defined as the arrangement of secondary
structures that is likely to be conserved during evolutionary change.
Domain (sequence and structural context)
a segment of a polypeptide chain that can fold into a
three-dimensional structure irrespective of the presence of other
segments of the chain. The separate domains of a given protein may
interact extensively or may be joined only by a length of polypeptide
chain. A protein with several domains may use these domains for
functional interactions with different molecules.
Family (sequence context)
a group of proteins of similar biochemical function that are more
than 50% identical when aligned. This same cutoff is still used by the Protein Information Resource
(PIR). A protein family comprises proteins with the same function in
different organisms (orthologous sequences) but may also include
proteins in the same organism (paralogous sequences) derived from gene
duplication and rearrangements. If a multiple sequence alignment of a
protein family reveals a common level of similarity throughout the
lengths of the proteins, PIR refers to the family as a homeomorphic
family. The aligned region is referred to as a homeomorphic domain, and
this region may comprise several smaller homology domains that are
shared with other families. Families may be further subdivided into
subfamilies or grouped into superfamilies based on respective higher or
lower levels of sequence similarity. The SCOP database reports 1296 families and the CATH database (version 1.7 beta) reports 1846 families.
When the sequences of proteins with the same function are examined
in greater detail, some are found to share high sequence similarity.
They are obviously members of the same family by the above criteria.
However, others are found that have very little, or even insignificant,
sequence similarity with other family members. In such cases, the family
relationship between two distant family members A and C can often be
demonstrated by finding an additional family member B that shares
significant similarity with both A and C. Thus, B provides a connecting
link between A and C. Another approach is to examine distant alignments
for highly conserved matches.
At a level of identity of 50%, proteins are likely to have the same
three-dimensional structure, and the identical atoms in the sequence
alignment will also superimpose within approximately 1 Å in the
structural model. Thus, if the structure of one member of a family is
known, a reliable prediction may be made for a second member of the
family, and the higher the identity level, the more reliable the
prediction. Protein structural modeling can be performed by examining
how well the amino acid substitutions fit into the core of the
three-dimensional structure.
Family (structural context)
as used in the FSSP database (Families of structurally similar proteins)
and the DALI/FSSP Web site, two structures that have a significant
level of structural similarity but not necessarily significant sequence
similarity.
Fold
similar to structural motif, includes a larger combination of
secondary structural units in the same configuration. Thus, proteins
sharing the same fold have the same combination of secondary structures
that are connected by similar loops. An example is the Rossmann fold
comprising several alternating α helices and parallel β strands. In
the SCOP, CATH, and FSSP databases, the known protein structures have
been classified into hierarchical levels of structural complexity with
the fold as a basic level of classification.
Homologous domain (sequence context)
an extended sequence pattern, generally found by sequence alignment
methods, that indicates a common evolutionary origin among the aligned
sequences. A homology domain is generally longer than motifs. The domain
may include all of a given protein sequence or only a portion of the
sequence. Some domains are complex and made up of several smaller
homology domains that became joined to form a larger one during
evolution. A domain that covers an entire sequence is called the
homeomorphic domain by PIR (Protein Information Resource).
Module
a region of conserved amino acid patterns comprising one or more
motifs and considered to be a fundamental unit of structure or function.
The presence of a module has also been used to classify proteins into
families.
Motif (sequence context)
a conserved pattern of amino acids that is found in two or more proteins. In the Prosite
catalog, a motif is an amino acid pattern that is found in a group of
proteins that have a similar biochemical activity, and that often is
near the active site of the protein. Examples of sequence motif
databases are the Prosite catalog and the Stanford Motifs Database.
Motif (structural context)
a combination of several secondary structural elements produced by
the folding of adjacent sections of the polypeptide chain into a
specific three-dimensional configuration. An example is the
helix-loop-helix motif. Structural motifs are also referred to as
supersecondary structures and folds.
Position-specific scoring matrix (sequence context, also known as weight or scoring matrix)
represents a conserved region in a multiple sequence alignment with
no gaps. Each matrix column represents the variation found in one column
of the multiple sequence alignment.
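As a minimal illustration of the idea, the sketch below builds a log-odds position-specific scoring matrix from a small gapless alignment block, assuming a uniform background distribution and a simple pseudocount; the alignment block and parameter values are made up for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(block, background=None, pseudocount=1.0):
    """Build a log-odds PSSM from a gapless alignment block
    (a list of equal-length sequences, one column per position)."""
    if background is None:
        background = {aa: 1.0 / 20 for aa in AMINO_ACIDS}  # uniform assumption
    pssm = []
    for col in range(len(block[0])):
        counts = Counter(seq[col] for seq in block)
        total = len(block) + pseudocount * 20
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm

block = ["ACDE", "ACDE", "SCDE", "ACDK"]   # made-up conserved region
pssm = build_pssm(block)
# Score a candidate sequence against the matrix, column by column.
score = sum(col[aa] for col, aa in zip(pssm, "ACDE"))
print(round(score, 2))
```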
Position-specific scoring matrix (structural context)
represents the amino acid variation found in an alignment of
proteins that fall into the same structural class. Matrix columns
represent the amino acid variation found at one amino acid position in
the aligned structures.
Primary structure
the linear amino acid sequence of a protein, which chemically is a
polypeptide chain composed of amino acids joined by peptide bonds.
Profile (sequence context)
a scoring matrix that represents a multiple sequence alignment of a
protein family. The profile is usually obtained from a well-conserved
region in a multiple sequence alignment. The profile is in the form of a
matrix with each column representing a position in the alignment and
each row one of the amino acids. Matrix values give the likelihood of
each amino acid at the corresponding position in the alignment. The
profile is moved along the target sequence to locate the best scoring
regions by a dynamic programming algorithm. Gaps are allowed during
matching and a gap penalty is included in this case as a negative score
when no amino acid is matched. A sequence profile may also be
represented by a hidden Markov model, referred to as a profile HMM.
Profile (structural context)
a scoring matrix that represents which amino acids should fit well
and which should fit poorly at sequential positions in a known protein
structure. Profile columns represent sequential positions in the
structure, and profile rows represent the 20 amino acids. As with a
sequence profile, the structural profile is moved along a target
sequence to find the highest possible alignment score by a dynamic
programming algorithm. Gaps may be included and receive a penalty. The
resulting score provides an indication as to whether or not the target
protein might adopt such a structure.
Secondary structure
the interactions that occur between the C, O, and NH groups on amino
acids in a polypeptide chain to form α-helices, β-sheets, turns, loops,
and other forms, and that facilitate the folding into a
three-dimensional structure.
Superfamily
a group of protein families of the same or different lengths that
are related by distant yet detectable sequence similarity. Members of a
given superfamily
thus have a common evolutionary origin. Originally, Dayhoff defined the cutoff for superfamily status as a chance of 10⁻⁶ that the sequences are unrelated, on the basis of an alignment score (Dayhoff et al. 1978). Proteins with few identities in an alignment of the sequences
but with a convincingly common number of structural and functional
features are placed in the same superfamily. At the level of
three-dimensional structure, superfamily proteins will share common
structural features such as a common fold, but there may also be
differences in the number and arrangement of secondary structures. The
PIR resource uses the term homeomorphic superfamilies to refer to
superfamilies that are composed of sequences that can be aligned from
end to end, representing a sharing of a single sequence homology domain, a
region of similarity that extends throughout the alignment. This domain
may also comprise smaller homology domains that are shared with other
protein families and superfamilies. Although a given protein sequence
may contain domains found in several superfamilies, thus indicating a
complex evolutionary history, sequences will be assigned to only one
homeomorphic superfamily based on the presence of similarity throughout
a multiple sequence alignment. The superfamily alignment may also
include regions that do not align either within or at the ends of the
alignment. In contrast, sequences in the same family align well
throughout the alignment.
Supersecondary structure
a term with similar meaning to a structural motif.
Tertiary structure
the three-dimensional or globular structure formed by the packing together or folding of secondary structures of a polypeptide chain.
Secondary structure
Secondary structure prediction is a set of techniques in bioinformatics that aim to predict the local secondary structures of proteins based only on knowledge of their amino acid sequence. For proteins, a prediction consists of assigning regions of the amino acid sequence as likely alpha helices, beta strands (often noted as "extended" conformations), or turns. The success of a prediction is determined by comparing it to the results of the DSSP algorithm (or a similar algorithm, e.g. STRIDE) applied to the crystal structure of the protein. Specialized algorithms have been developed for the detection of specific well-defined patterns such as transmembrane helices and coiled coils in proteins.
The best modern methods of secondary structure prediction in proteins reach about 80% accuracy; this high accuracy allows the use of the predictions as a feature to improve fold recognition and ab initio protein structure prediction, to classify structural motifs, and to refine sequence alignments. The accuracy of current protein secondary structure prediction methods is assessed in weekly benchmarks such as LiveBench and EVA.
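Accuracy figures of this kind are usually reported as Q3, the fraction of residues whose predicted three-state label matches the reference assignment. A minimal sketch, assuming the prediction and the DSSP-derived reference are both given as H/E/C strings (the example strings are hypothetical):

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose three-state label (H, E, or C)
    matches the reference assignment, e.g. one derived from DSSP."""
    assert len(predicted) == len(observed)
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Hypothetical prediction vs. a DSSP-derived reference string.
print(q3_accuracy("HHHHCCEEEECC", "HHHCCCEEEECC"))  # 11/12 ≈ 0.92
```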
Background
Early methods of secondary structure prediction, introduced in the 1960s and early 1970s, focused on identifying likely alpha helices and were based mainly on helix-coil transition models.
Significantly more accurate predictions that included beta sheets were
introduced in the 1970s and relied on statistical assessments based on
probability parameters derived from known solved structures. These
methods, applied to a single sequence, are typically at most about
60-65% accurate, and often underpredict beta sheets. The evolutionary conservation of secondary structures can be exploited by simultaneously assessing many homologous sequences in a multiple sequence alignment,
by calculating the net secondary structure propensity of an aligned
column of amino acids. In concert with larger databases of known protein
structures and modern machine learning methods such as neural nets and support vector machines, these methods can achieve up to 80% overall accuracy in globular proteins. The theoretical upper limit of accuracy is around 90%,
partly due to idiosyncrasies in DSSP assignment near the ends of
secondary structures, where local conformations vary under native
conditions but may be forced to assume a single conformation in crystals
due to packing constraints. Limitations are also imposed by secondary
structure prediction's inability to account for tertiary structure;
for example, a sequence predicted as a likely helix may still be able
to adopt a beta-strand conformation if it is located within a beta-sheet
region of the protein and its side chains pack well with their
neighbors. Dramatic conformational changes related to the protein's
function or environment can also alter local secondary structure.
Historical perspective
To date, over 20 different secondary structure prediction methods have been developed. One of the first algorithms was the Chou-Fasman method,
which relies predominantly on probability parameters determined from
relative frequencies of each amino acid's appearance in each type of
secondary structure.
The original Chou-Fasman parameters, determined from the small sample
of structures solved in the mid-1970s, produce poor results compared to
modern methods, though the parameterization has been updated since it
was first published. The Chou-Fasman method is roughly 50-60% accurate
in predicting secondary structures.
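To make the idea concrete, here is a heavily simplified sketch of a Chou-Fasman-style scan. The propensity values below are illustrative numbers for a few residues only (the published tables cover all 20 amino acids and separate helix, sheet, and turn parameters), and the nucleation rule is reduced to a windowed average rather than the method's actual former/breaker counting and extension rules.

```python
# Illustrative helix propensities in the spirit of Chou-Fasman; values
# above 1.0 favor helix, values below 1.0 disfavor it.
P_HELIX = {"A": 1.42, "E": 1.51, "L": 1.21, "M": 1.45,
           "G": 0.57, "P": 0.57, "S": 0.77, "Y": 0.69}

def helix_candidate_windows(seq, window=6, threshold=1.03):
    """Report windows whose mean helix propensity exceeds a threshold,
    a simplified stand-in for Chou-Fasman helix nucleation."""
    sites = []
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        avg = sum(P_HELIX.get(aa, 1.0) for aa in w) / window
        if avg > threshold:
            sites.append((i, w, round(avg, 2)))
    return sites

# Helix-favoring residues (AELM) score high; PGYS-rich stretches do not.
print(helix_candidate_windows("AELMAEGPGSYAELM"))
```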
The next notable method, the GOR method, named for the three scientists who developed it (Garnier, Osguthorpe, and Robson), is an information theory-based method. It uses the more powerful probabilistic technique of Bayesian inference.
The GOR method takes into account not only the probability of each
amino acid having a particular secondary structure, but also the conditional probability
of the amino acid assuming each structure given the contributions of
its neighbors (it does not assume that the neighbors have that same
structure). The approach is both more sensitive and more accurate than
that of Chou and Fasman because amino acid structural propensities are
only strong for a small number of amino acids such as proline and glycine.
Weak contributions from each of many neighbors can add up to strong
effects overall. The original GOR method was roughly 65% accurate and is
dramatically more successful in predicting alpha helices than beta
sheets, which it frequently mispredicted as loops or disorganized
regions.
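The core of the GOR approach can be sketched as summing information contributions from a window of neighboring residues. In the outline below, the `info` table is a hypothetical stand-in for the statistics that the real method estimates from databases of solved structures.

```python
def gor_like_state(seq, pos, info, window=8):
    """Pick the secondary-structure state at `pos` by summing
    information contributions from residues within +/-window, in the
    spirit of GOR. `info` maps (offset, residue, state) -> a value in
    bits, estimated in the real method from solved structures."""
    states = ("H", "E", "C")
    totals = {s: 0.0 for s in states}
    for d in range(-window, window + 1):
        j = pos + d
        if 0 <= j < len(seq):
            for s in states:
                totals[s] += info.get((d, seq[j], s), 0.0)
    return max(totals, key=totals.get)

# Hypothetical table: alanine anywhere nearby mildly favors helix.
info = {(d, "A", "H"): 0.1 for d in range(-8, 9)}
print(gor_like_state("AAAAGGGG", 2, info))  # -> 'H'
```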
Another big step forward was the use of machine learning methods. Artificial neural network methods were used first. As training sets, they use solved structures to identify common sequence motifs associated with particular arrangements of secondary structures. These methods are over 70% accurate in their
predictions, although beta strands are still often underpredicted due to
the lack of three-dimensional structural information that would allow
assessment of hydrogen bonding patterns that can promote formation of the extended conformation required for the presence of a complete beta sheet. PSIPRED and JPRED are among the best-known neural network-based programs for protein secondary structure prediction. Next, support vector machines have proven particularly useful for predicting the locations of turns, which are difficult to identify with statistical methods.
Extensions of machine learning techniques attempt to predict more fine-grained local properties of proteins, such as backbone dihedral angles in unassigned regions. Both SVMs and neural networks have been applied to this problem.
More recently, real-valued torsion angles can be accurately predicted by SPINE-X and successfully employed for ab initio structure prediction.
Other improvements
Secondary structure formation is reported to depend on factors beyond the protein sequence. For example, secondary structure tendencies are reported to depend on the local environment, the solvent accessibility of residues, the protein's structural class, and even the organism from which the protein is obtained. Based on such observations, some studies have shown that secondary structure prediction can be improved by adding information about protein structural class, residue accessible surface area, and contact number.
Tertiary structure
The
practical role of protein structure prediction is now more important
than ever. Massive amounts of protein sequence data are produced by
modern large-scale DNA sequencing efforts such as the Human Genome Project. Despite community-wide efforts in structural genomics, the output of experimentally determined protein structures—typically by time-consuming and relatively expensive X-ray crystallography or NMR spectroscopy—is lagging far behind the output of protein sequences.
Protein structure prediction remains an extremely difficult and unresolved undertaking. The two main problems are calculation of protein free energy and finding the global minimum of this energy. A protein structure prediction method must explore the space of possible protein structures, which is astronomically large. These problems can be partially bypassed in "comparative" or homology modeling and fold recognition
methods, in which the search space is pruned by the assumption that the
protein in question adopts a structure that is close to the
experimentally determined structure of another homologous protein. On
the other hand, the de novo or ab initio protein structure prediction
methods must explicitly resolve these problems. The progress and challenges in protein structure prediction have been reviewed in Zhang 2008.
Before modelling
Most
tertiary structure modelling methods, such as Rosetta, are optimized
for modelling the tertiary structure of single protein domains. A step
called domain parsing, or domain boundary prediction, is
usually done first to split a protein into potential structural domains.
As with the rest of tertiary structure prediction, this can be done
comparatively from known structures or ab initio with the sequence only (usually by machine learning, assisted by covariation). The structures for individual domains are docked together in a process called domain assembly to form the final tertiary structure.
Ab initio protein modelling
Energy- and fragment-based methods
Ab initio (or de novo) protein modelling methods seek to build three-dimensional protein
models "from scratch", i.e., based on physical principles rather than
(directly) on previously solved structures. There are many possible
procedures that either attempt to mimic protein folding or apply some stochastic method to search possible solutions (i.e., global optimization
of a suitable energy function). These procedures tend to require vast
computational resources, and have thus only been carried out for tiny
proteins. To predict protein structure de novo for larger
proteins will require better algorithms and larger computational
resources like those afforded by either powerful supercomputers (such as
Blue Gene or MDGRAPE-3) or distributed computing (such as Folding@home, the Human Proteome Folding Project and Rosetta@Home).
Although these computational barriers are vast, the potential benefits
of structural genomics (by predicted or experimental methods) make ab initio structure prediction an active research field.
As of 2009, a 50-residue protein could be simulated atom-by-atom on a supercomputer for 1 millisecond.
As of 2012, comparable stable-state sampling could be done on a
standard desktop with a new graphics card and more sophisticated
algorithms. Much larger simulation timescales can be achieved using coarse-grained modeling.
Evolutionary covariation to predict 3D contacts
As sequencing became more commonplace in the 1990s, several groups used protein sequence alignments to predict correlated mutations,
and it was hoped that these coevolved residues could be used to predict
tertiary structure (using the analogy to distance constraints from
experimental procedures such as NMR).
The assumption is that when single residue mutations are slightly
deleterious, compensatory mutations may occur to restabilize
residue-residue interactions.
This early work used what are known as local methods to calculate
correlated mutations from protein sequences, but suffered from indirect
false correlations which result from treating each pair of residues as
independent of all other pairs.
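A typical "local" statistic of this kind is the mutual information between two alignment columns, computed pair by pair. The sketch below is a minimal illustration on a made-up alignment; as noted above, such per-pair scores pick up indirect, transitive correlations, which is what the later global methods correct for.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of a
    multiple sequence alignment (list of equal-length strings): a
    'local' coevolution statistic treating each column pair in
    isolation."""
    n = len(msa)
    pi = Counter(s[i] for s in msa)
    pj = Counter(s[j] for s in msa)
    pij = Counter((s[i], s[j]) for s in msa)
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

msa = ["AKLV", "AKLV", "GRLV", "GRIV"]  # columns 0 and 1 co-vary
print(round(column_mutual_information(msa, 0, 1), 3))  # high (1 bit)
print(round(column_mutual_information(msa, 0, 3), 3))  # ~0
```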
In 2011, a different, global statistical approach demonstrated that predicted coevolved residues were sufficient to predict the 3D fold of a protein, provided that enough sequences are available (>1,000 homologous sequences are needed). The method, EVfold,
uses no homology modeling, threading or 3D structure fragments and can
be run on a standard personal computer even for proteins with hundreds
of residues. The accuracy of the contacts predicted using this and
related approaches has now been demonstrated on many known structures
and contact maps, including the prediction of experimentally unsolved transmembrane proteins.
Comparative protein modeling
Comparative
protein modelling uses previously solved structures as starting points,
or templates. This is effective because it appears that although the
number of actual proteins is vast, there is a limited set of tertiary structural motifs
to which most proteins belong. It has been suggested that there are
only around 2,000 distinct protein folds in nature, though there are
many millions of different proteins.
These methods may also be split into two groups:
Homology modeling is based on the reasonable assumption that two homologous
proteins will share very similar structures. Because a protein's fold
is more evolutionarily conserved than its amino acid sequence, a target
sequence can be modeled with reasonable accuracy on a very distantly
related template, provided that the relationship between target and
template can be discerned through sequence alignment.
It has been suggested that the primary bottleneck in comparative
modelling arises from difficulties in alignment rather than from errors
in structure prediction given a known-good alignment. Unsurprisingly, homology modelling is most accurate when the target and template have similar sequences.
Protein threading
scans the amino acid sequence of an unknown structure against a
database of solved structures. In each case, a scoring function is used
to assess the compatibility of the sequence to the structure, thus
yielding possible three-dimensional models. This type of method is also
known as 3D-1D fold recognition due to its compatibility analysis
between three-dimensional structures and linear protein sequences. This
method has also given rise to methods performing an inverse folding search
by evaluating the compatibility of a given structure with a large
database of sequences, thus predicting which sequences have the
potential to produce a given fold.
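As a toy illustration of 3D-1D scoring, the sketch below scores a sequence against a template whose positions have been reduced to two environment classes, buried (B) and exposed (E). The compatibility values are invented for the example; real threading methods use richer environment descriptions, pairwise contact potentials, and gapped alignment.

```python
# Invented 3D-1D compatibility scores: how well each residue type fits
# a buried (B) or exposed (E) template position.
FIT = {
    ("L", "B"): 2.0, ("L", "E"): -1.0,
    ("V", "B"): 1.5, ("V", "E"): -0.5,
    ("K", "B"): -2.0, ("K", "E"): 1.5,
    ("D", "B"): -1.5, ("D", "E"): 1.2,
}

def thread_score(sequence, template_env):
    """Gapless 3D-1D score of a sequence against a template described
    by its per-position environment string (e.g. 'BBEE')."""
    return sum(FIT.get((aa, env), 0.0) for aa, env in zip(sequence, template_env))

# Hydrophobic-inside, charged-outside fits this template far better.
print(thread_score("LVKD", "BBEE"))  # 2.0 + 1.5 + 1.5 + 1.2 = 6.2
print(thread_score("KDLV", "BBEE"))  # -2.0 - 1.5 - 1.0 - 0.5 = -5.0
```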
Side-chain geometry prediction
Accurate packing of the amino acid side chains
represents a separate problem in protein structure prediction. Methods
that specifically address the problem of predicting side-chain geometry
include dead-end elimination and the self-consistent mean field
methods. Low-energy side chain conformations are usually determined on a rigid polypeptide backbone using a set of discrete side chain conformations known as "rotamers." The methods attempt to identify the set of rotamers that minimize the model's overall energy.
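The combinatorial problem can be made concrete with a tiny exhaustive search over discrete rotamer choices. This is only a sketch with invented energy values; the search space grows exponentially with the number of residues, which is exactly why methods such as dead-end elimination and self-consistent mean field are used in practice.

```python
import itertools

def best_rotamers(rotamers, self_energy, pair_energy):
    """Exhaustively pick one rotamer per residue minimizing total
    energy (self terms plus pairwise terms on a fixed backbone)."""
    best, best_e = None, float("inf")
    residues = sorted(rotamers)
    for combo in itertools.product(*(rotamers[r] for r in residues)):
        e = sum(self_energy[(r, c)] for r, c in zip(residues, combo))
        for (r1, c1), (r2, c2) in itertools.combinations(zip(residues, combo), 2):
            e += pair_energy.get((r1, c1, r2, c2), 0.0)
        if e < best_e:
            best, best_e = dict(zip(residues, combo)), e
    return best, best_e

rotamers = {1: ["g+", "t"], 2: ["g-", "t"]}        # chi1 bins per residue
self_energy = {(1, "g+"): 0.0, (1, "t"): 0.5,
               (2, "g-"): 0.2, (2, "t"): 0.0}
pair_energy = {(1, "g+", 2, "t"): 3.0}             # steric clash penalty
print(best_rotamers(rotamers, self_energy, pair_energy))
```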
These methods use rotamer libraries, which are collections of
favorable conformations for each residue type in proteins. Rotamer
libraries may contain information about the conformation, its frequency,
and the standard deviations about mean dihedral angles, which can be
used in sampling. Rotamer libraries are derived from structural bioinformatics
or other statistical analysis of side-chain conformations in known
experimental structures of proteins, such as by clustering the observed
conformations for tetrahedral carbons near the staggered (60°, 180°,
-60°) values.
Rotamer libraries can be backbone-independent,
secondary-structure-dependent, or backbone-dependent.
Backbone-independent rotamer libraries make no reference to backbone
conformation, and are calculated from all available side chains of a
certain type (for instance, the first example of a rotamer library, done
by Ponder and Richards at Yale in 1987). Secondary-structure-dependent libraries present different dihedral angles and/or rotamer frequencies for α-helix, β-sheet, or coil secondary structures.
Backbone-dependent rotamer libraries present conformations and/or
frequencies dependent on the local backbone conformation as defined by
the backbone dihedral angles φ and ψ, regardless of secondary structure.
The modern versions of these libraries as used in most software
are presented as multidimensional distributions of probability or
frequency, where the peaks correspond to the dihedral-angle
conformations considered as individual rotamers in the lists. Some
versions are based on very carefully curated data and are used primarily
for structure validation,
while others emphasize relative frequencies in much larger data sets
and are the form used primarily for structure prediction, such as the
Dunbrack rotamer libraries.
Side-chain packing methods are most useful for analyzing the protein's hydrophobic
core, where side chains are more closely packed; they have more
difficulty addressing the looser constraints and higher flexibility of
surface residues, which often occupy multiple rotamer conformations
rather than just one.
Prediction of structural classes
Statistical methods have been developed for predicting structural classes of proteins based on their amino acid composition, pseudo amino acid composition, and functional domain composition. Secondary structure prediction also implicitly generates such a prediction for single domains.
Quaternary structure
In the case of complexes of two or more proteins, where the structures of the proteins are known or can be predicted with high accuracy, protein–protein docking
methods can be used to predict the structure of the complex.
Information of the effect of mutations at specific sites on the affinity
of the complex helps to understand the complex structure and to guide
docking methods.
Evaluation of automatic structure prediction servers
CASP,
which stands for Critical Assessment of Techniques for Protein Structure
Prediction, is a community-wide experiment for protein structure
prediction taking place every two years since 1994. CASP provides an opportunity to assess the quality of available human, non-automated methodology (human category) and of automatic servers for protein structure prediction (server category, introduced in CASP7).
The CAMEO3D
Continuous Automated Model EvaluatiOn Server evaluates automated
protein structure prediction servers on a weekly basis using blind
predictions for newly released protein structures. CAMEO publishes the
results on its website.
Artificial life
or virtual evolution attempts to understand evolutionary processes via
the computer simulation of simple (artificial) life forms.
An unexpected emergent property of a complex system may be a result of the interplay of cause and effect among simpler, integrated parts.
Biological systems manifest many important examples of emergent
properties in the complex interplay of components. Traditional study of
biological systems requires reductive methods in which quantities of
data are gathered by category, such as concentration over time in
response to a certain stimulus. Computers are critical to analysis and
modelling of these data. The goal is to create accurate real-time models
of a system's response to environmental and internal stimuli, such as a
model of a cancer cell in order to find weaknesses in its signalling
pathways, or modelling of ion channel mutations to see effects on
cardiomyocytes and in turn, the function of a beating heart.
Standards
By far the most widely accepted standard format for storing and exchanging models in the field is the Systems Biology Markup Language (SBML). The SBML.org
website includes a guide to many important software packages used in
computational systems biology. A large number of models encoded in SBML
can be retrieved from BioModels. Other markup languages with different emphases include BioPAX and CellML.
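As a brief illustration, an SBML model can be inspected with the python-libsbml bindings. This is a minimal sketch assuming the package is installed and that "model.xml" is a placeholder for a real model file, e.g. one retrieved from BioModels.

```python
import libsbml  # pip install python-libsbml

document = libsbml.readSBML("model.xml")  # placeholder path
if document.getNumErrors() > 0:
    document.printErrors()
else:
    model = document.getModel()
    print("species:", model.getNumSpecies())
    print("reactions:", model.getNumReactions())
    # List each species with its initial concentration.
    for i in range(model.getNumSpecies()):
        sp = model.getSpecies(i)
        print(sp.getId(), sp.getInitialConcentration())
```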
The complex network of biochemical reaction/transport processes and their spatial organization make the development of a predictive model of a living cell a grand challenge for the 21st century, listed as such by the National Science Foundation (NSF) in 2006.
A whole cell computational model for the bacterium Mycoplasma genitalium,
including all its 525 genes, gene products, and their interactions, was
built by scientists from Stanford University and the J. Craig Venter
Institute and published on 20 July 2012 in Cell.
A dynamic computer model of intracellular signaling was the basis
for Merrimack Pharmaceuticals to discover the target for their cancer
medicine MM-111.
An open source simulation of C. elegans at the cellular level is being pursued by the OpenWorm community. So far the physics engine Geppetto has been built and models of the neural connectome and a muscle cell have been created in the NeuroML format.
The Blue Brain Project is an attempt to create a synthetic brain by reverse-engineering the mammalian brain down to the molecular level. The aim of this project, founded in May 2005 by the Brain and Mind Institute of the École Polytechnique Fédérale de Lausanne in
Switzerland, is to study the brain's architectural and functional
principles. The project is headed by the Institute's director, Henry
Markram. Using a Blue Gene supercomputer running Michael Hines's NEURON software, the simulation does not consist simply of an artificial neural network, but involves a partially biologically realistic model of neurons. It is hoped by its proponents that it will eventually shed light on the nature of consciousness.
There are a number of sub-projects, including the Cajal Blue Brain, coordinated by the Supercomputing and Visualization Center of Madrid
(CeSViMa), and others run by universities and independent laboratories
in the UK, U.S., and Israel. The Human Brain Project builds on the work
of the Blue Brain Project. It is one of six pilot projects in the Future Emerging Technologies Research Program of the European Commission, competing for one billion euros in funding.
Model of the immune system
The last decade has seen the emergence of a growing number of simulations of the immune system.
Virtual liver
The Virtual Liver
project is a 43 million euro research program funded by the German
Government, made up of seventy research groups distributed across
Germany. The goal is to produce a virtual liver, a dynamic mathematical
model that represents human liver physiology, morphology and function.
Tree model
Electronic trees (e-trees) usually use L-systems to simulate growth. L-systems are very important in the field of complexity science and A-life.
A universally accepted system for describing changes in plant morphology at the cellular or modular level has yet to be devised.
The most widely implemented tree generating algorithms are described in the papers "Creation and Rendering of Realistic Trees" and "Real-Time Tree Rendering".
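At its core, an L-system is just iterated string rewriting. A minimal sketch with a classic branching rule (by convention the symbols are interpreted by turtle graphics: F draws forward, + and - turn, [ and ] push and pop the turtle state):

```python
def lsystem(axiom, rules, iterations):
    """Iteratively rewrite every symbol using its production rule;
    symbols without a rule are copied unchanged."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

# A classic branching rule set: after a few iterations the string,
# interpreted by turtle graphics, draws a tree-like figure.
rules = {"F": "F[+F]F[-F]F"}
print(lsystem("F", rules, 2))
```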
Ecotoxicology
The purpose of models in ecotoxicology
is the understanding, simulation and prediction of effects caused by
toxicants in the environment. Most current models describe effects on
one of many different levels of biological organization (e.g. organisms
or populations). A challenge is the development of models that predict
effects across biological scales. Ecotoxicology and models discusses some types of ecotoxicological models and provides links to many others.
Modelling of infectious disease
It is possible to model the progress of most infectious diseases mathematically to discover the likely outcome of an epidemic or to help manage them by vaccination. This field tries to find parameters for various infectious diseases and to use those parameters to make useful calculations about the effects of a mass vaccination programme.
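The standard starting point for such calculations is a compartmental model such as SIR. The sketch below integrates the SIR equations with a simple Euler scheme; the parameter values are illustrative, and vaccination is represented crudely by moving part of the population from the susceptible to the recovered compartment at time zero.

```python
def sir(beta, gamma, s0, i0, days, dt=0.1):
    """Integrate the classic SIR equations dS/dt = -beta*S*I,
    dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I (population fractions)
    with a simple Euler scheme; returns final (S, I, R)."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    for _ in range(int(days / dt)):
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

# With beta/gamma = 3, vaccinating 70% at t=0 (moved from S to R)
# pushes the effective reproduction number 3*S below 1, so the
# outbreak dies out instead of spreading.
print(sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, days=300))  # no vaccination
print(sir(beta=0.3, gamma=0.1, s0=0.29, i0=0.01, days=300))  # 70% vaccinated
```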