A Medley of Potpourri

Monday, May 7, 2018

Enzyme

From Wikipedia, the free encyclopedia

The enzyme glucosidase converts the sugar maltose to two glucose sugars. Active site residues in red, maltose substrate in black, and NAD cofactor in yellow. (PDB: 1OBB)

Enzymes /ˈɛnzaɪmz/ are macromolecular biological catalysts. Enzymes accelerate chemical reactions. The molecules upon which enzymes may act are called substrates and the enzyme converts the substrates into different molecules known as products. Almost all metabolic processes in the cell need enzyme catalysis in order to occur at rates fast enough to sustain life.^[1]^:8.1 Metabolic pathways depend upon enzymes to catalyze individual steps. The study of enzymes is called enzymology and a new field of pseudoenzyme analysis has recently grown up, recognising that during evolution, some enzymes have lost the ability to carry out biological catalysis, which is often reflected in their amino acid sequences and unusual 'pseudocatalytic' properties.^[2]^[3]

Enzymes are known to catalyze more than 5,000 biochemical reaction types.^[4] Most enzymes are proteins, although a few are catalytic RNA molecules. The latter are called ribozymes. Enzymes' specificity comes from their unique three-dimensional structures.

Like all catalysts, enzymes increase the reaction rate by lowering its activation energy. Some enzymes can make their conversion of substrate to product occur many millions of times faster. An extreme example is orotidine 5'-phosphate decarboxylase, which allows a reaction that would otherwise take millions of years to occur in milliseconds.^[5]^[6] Chemically, enzymes are like any catalyst and are not consumed in chemical reactions, nor do they alter the equilibrium of a reaction. Enzymes differ from most other catalysts by being much more specific. Enzyme activity can be affected by other molecules: inhibitors are molecules that decrease enzyme activity, and activators are molecules that increase activity. Many therapeutic drugs and poisons are enzyme inhibitors. An enzyme's activity decreases markedly outside its optimal temperature and pH.

Some enzymes are used commercially, for example, in the synthesis of antibiotics. Some household products use enzymes to speed up chemical reactions: enzymes in biological washing powders break down protein, starch or fat stains on clothes, and enzymes in meat tenderizer break down proteins into smaller molecules, making the meat easier to chew.

Etymology and history

Eduard Buchner

By the late 17th and early 18th centuries, the digestion of meat by stomach secretions^[7] and the conversion of starch to sugars by plant extracts and saliva were known but the mechanisms by which these occurred had not been identified.^[8]

French chemist Anselme Payen was the first to discover an enzyme, diastase, in 1833.^[9] A few decades later, when studying the fermentation of sugar to alcohol by yeast, Louis Pasteur concluded that this fermentation was caused by a vital force contained within the yeast cells called "ferments", which were thought to function only within living organisms. He wrote that "alcoholic fermentation is an act correlated with the life and organization of the yeast cells, not with the death or putrefaction of the cells."^[10]

In 1877, German physiologist Wilhelm Kühne (1837–1900) first used the term enzyme, which comes from Greek ἔνζυμον, "leavened" or "in yeast", to describe this process.^[11] The word enzyme was used later to refer to nonliving substances such as pepsin, and the word ferment was used to refer to chemical activity produced by living organisms.^[12]

Eduard Buchner submitted his first paper on the study of yeast extracts in 1897. In a series of experiments at the University of Berlin, he found that sugar was fermented by yeast extracts even when there were no living yeast cells in the mixture.^[13] He named the enzyme that brought about the fermentation of sucrose "zymase".^[14] In 1907, he received the Nobel Prize in Chemistry for "his discovery of cell-free fermentation". Following Buchner's example, enzymes are usually named according to the reaction they carry out: the suffix -ase is combined with the name of the substrate (e.g., lactase is the enzyme that cleaves lactose) or with the type of reaction (e.g., DNA polymerase forms DNA polymers).^[15]

The biochemical identity of enzymes was still unknown in the early 1900s. Many scientists observed that enzymatic activity was associated with proteins, but others (such as Nobel laureate Richard Willstätter) argued that proteins were merely carriers for the true enzymes and that proteins per se were incapable of catalysis.^[16] In 1926, James B. Sumner showed that the enzyme urease was a pure protein and crystallized it; he did likewise for the enzyme catalase in 1937. The conclusion that pure proteins can be enzymes was definitively demonstrated by John Howard Northrop and Wendell Meredith Stanley, who worked on the digestive enzymes pepsin (1930), trypsin and chymotrypsin. These three scientists were awarded the 1946 Nobel Prize in Chemistry.^[17]

The discovery that enzymes could be crystallized eventually allowed their structures to be solved by x-ray crystallography. This was first done for lysozyme, an enzyme found in tears, saliva and egg whites that digests the coating of some bacteria; the structure was solved by a group led by David Chilton Phillips and published in 1965.^[18] This high-resolution structure of lysozyme marked the beginning of the field of structural biology and the effort to understand how enzymes work at an atomic level of detail.^[19]

Naming conventions

An enzyme's name is often derived from its substrate or the chemical reaction it catalyzes, with the word ending in -ase.^[1]^:8.1.3 Examples are lactase, alcohol dehydrogenase and DNA polymerase. Different enzymes that catalyze the same chemical reaction are called isozymes.^[1]^:10.3

The International Union of Biochemistry and Molecular Biology have developed a nomenclature for enzymes, the EC numbers; each enzyme is described by a sequence of four numbers preceded by "EC", which stands for "Enzyme Commission". The first number broadly classifies the enzyme based on its mechanism.^[20]

The top-level classification is:

EC 1, Oxidoreductases: catalyze oxidation/reduction reactions
EC 2, Transferases: transfer a functional group (e.g. a methyl or phosphate group)
EC 3, Hydrolases: catalyze the hydrolysis of various bonds
EC 4, Lyases: cleave various bonds by means other than hydrolysis and oxidation
EC 5, Isomerases: catalyze isomerization changes within a single molecule
EC 6, Ligases: join two molecules with covalent bonds.

These sections are subdivided by other features such as the substrate, products, and chemical mechanism. An enzyme is fully specified by four numerical designations. For example, hexokinase (EC 2.7.1.1) is a transferase (EC 2) that adds a phosphate group (EC 2.7) to a hexose sugar, a molecule containing an alcohol group (EC 2.7.1).^[21]
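The four-field structure of an EC number can be illustrated with a short Python sketch. The function names `parse_ec` and `top_level_class` are hypothetical, and the lookup table covers only the six top-level classes listed above; it is an illustration of the numbering scheme, not an official IUBMB tool.

```python
# Top-level EC classes, mirroring the six entries listed in this article.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec_number):
    """Split an EC number such as 'EC 2.7.1.1' into its four integer fields."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four fields, got {len(parts)}: {ec_number!r}")
    return tuple(int(part) for part in parts)


def top_level_class(ec_number):
    """Return the name of the top-level class given by the first field."""
    return EC_CLASSES[parse_ec(ec_number)[0]]


# Hexokinase, the example used in the text.
print(parse_ec("EC 2.7.1.1"))
print(top_level_class("EC 2.7.1.1"))
```

A first field that is missing from the table raises a `KeyError`, which keeps the sketch from silently mislabelling an enzyme.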

Structure

Enzyme activity initially increases with temperature (Q10 coefficient) until the enzyme's structure unfolds (denaturation), leading to an optimal rate of reaction at an intermediate temperature.

Enzymes are generally globular proteins, acting alone or in larger complexes. The sequence of the amino acids specifies the structure which in turn determines the catalytic activity of the enzyme.^[22] Although structure determines function, a novel enzymatic activity cannot yet be predicted from structure alone.^[23] Enzyme structures unfold (denature) when heated or exposed to chemical denaturants and this disruption to the structure typically causes a loss of activity.^[24] Enzyme denaturation is normally linked to temperatures above a species' normal level; as a result, enzymes from bacteria living in volcanic environments such as hot springs are prized by industrial users for their ability to function at high temperatures, allowing enzyme-catalysed reactions to be operated at a very high rate.
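Below the temperature at which an enzyme denatures, the rise in rate with temperature is often summarised by the Q10 coefficient, the factor by which the rate changes for a 10 °C increase. The following Python sketch assumes a constant Q10 and ignores denaturation altogether; the function name and the example value Q10 = 2 are illustrative assumptions rather than measurements.

```python
def rate_at_temperature(rate_ref, q10, temp_ref_c, temp_c):
    """Scale a reference rate to another temperature using a constant Q10.

    rate(T) = rate_ref * Q10 ** ((T - T_ref) / 10)
    Only meaningful below the temperature at which the enzyme denatures.
    """
    return rate_ref * q10 ** ((temp_c - temp_ref_c) / 10.0)


# Hypothetical enzyme whose rate doubles for every 10 degree rise (Q10 = 2).
for temp in (20, 30, 40):
    relative = rate_at_temperature(1.0, 2.0, 20, temp)
    print(f"{temp} C: relative rate {relative:.1f}")
```

A fuller model would multiply this term by the fraction of enzyme that remains folded, which is what produces the optimum at an intermediate temperature.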

Enzymes are usually much larger than their substrates. Sizes range from just 62 amino acid residues, for the monomer of 4-oxalocrotonate tautomerase,^[25] to over 2,500 residues in the animal fatty acid synthase.^[26] Only a small portion of their structure (around 2–4 amino acids) is directly involved in catalysis: the catalytic site.^[27] This catalytic site is located next to one or more binding sites where residues orient the substrates. The catalytic site and binding site together comprise the enzyme's active site. The remaining majority of the enzyme structure serves to maintain the precise orientation and dynamics of the active site.^[28]

In some enzymes, no amino acids are directly involved in catalysis; instead, the enzyme contains sites to bind and orient catalytic cofactors.^[28] Enzyme structures may also contain allosteric sites where the binding of a small molecule causes a conformational change that increases or decreases activity.^[29]

A small number of RNA-based biological catalysts called ribozymes exist, which again can act alone or in complex with proteins. The most common of these is the ribosome which is a complex of protein and catalytic RNA components.^[1]^:2.2

Mechanism

Organisation of enzyme structure and lysozyme example. Binding sites in blue, catalytic site in red and peptidoglycan substrate in black. (PDB: 9LYZ)

Substrate binding

Enzymes must bind their substrates before they can catalyse any chemical reaction. Enzymes are usually very specific as to which substrates they bind and which chemical reaction they then catalyse. Specificity is achieved by binding pockets with complementary shape, charge and hydrophilic/hydrophobic characteristics to the substrates. Enzymes can therefore distinguish between very similar substrate molecules to be chemoselective, regioselective and stereospecific.^[30]

Some of the enzymes showing the highest specificity and accuracy are involved in the copying and expression of the genome. Some of these enzymes have "proof-reading" mechanisms. Here, an enzyme such as DNA polymerase catalyzes a reaction in a first step and then checks that the product is correct in a second step.^[31] This two-step process results in average error rates of less than 1 error in 100 million reactions in high-fidelity mammalian polymerases.^[1]^:5.3.1 Similar proofreading mechanisms are also found in RNA polymerase,^[32] aminoacyl tRNA synthetases^[33] and ribosomes.^[34]

Conversely, some enzymes display enzyme promiscuity, having broad specificity and acting on a range of different physiologically relevant substrates. Many enzymes possess small side activities which arose fortuitously (i.e. neutrally), which may be the starting point for the evolutionary selection of a new function.^[35]^[36]

Enzyme changes shape by induced fit upon substrate binding to form enzyme-substrate complex. Hexokinase has a large induced fit motion that closes over the substrates adenosine triphosphate and xylose. Binding sites in blue, substrates in black and Mg²⁺ cofactor in yellow. (PDB: 2E2N, 2E2Q)

"Lock and key" model

To explain the observed specificity of enzymes, in 1894 Emil Fischer proposed that both the enzyme and the substrate possess specific complementary geometric shapes that fit exactly into one another.^[37] This is often referred to as "the lock and key" model.^[1]^:8.3.2 This early model explains enzyme specificity, but fails to explain the stabilization of the transition state that enzymes achieve.^[38]

Induced fit model

In 1958, Daniel Koshland suggested a modification to the lock and key model: since enzymes are rather flexible structures, the active site is continuously reshaped by interactions with the substrate as the substrate interacts with the enzyme.^[39] As a result, the substrate does not simply bind to a rigid active site; the amino acid side-chains that make up the active site are molded into the precise positions that enable the enzyme to perform its catalytic function. In some cases, such as glycosidases, the substrate molecule also changes shape slightly as it enters the active site.^[40] The active site continues to change until the substrate is completely bound, at which point the final shape and charge distribution is determined.^[41] Induced fit may enhance the fidelity of molecular recognition in the presence of competition and noise via the conformational proofreading mechanism.^[42]

Catalysis

Enzymes can accelerate reactions in several ways, all of which lower the activation energy (ΔG^‡, Gibbs free energy):^[43]

By stabilizing the transition state:
- Creating an environment with a charge distribution complementary to that of the transition state to lower its energy.^[44]
By providing an alternative reaction pathway:
- Temporarily reacting with the substrate, forming a covalent intermediate to provide a lower energy transition state.^[45]
By destabilising the substrate ground state:
- Distorting bound substrate(s) into their transition state form to reduce the energy required to reach the transition state.^[46]
- By orienting the substrates into a productive arrangement to reduce the reaction entropy change.^[47] The contribution of this mechanism to catalysis is relatively small.^[48]

Enzymes may use several of these mechanisms simultaneously. For example, proteases such as trypsin perform covalent catalysis using a catalytic triad, stabilise charge build-up on the transition states using an oxyanion hole, and complete hydrolysis using an oriented water substrate.

Dynamics

Enzymes are not rigid, static structures; instead they have complex internal dynamic motions – that is, movements of parts of the enzyme's structure such as individual amino acid residues, groups of residues forming a protein loop or unit of secondary structure, or even an entire protein domain. These motions give rise to a conformational ensemble of slightly different structures that interconvert with one another at equilibrium. Different states within this ensemble may be associated with different aspects of an enzyme's function. For example, different conformations of the enzyme dihydrofolate reductase are associated with the substrate binding, catalysis, cofactor release, and product release steps of the catalytic cycle.^[49]

Allosteric modulation

Allosteric sites are pockets on the enzyme, distinct from the active site, that bind to molecules in the cellular environment. These molecules then cause a change in the conformation or dynamics of the enzyme that is transduced to the active site and thus affects the reaction rate of the enzyme.^[50] In this way, allosteric interactions can either inhibit or activate enzymes. Allosteric interactions with metabolites upstream or downstream in an enzyme's metabolic pathway cause feedback regulation, altering the activity of the enzyme according to the flux through the rest of the pathway.^[51]

Cofactors

Chemical structure for thiamine pyrophosphate and protein structure of transketolase. Thiamine pyrophosphate cofactor in yellow and xylulose 5-phosphate substrate in black. (PDB: 4KXV)

Some enzymes do not need additional components to show full activity. Others require non-protein molecules called cofactors to be bound for activity.^[52] Cofactors can be either inorganic (e.g., metal ions and iron-sulfur clusters) or organic compounds (e.g., flavin and heme). These cofactors serve many purposes; for instance, metal ions can help in stabilizing nucleophilic species within the active site.^[53] Organic cofactors can be either coenzymes, which are released from the enzyme's active site during the reaction, or prosthetic groups, which are tightly bound to an enzyme. Organic prosthetic groups can be covalently bound (e.g., biotin in enzymes such as pyruvate carboxylase).^[54]

An example of an enzyme that contains a cofactor is carbonic anhydrase, which is shown in the ribbon diagram above with a zinc cofactor bound as part of its active site.^[55] These tightly bound ions or molecules are usually found in the active site and are involved in catalysis.^[1]^:8.1.1 For example, flavin and heme cofactors are often involved in redox reactions.^[1]^:17

Enzymes that require a cofactor but do not have one bound are called apoenzymes or apoproteins. An enzyme together with the cofactor(s) required for activity is called a holoenzyme (or haloenzyme). The term holoenzyme can also be applied to enzymes that contain multiple protein subunits, such as the DNA polymerases; here the holoenzyme is the complete complex containing all the subunits needed for activity.^[1]^:8.1.1

Coenzymes

Coenzymes are small organic molecules that can be loosely or tightly bound to an enzyme. Coenzymes transport chemical groups from one enzyme to another.^[56] Examples include NADH, NADPH and adenosine triphosphate (ATP). Some coenzymes, such as flavin mononucleotide (FMN), flavin adenine dinucleotide (FAD), thiamine pyrophosphate (TPP), and tetrahydrofolate (THF), are derived from vitamins. These coenzymes cannot be synthesized by the body de novo and closely related compounds (vitamins) must be acquired from the diet. The chemical groups carried include:

the hydride ion (H⁻), carried by NAD⁺ or NADP⁺
the phosphate group, carried by adenosine triphosphate
the acetyl group, carried by coenzyme A
formyl, methenyl or methyl groups, carried by folic acid and
the methyl group, carried by S-adenosylmethionine.^[56]

Since coenzymes are chemically changed as a consequence of enzyme action, it is useful to consider coenzymes to be a special class of substrates, or second substrates, which are common to many different enzymes. For example, about 1000 enzymes are known to use the coenzyme NADH.^[57]

Coenzymes are usually continuously regenerated and their concentrations maintained at a steady level inside the cell. For example, NADPH is regenerated through the pentose phosphate pathway and S-adenosylmethionine by methionine adenosyltransferase. This continuous regeneration means that small amounts of coenzymes can be used very intensively. For example, the human body turns over its own weight in ATP each day.^[58]

Thermodynamics

The energies of the stages of a chemical reaction. Uncatalysed (dashed line), substrates need a lot of activation energy to reach a transition state, which then decays into lower-energy products. When enzyme catalysed (solid line), the enzyme binds the substrates (ES), then stabilizes the transition state (ES^‡) to reduce the activation energy required to produce products (EP) which are finally released.

As with all catalysts, enzymes do not alter the position of the chemical equilibrium of the reaction. In the presence of an enzyme, the reaction runs in the same direction as it would without the enzyme, just more quickly.^[1]^:8.2.3 For example, carbonic anhydrase catalyzes its reaction in either direction depending on the concentration of its reactants:^[59]

{\begin{matrix}{}\\{\ce {{CO2}+H2O->[{\ce {Carbonic\ anhydrase}}]H2CO3}}\\{}\end{matrix}}

(in tissues; high CO₂ concentration)

(1)

{\begin{matrix}{}\\{\ce {{CO2}+H2O<-[{\ce {Carbonic\ anhydrase}}]H2CO3}}\\{}\end{matrix}}

(in lungs; low CO₂ concentration)

(2)

The rate of a reaction is dependent on the activation energy needed to form the transition state which then decays into products. Enzymes increase reaction rates by lowering the energy of the transition state. First, binding forms a low energy enzyme-substrate complex (ES). Secondly the enzyme stabilises the transition state such that it requires less energy to achieve compared to the uncatalyzed reaction (ES^‡). Finally the enzyme-product complex (EP) dissociates to release the products.^[1]^:8.3
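The link between a lower activation energy and a faster rate can be made concrete with a small Python sketch. It assumes transition-state theory, in which the rate constant is proportional to exp(−ΔG^‡/RT), so that lowering the barrier by ΔΔG^‡ multiplies the rate by exp(ΔΔG^‡/RT). The function name and the example barrier reductions are illustrative assumptions, not values taken from the cited sources.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def rate_enhancement(barrier_drop_kj_per_mol, temperature_k=298.15):
    """Fold increase in rate when the activation energy falls by the given amount.

    Assumes rate constant ~ exp(-dG_activation / (R * T)), so the ratio of
    catalysed to uncatalysed rate is exp(barrier_drop / (R * T)).
    """
    return math.exp(barrier_drop_kj_per_mol * 1000.0 / (R * temperature_k))


# Illustrative barrier reductions in kJ/mol.
for drop in (10, 30, 50):
    fold = rate_enhancement(drop)
    print(f"{drop} kJ/mol lower barrier: about {fold:.1e}-fold faster")
```

Because the dependence is exponential, modest reductions in the barrier translate into very large changes in rate, while the equilibrium position is untouched because the forward and reverse barriers drop by the same amount.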

Enzymes can couple two or more reactions, so that a thermodynamically favorable reaction can be used to "drive" a thermodynamically unfavourable one so that the combined energy of the products is lower than the substrates. For example, the hydrolysis of ATP is often used to drive other chemical reactions.^[60]
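Coupling works because the free-energy changes of linked reactions add. The Python sketch below illustrates this with the phosphorylation of glucose driven by ATP hydrolysis; the two standard free-energy values are approximate textbook figures introduced here as assumptions, and the function names are hypothetical.

```python
def coupled_delta_g(*delta_gs_kj_per_mol):
    """Free-energy changes of coupled reactions add."""
    return sum(delta_gs_kj_per_mol)


def is_favourable(delta_g_kj_per_mol):
    """A reaction is thermodynamically favourable when its free-energy change is negative."""
    return delta_g_kj_per_mol < 0


# Approximate standard free-energy changes in kJ/mol (assumed textbook values).
GLUCOSE_PHOSPHORYLATION = 13.8  # glucose + phosphate -> glucose 6-phosphate
ATP_HYDROLYSIS = -30.5          # ATP + water -> ADP + phosphate

total = coupled_delta_g(GLUCOSE_PHOSPHORYLATION, ATP_HYDROLYSIS)
print("uncoupled step favourable:", is_favourable(GLUCOSE_PHOSPHORYLATION))
print("coupled reaction favourable:", is_favourable(total))
print("combined free-energy change (kJ/mol):", round(total, 1))
```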

Kinetics

A chemical reaction mechanism with or without enzyme catalysis. The enzyme (E) binds substrate (S) to produce product (P).

Saturation curve for an enzyme reaction showing the relation between the substrate concentration and reaction rate.

Enzyme kinetics is the investigation of how enzymes bind substrates and turn them into products. The rate data used in kinetic analyses are commonly obtained from enzyme assays. In 1913 Leonor Michaelis and Maud Leonora Menten proposed a quantitative theory of enzyme kinetics, which is referred to as Michaelis–Menten kinetics.^[61] The major contribution of Michaelis and Menten was to think of enzyme reactions in two stages. In the first, the substrate binds reversibly to the enzyme, forming the enzyme-substrate complex. This is sometimes called the Michaelis-Menten complex in their honor. The enzyme then catalyzes the chemical step in the reaction and releases the product. This work was further developed by G. E. Briggs and J. B. S. Haldane, who derived kinetic equations that are still widely used today.^[62]

Enzyme rates depend on solution conditions and substrate concentration. To find the maximum speed of an enzymatic reaction, the substrate concentration is increased until a constant rate of product formation is seen. This is shown in the saturation curve on the right. Saturation happens because, as substrate concentration increases, more and more of the free enzyme is converted into the substrate-bound ES complex. At the maximum reaction rate (V_max) of the enzyme, all the enzyme active sites are bound to substrate, and the amount of ES complex is the same as the total amount of enzyme.^[1]^:8.4

V_max is only one of several important kinetic parameters. The amount of substrate needed to achieve a given rate of reaction is also important. This is given by the Michaelis-Menten constant (K_m), which is the substrate concentration required for an enzyme to reach one-half its maximum reaction rate; generally, each enzyme has a characteristic K_m for a given substrate. Another useful constant is k_cat, also called the turnover number, which is the number of substrate molecules handled by one active site per second.^[1]^:8.4
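The relation between V_max and K_m can be sketched with the standard Michaelis–Menten rate equation, v = V_max[S] / (K_m + [S]). The Python example below is a minimal illustration; the function name and the parameter values are hypothetical and chosen only to show that the rate is half of V_max when [S] equals K_m.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Initial reaction rate v = V_max * [S] / (K_m + [S])."""
    return v_max * substrate / (k_m + substrate)


# Hypothetical enzyme: V_max = 100 (arbitrary rate units), K_m = 2.0 mM.
V_MAX = 100.0
K_M = 2.0

for s in (0.2, 2.0, 20.0, 2000.0):
    v = michaelis_menten_rate(s, V_MAX, K_M)
    print(f"[S] = {s:7.1f} mM -> v = {v:6.2f}")
```

At substrate concentrations far above K_m the computed rate approaches V_max but never exceeds it, which is the saturation behaviour described above.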

The efficiency of an enzyme can be expressed in terms of k_cat/K_m. This is also called the specificity constant and incorporates the rate constants for all steps in the reaction up to and including the first irreversible step. Because the specificity constant reflects both affinity and catalytic ability, it is useful for comparing different enzymes against each other, or the same enzyme with different substrates. The theoretical maximum for the specificity constant is called the diffusion limit and is about 10⁸ to 10⁹ (M⁻¹ s⁻¹). At this point every collision of the enzyme with its substrate will result in catalysis, and the rate of product formation is not limited by the reaction rate but by the diffusion rate. Enzymes with this property are called catalytically perfect or kinetically perfect. Examples of such enzymes are triose-phosphate isomerase, carbonic anhydrase, acetylcholinesterase, catalase, fumarase, β-lactamase, and superoxide dismutase.^[1]^:8.4.2 The turnover of such enzymes can reach several million reactions per second.^[1]^:9.2 But most enzymes are far from perfect: the average values of k_cat/K_m and k_cat are about 10⁵ s⁻¹ M⁻¹ and 10 s⁻¹, respectively.^[63]
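The comparison of a specificity constant with the diffusion limit can be sketched in Python as follows. The threshold uses the lower end of the 10⁸ to 10⁹ M⁻¹ s⁻¹ range quoted above; the function names and both sets of kinetic parameters are hypothetical and serve only as an illustration.

```python
DIFFUSION_LIMIT_LOW = 1e8  # M^-1 s^-1, lower end of the range quoted in the text


def specificity_constant(k_cat, k_m):
    """k_cat / K_m, with k_cat in s^-1 and K_m in M, giving M^-1 s^-1."""
    return k_cat / k_m


def is_near_diffusion_limit(k_cat, k_m):
    """True when k_cat / K_m reaches the lower end of the diffusion limit."""
    return specificity_constant(k_cat, k_m) >= DIFFUSION_LIMIT_LOW


# Hypothetical parameter sets (k_cat in s^-1, K_m in M).
typical = {"k_cat": 10.0, "k_m": 1e-4}
fast = {"k_cat": 4e6, "k_m": 1e-2}

for name, params in (("typical", typical), ("fast", fast)):
    value = specificity_constant(params["k_cat"], params["k_m"])
    flag = is_near_diffusion_limit(params["k_cat"], params["k_m"])
    print(f"{name}: k_cat/K_m = {value:.1e} M^-1 s^-1, near diffusion limit: {flag}")
```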

Michaelis–Menten kinetics relies on the law of mass action, which is derived from the assumptions of free diffusion and thermodynamically driven random collision. Many biochemical or cellular processes deviate significantly from these conditions, because of macromolecular crowding and constrained molecular movement.^[64] More recent, complex extensions of the model attempt to correct for these effects.^[65]

Inhibition

An enzyme binding site that would normally bind substrate can alternatively bind a competitive inhibitor, preventing substrate access. Dihydrofolate reductase is inhibited by methotrexate which prevents binding of its substrate, folic acid. Binding site in blue, inhibitor in green, and substrate in black. (PDB: 4QI9)

The coenzyme folic acid (left) and the anti-cancer drug methotrexate (right) are very similar in structure (differences shown in green). As a result, methotrexate is a competitive inhibitor of many enzymes that use folates.

Enzyme reaction rates can be decreased by various types of enzyme inhibitors.^[66]^:73–74

Types of inhibition

Competitive

A competitive inhibitor and substrate cannot bind to the enzyme at the same time.^[67] Often competitive inhibitors strongly resemble the real substrate of the enzyme. For example, the drug methotrexate is a competitive inhibitor of the enzyme dihydrofolate reductase, which catalyzes the reduction of dihydrofolate to tetrahydrofolate. The similarity between the structures of dihydrofolate and this drug is shown in the accompanying figure. This type of inhibition can be overcome with high substrate concentration. In some cases, the inhibitor can bind to a site other than the binding-site of the usual substrate and exert an allosteric effect to change the shape of the usual binding-site.

Non-competitive

A non-competitive inhibitor binds to a site other than where the substrate binds. The substrate still binds with its usual affinity and hence K_m remains the same. However the inhibitor reduces the catalytic efficiency of the enzyme so that V_max is reduced. In contrast to competitive inhibition, non-competitive inhibition cannot be overcome with high substrate concentration.^[66]^:76–78

Uncompetitive

An uncompetitive inhibitor cannot bind to the free enzyme, only to the enzyme-substrate complex; hence, these types of inhibitors are most effective at high substrate concentration. In the presence of the inhibitor, the enzyme-substrate complex is inactive.^[66]^:78 This type of inhibition is rare.^[68]
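The effect of the three reversible inhibition types described so far on the kinetic parameters can be sketched in Python. The example assumes the usual textbook models with a single inhibition constant K_i, a quantity not introduced in the text above: a competitive inhibitor raises the apparent K_m, a non-competitive inhibitor lowers the apparent V_max, and an uncompetitive inhibitor lowers both by the same factor. Function names and numerical values are hypothetical.

```python
def apparent_parameters(v_max, k_m, inhibitor, k_i, mode):
    """Return (apparent V_max, apparent K_m) for a reversible inhibitor.

    The scaling factor is 1 + [I] / K_i in every case; only the parameter
    it is applied to differs between the inhibition types.
    """
    factor = 1.0 + inhibitor / k_i
    if mode == "competitive":
        return v_max, k_m * factor
    if mode == "non-competitive":
        return v_max / factor, k_m
    if mode == "uncompetitive":
        return v_max / factor, k_m / factor
    raise ValueError(f"unknown inhibition mode: {mode!r}")


def rate(substrate, v_max, k_m):
    """Michaelis-Menten rate v = V_max * [S] / (K_m + [S])."""
    return v_max * substrate / (k_m + substrate)


# Hypothetical enzyme and inhibitor: V_max = 100, K_m = 2, [I] = K_i = 1.
for mode in ("competitive", "non-competitive", "uncompetitive"):
    v_app, k_app = apparent_parameters(100.0, 2.0, 1.0, 1.0, mode)
    v_high = rate(1e6, v_app, k_app)  # rate at a very high substrate level
    print(f"{mode}: V_max = {v_app}, K_m = {k_app}, rate at high [S] = {v_high:.1f}")
```

Only in the competitive case does the rate at very high substrate concentration return to the uninhibited V_max, which matches the statement that competitive inhibition can be overcome with high substrate concentration.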

Mixed

A mixed inhibitor binds to an allosteric site and the binding of the substrate and the inhibitor affect each other. The enzyme's function is reduced but not eliminated when bound to the inhibitor. This type of inhibitor does not follow the Michaelis-Menten equation.^[66]^:76–78

Irreversible

An irreversible inhibitor permanently inactivates the enzyme, usually by forming a covalent bond to the protein. Penicillin^[69] and aspirin^[70] are common drugs that act in this manner.

Functions of inhibitors

In many organisms, inhibitors may act as part of a feedback mechanism. If an enzyme produces too much of one substance in the organism, that substance may act as an inhibitor for the enzyme at the beginning of the pathway that produces it, causing production of the substance to slow down or stop when there is a sufficient amount. This is a form of negative feedback. Major metabolic pathways such as the citric acid cycle make use of this mechanism.^[1]^:17.2.2

Since inhibitors modulate the function of enzymes they are often used as drugs. Many such drugs are reversible competitive inhibitors that resemble the enzyme's native substrate, similar to methotrexate above; other well-known examples include statins used to treat high cholesterol,^[71] and protease inhibitors used to treat retroviral infections such as HIV.^[72] A common example of an irreversible inhibitor that is used as a drug is aspirin, which inhibits the COX-1 and COX-2 enzymes that produce the inflammation messenger prostaglandin.^[70] Other enzyme inhibitors are poisons. For example, the poison cyanide is an irreversible enzyme inhibitor that combines with the copper and iron in the active site of the enzyme cytochrome c oxidase and blocks cellular respiration.^[73]

Biological function

Enzymes serve a wide variety of functions inside living organisms. They are indispensable for signal transduction and cell regulation, often via kinases and phosphatases.^[74] They also generate movement, with myosin hydrolyzing ATP to generate muscle contraction, and also transport cargo around the cell as part of the cytoskeleton.^[75] Other ATPases in the cell membrane are ion pumps involved in active transport. Enzymes are also involved in more exotic functions, such as luciferase generating light in fireflies.^[76] Viruses can also contain enzymes for infecting cells, such as the HIV integrase and reverse transcriptase, or for viral release from cells, like the influenza virus neuraminidase.^[77]

An important function of enzymes is in the digestive systems of animals. Enzymes such as amylases and proteases break down large molecules (starch or proteins, respectively) into smaller ones, so they can be absorbed by the intestines. Starch molecules, for example, are too large to be absorbed from the intestine, but enzymes hydrolyze the starch chains into smaller molecules such as maltose and eventually glucose, which can then be absorbed. Different enzymes digest different food substances. In ruminants, which have herbivorous diets, microorganisms in the gut produce another enzyme, cellulase, to break down the cellulose cell walls of plant fiber.^[78]

Metabolism

The metabolic pathway of glycolysis releases energy by converting glucose to pyruvate via a series of intermediate metabolites. Each chemical modification (red box) is performed by a different enzyme.

Several enzymes can work together in a specific order, creating metabolic pathways.^[1]^:30.1 In a metabolic pathway, one enzyme takes the product of another enzyme as a substrate. After the catalytic reaction, the product is then passed on to another enzyme. Sometimes more than one enzyme can catalyze the same reaction in parallel; this can allow more complex regulation: with, for example, a low constant activity provided by one enzyme but an inducible high activity from a second enzyme.^[79]

Enzymes determine what steps occur in these pathways. Without enzymes, metabolism would neither progress through the same steps nor be regulated to serve the needs of the cell. Most central metabolic pathways are regulated at a few key steps, typically through enzymes whose activity involves the hydrolysis of ATP. Because this reaction releases so much energy, other reactions that are thermodynamically unfavorable can be coupled to ATP hydrolysis, driving the overall series of linked metabolic reactions.^[1]^:30.1

Control of activity

There are five main ways that enzyme activity is controlled in the cell.^[1]^:30.1.1

Regulation

Enzymes can be either activated or inhibited by other molecules. For example, the end product(s) of a metabolic pathway are often inhibitors for one of the first enzymes of the pathway (usually the first irreversible step, called the committed step), thus regulating the amount of end product made by the pathway. Such a regulatory mechanism is called a negative feedback mechanism, because the amount of the end product produced is regulated by its own concentration.^[80]^:141–48 A negative feedback mechanism can effectively adjust the rate of synthesis of intermediate metabolites according to the demands of the cells. This helps with effective allocation of materials and energy economy, and it prevents the excess manufacture of end products. Like other homeostatic devices, the control of enzymatic action helps to maintain a stable internal environment in living organisms.^[80]^:141

Post-translational modification

Examples of post-translational modification include phosphorylation, myristoylation and glycosylation.^[80]^:149–69 For example, in the response to insulin, the phosphorylation of multiple enzymes, including glycogen synthase, helps control the synthesis or degradation of glycogen and allows the cell to respond to changes in blood sugar.^[81] Another example of post-translational modification is the cleavage of the polypeptide chain. Chymotrypsin, a digestive protease, is produced in inactive form as chymotrypsinogen in the pancreas and transported in this form to the small intestine, where it is activated. This stops the enzyme from digesting the pancreas or other tissues before it enters the gut. This type of inactive precursor to an enzyme is known as a zymogen^[80]^:149–53 or proenzyme.

Quantity

Enzyme production (transcription and translation of enzyme genes) can be enhanced or diminished by a cell in response to changes in the cell's environment. This form of gene regulation is called enzyme induction. For example, bacteria may become resistant to antibiotics such as penicillin because enzymes called beta-lactamases are induced that hydrolyse the crucial beta-lactam ring within the penicillin molecule.^[82] Another example comes from enzymes in the liver called cytochrome P450 oxidases, which are important in drug metabolism. Induction or inhibition of these enzymes can cause drug interactions.^[83] Enzyme levels can also be regulated by changing the rate of enzyme degradation.^[1]^:30.1.1 The opposite of enzyme induction is enzyme repression.

Subcellular distribution

Enzymes can be compartmentalized, with different metabolic pathways occurring in different cellular compartments. For example, fatty acids are synthesized by one set of enzymes in the cytosol, endoplasmic reticulum and Golgi and used by a different set of enzymes as a source of energy in the mitochondrion, through β-oxidation.^[84] In addition, trafficking of the enzyme to different compartments may change the degree of protonation (cytoplasm neutral and lysosome acidic) or oxidative state [e.g., oxidized (periplasm) or reduced (cytoplasm)] which in turn affects enzyme activity.^[85]

Organ specialization

In multicellular eukaryotes, cells in different organs and tissues have different patterns of gene expression and therefore have different sets of enzymes (known as isozymes) available for metabolic reactions. This provides a mechanism for regulating the overall metabolism of the organism. For example, hexokinase, the first enzyme in the glycolysis pathway, has a specialized form called glucokinase expressed in the liver and pancreas that has a lower affinity for glucose yet is more sensitive to glucose concentration.^[86] This enzyme is involved in sensing blood sugar and regulating insulin production.^[87]
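
The statement that a lower-affinity enzyme can be more sensitive to substrate concentration may be easier to see numerically. The sketch below assumes simple Michaelis–Menten kinetics for both enzymes, which is a simplification, and the Km values are illustrative assumptions rather than figures taken from this article.

```python
def michaelis_menten(s, km, v_max=1.0):
    """Reaction rate at substrate concentration s (same units as km)."""
    return v_max * s / (km + s)


# Illustrative Km values in mM; assumptions made for this sketch only.
KM = {
    "hexokinase (high affinity)": 0.1,
    "glucokinase (low affinity)": 10.0,
}

for name, km in KM.items():
    low = michaelis_menten(4.0, km)   # glucose at 4 mM
    high = michaelis_menten(8.0, km)  # glucose at 8 mM
    print(f"{name}: {low:.2f} -> {high:.2f} of v_max ({high / low:.2f}x)")
```

With these assumed values the high-affinity enzyme is already close to saturation at both concentrations, so doubling the glucose level barely changes its rate, whereas the rate of the low-affinity enzyme rises substantially.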

Involvement in disease

In phenylalanine hydroxylase over 300 different mutations throughout the structure cause phenylketonuria. Phenylalanine substrate and tetrahydrobiopterin coenzyme in black, and Fe²⁺ cofactor in yellow. (PDB: 1KW0)

Since the tight control of enzyme activity is essential for homeostasis, any malfunction (mutation, overproduction, underproduction or deletion) of a single critical enzyme can lead to a genetic disease. The malfunction of just one type of enzyme out of the thousands of types present in the human body can be fatal. An example of a fatal genetic disease due to enzyme insufficiency is Tay-Sachs disease, in which patients lack the enzyme hexosaminidase.^[88]^[89]

One example of enzyme deficiency is the most common type of phenylketonuria. Many different single amino acid mutations in the enzyme phenylalanine hydroxylase, which catalyzes the first step in the degradation of phenylalanine, result in build-up of phenylalanine and related products. Some mutations are in the active site, directly disrupting binding and catalysis, but many are far from the active site and reduce activity by destabilising the protein structure, or affecting correct oligomerisation.^[90]^[91] This can lead to intellectual disability if the disease is untreated.^[92] Another example is pseudocholinesterase deficiency, in which the body's ability to break down choline ester drugs is impaired.^[93] Oral administration of enzymes can be used to treat some functional enzyme deficiencies, such as pancreatic insufficiency^[94] and lactose intolerance.^[95]

Another way enzyme malfunctions can cause disease comes from germline mutations in genes coding for DNA repair enzymes. Defects in these enzymes cause cancer because cells are less able to repair mutations in their genomes. This causes a slow accumulation of mutations and results in the development of cancers. An example of such a hereditary cancer syndrome is xeroderma pigmentosum, which causes the development of skin cancers in response to even minimal exposure to ultraviolet light.^[96]^[97]

Industrial applications

Enzymes are used in the chemical industry and other industrial applications when extremely specific catalysts are required. Enzymes in general are limited in the number of reactions they have evolved to catalyze and also by their lack of stability in organic solvents and at high temperatures. As a consequence, protein engineering is an active area of research and involves attempts to create new enzymes with novel properties, either through rational design or in vitro evolution.^[98]^[99] These efforts have begun to be successful, and a few enzymes have now been designed "from scratch" to catalyze reactions that do not occur in nature.^[100]

Sunday, May 6, 2018

Restriction enzyme

From Wikipedia, the free encyclopedia

A restriction enzyme or restriction endonuclease is an enzyme that cleaves DNA into fragments at or near specific recognition sites within the molecule known as restriction sites.^[1]^[2]^[3] Restriction enzymes are commonly classified into five types, which differ in their structure and in whether they cut their DNA substrate at their recognition site or at a separate cleavage site. To cut DNA, all restriction enzymes make two incisions, one through each sugar-phosphate backbone (i.e. each strand) of the DNA double helix.

These enzymes are found in bacteria and archaea and provide a defense mechanism against invading viruses.^[4]^[5] Inside a prokaryote, the restriction enzymes selectively cut up foreign DNA in a process called restriction; meanwhile, host DNA is protected by a modification enzyme (a methyltransferase) that modifies the prokaryotic DNA and blocks cleavage. Together, these two processes form the restriction modification system.^[6]

Over 3000 restriction enzymes have been studied in detail, and more than 600 of these are available commercially.^[7] These enzymes are routinely used for DNA modification in laboratories, and they are a vital tool in molecular cloning.^[8]^[9]^[10]

History

The term restriction enzyme originated from the studies of phage λ, a virus that infects bacteria, and the phenomenon of host-controlled restriction and modification of such bacterial phage or bacteriophage.^[11] The phenomenon was first identified in work done in the laboratories of Salvador Luria and Giuseppe Bertani in the early 1950s.^[12]^[13] It was found that a bacteriophage λ that grows well in one strain of Escherichia coli, for example E. coli C, can show much lower yields when grown in another strain, for example E. coli K, with a drop of as much as 3–5 orders of magnitude. The host cell, in this example E. coli K, is known as the restricting host and appears to have the ability to reduce the biological activity of the phage λ. If a phage becomes established in one strain, the ability of that phage to grow also becomes restricted in other strains. In the 1960s, it was shown in work done in the laboratories of Werner Arber and Matthew Meselson that the restriction is caused by an enzymatic cleavage of the phage DNA, and the enzyme involved was therefore termed a restriction enzyme.^[4]^[14]^[15]^[16]

The restriction enzymes studied by Arber and Meselson were type I restriction enzymes, which cleave DNA randomly away from the recognition site.^[17] In 1970, Hamilton O. Smith, Thomas Kelly and Kent Wilcox isolated and characterized the first type II restriction enzyme, HindII, from the bacterium Haemophilus influenzae.^[18]^[19] Restriction enzymes of this type are more useful for laboratory work as they cleave DNA at the site of their recognition sequence. Later, Daniel Nathans and Kathleen Danna showed that cleavage of simian virus 40 (SV40) DNA by restriction enzymes yields specific fragments that can be separated using polyacrylamide gel electrophoresis, thus showing that restriction enzymes can also be used for mapping DNA.^[20] For their work in the discovery and characterization of restriction enzymes, the 1978 Nobel Prize for Physiology or Medicine was awarded to Werner Arber, Daniel Nathans, and Hamilton O. Smith.^[21] The discovery of restriction enzymes allows DNA to be manipulated, leading to the development of recombinant DNA technology that has many applications, for example, allowing the large scale production of proteins such as human insulin used by diabetics.^[12]^[22]

Origins

Restriction enzymes likely evolved from a common ancestor and became widespread via horizontal gene transfer.^[23]^[24] In addition, there is mounting evidence that restriction endonucleases evolved as a selfish genetic element.^[25]

Recognition site

A palindromic recognition site reads the same on the reverse strand as it does on the forward strand when both are read in the same orientation

Restriction enzymes recognize a specific sequence of nucleotides^[2] and produce a double-stranded cut in the DNA. Recognition sequences can be classified by the number of bases they contain, usually between 4 and 8. That number determines how often the site is expected to appear by chance in a given genome: assuming the four bases are equally frequent, a 4-base-pair sequence would theoretically occur once every 4^4 or 256 bp, a 6-base-pair sequence once every 4^6 or 4,096 bp, and an 8-base-pair sequence once every 4^8 or 65,536 bp.^[26] Many recognition sequences are palindromic, meaning the base sequence reads the same backwards and forwards.^[27] In theory, two types of palindromic sequence are possible in DNA. The mirror-like palindrome is similar to those found in ordinary text, in which a sequence reads the same forward and backward on a single strand of DNA, as in GTAATG. The inverted repeat palindrome is also a sequence that reads the same forward and backward, but the forward and backward sequences are found in complementary DNA strands (i.e., of double-stranded DNA), as in GTATAC (GTATAC being complementary to CATATG).^[28] Inverted repeat palindromes are more common and have greater biological importance than mirror-like palindromes.
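
The chance-frequency arithmetic and the two kinds of palindrome can be checked with a few lines of Python. This is a minimal sketch; the function names are made up for illustration, and the frequency estimate assumes that all four bases are equally likely and independent, which real genomes only approximate.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_spacing(site_length):
    """Average distance in bp between chance occurrences of a site,
    assuming equally frequent, independent bases."""
    return 4 ** site_length


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_mirror_palindrome(seq):
    """Reads the same forward and backward on a single strand."""
    return seq == seq[::-1]


def is_inverted_repeat(seq):
    """Reads the same 5' to 3' on both strands."""
    return seq == reverse_complement(seq)


for n in (4, 6, 8):
    print(n, expected_spacing(n))        # 256, 4096, 65536

print(is_mirror_palindrome("GTAATG"))    # True
print(is_inverted_repeat("GTATAC"))      # True
print(is_inverted_repeat("GTAATG"))      # False
```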

EcoRI digestion produces "sticky" ends, whereas SmaI restriction enzyme cleavage produces "blunt" ends.

Recognition sequences in DNA differ for each restriction enzyme, producing differences in the length, sequence and strand orientation (5' end or 3' end) of a sticky-end "overhang" of an enzyme restriction.^[29]
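
How the two cut positions determine the kind of end can be sketched as follows. The function name and its coordinate convention are assumptions made for this example: both cut positions are counted as the number of bases to the left of the cut, measured along the top strand written 5' to 3'. The cut positions used are those shown for EcoRI, KpnI and SmaI in the examples table of this article.

```python
def cut_ends(site, top_cut, bottom_cut):
    """Classify the ends left by a double-strand cut within a site.

    top_cut and bottom_cut count the bases to the left of each cut,
    both measured along the top strand written 5' to 3'.
    Returns (kind of end, overhang sequence on the top strand).
    """
    if top_cut == bottom_cut:
        return "blunt", ""
    lo, hi = sorted((top_cut, bottom_cut))
    overhang = site[lo:hi]  # stretch left single-stranded after the cut
    kind = "5' overhang" if top_cut < bottom_cut else "3' overhang"
    return kind, overhang


print(cut_ends("GAATTC", 1, 5))  # EcoRI: ("5' overhang", 'AATT')
print(cut_ends("GGTACC", 5, 1))  # KpnI:  ("3' overhang", 'GTAC')
print(cut_ends("CCCGGG", 3, 3))  # SmaI:  ('blunt', '')
```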

Different restriction enzymes that recognize the same sequence but cleave it at different positions are known as neoschizomers. Different enzymes that recognize the same sequence and cleave it at the same position are known as isoschizomers.

Types

Naturally occurring restriction endonucleases are categorized into four groups (Types I, II, III, and IV) based on their composition and enzyme cofactor requirements, the nature of their target sequence, and the position of their DNA cleavage site relative to the target sequence.^[30]^[31]^[32] DNA sequence analyses of restriction enzymes, however, show great variations, indicating that there are more than four types.^[33] All types of enzymes recognize specific short DNA sequences and carry out the endonucleolytic cleavage of DNA to give specific fragments with terminal 5'-phosphates. They differ in their recognition sequence, subunit composition, cleavage position, and cofactor requirements,^[34]^[35] as summarised below:

Type I enzymes (EC 3.1.21.3) cleave at sites remote from a recognition site; require both ATP and S-adenosyl-L-methionine to function; multifunctional protein with both restriction and methylase (EC 2.1.1.72) activities.
Type II enzymes (EC 3.1.21.4) cleave within or at short specific distances from a recognition site; most require magnesium; single function (restriction) enzymes independent of methylase.
Type III enzymes (EC 3.1.21.5) cleave at sites a short distance from a recognition site; require ATP (but do not hydrolyse it); S-adenosyl-L-methionine stimulates the reaction but is not required; exist as part of a complex with a modification methylase (EC 2.1.1.72).
Type IV enzymes target modified DNA, e.g. methylated, hydroxymethylated and glucosyl-hydroxymethylated DNA.

Type I

Type I restriction enzymes were the first to be identified, in two different strains (K-12 and B) of E. coli.^[36] These enzymes cut at a site that differs from their recognition site and lies a random distance (at least 1000 bp) away from it. Cleavage at these random sites follows a process of DNA translocation, which shows that these enzymes are also molecular motors. The recognition site is asymmetrical and is composed of two specific portions (one containing 3–4 nucleotides, and another containing 4–5 nucleotides) separated by a non-specific spacer of about 6–8 nucleotides. These enzymes are multifunctional and are capable of both restriction and modification activities, depending upon the methylation status of the target DNA. The cofactors S-adenosyl methionine (AdoMet), adenosine triphosphate (ATP, which is hydrolyzed), and magnesium (Mg²⁺) ions are required for their full activity. Type I restriction enzymes possess three subunits called HsdR, HsdM, and HsdS; HsdR is required for restriction; HsdM is necessary for adding methyl groups to host DNA (methyltransferase activity), and HsdS is important for specificity of the recognition (DNA-binding) site in addition to both restriction (DNA cleavage) and modification (DNA methyltransferase) activity.^[30]^[36]

Type II

Type II site-specific deoxyribonuclease

Structure of the homodimeric restriction enzyme EcoRI (cyan and green cartoon diagram) bound to double stranded DNA (brown tubes).^[37] Two catalytic magnesium ions (one from each monomer) are shown as magenta spheres and are adjacent to the cleaved sites in the DNA made by the enzyme (depicted as gaps in the DNA backbone).

Typical type II restriction enzymes differ from type I restriction enzymes in several ways. They form homodimers, with recognition sites that are usually undivided and palindromic and 4–8 nucleotides in length. They recognize and cleave DNA at the same site, and they do not use ATP or AdoMet for their activity; they usually require only Mg²⁺ as a cofactor.^[27] These enzymes cleave the phosphodiester bonds of double-helix DNA. They can cleave either at the center of both strands, yielding blunt ends, or at staggered positions, leaving overhangs called sticky ends.^[38] These are the most commonly available and used restriction enzymes. In the 1990s and early 2000s, new enzymes from this family were discovered that did not follow all the classical criteria of this enzyme class, and new subfamily nomenclature was developed to divide this large family into subcategories based on deviations from typical characteristics of type II enzymes.^[27] These subgroups are defined using a letter suffix.

Type IIB restriction enzymes (e.g., BcgI and BplI) are multimers, containing more than one subunit.^[27] They cleave DNA on both sides of their recognition site, cutting the site out. They require both AdoMet and Mg²⁺ cofactors. Type IIE restriction endonucleases (e.g., NaeI) cleave DNA following interaction with two copies of their recognition sequence.^[27] One recognition site acts as the target for cleavage, while the other acts as an allosteric effector that speeds up or improves the efficiency of enzyme cleavage. Similar to type IIE enzymes, type IIF restriction endonucleases (e.g. NgoMIV) interact with two copies of their recognition sequence but cleave both sequences at the same time.^[27] Type IIG restriction endonucleases (e.g., Eco57I) have a single subunit, like classical Type II restriction enzymes, but require the cofactor AdoMet to be active.^[27] Type IIM restriction endonucleases, such as DpnI, are able to recognize and cut methylated DNA.^[27] Type IIS restriction endonucleases (e.g., FokI) cleave DNA at a defined distance from their non-palindromic asymmetric recognition sites;^[27] this characteristic is widely used to perform in-vitro cloning techniques such as Golden Gate cloning. These enzymes may function as dimers. Similarly, Type IIT restriction enzymes (e.g., Bpu10I and BslI) are composed of two different subunits. Some recognize palindromic sequences while others have asymmetric recognition sites.^[27]

Type III

Type III restriction enzymes (e.g., EcoP15) recognize two separate non-palindromic sequences that are inversely oriented. They cut DNA about 20–30 base pairs after the recognition site.^[39] These enzymes contain more than one subunit and require AdoMet and ATP cofactors for their roles in DNA methylation and restriction, respectively.^[40] They are components of prokaryotic DNA restriction-modification mechanisms that protect the organism against invading foreign DNA. Type III enzymes are hetero-oligomeric, multifunctional proteins composed of two subunits, Res and Mod. The Mod subunit recognises the DNA sequence specific for the system and is a modification methyltransferase; as such, it is functionally equivalent to the M and S subunits of type I restriction endonuclease. Res is required for restriction, although it has no enzymatic activity on its own. Type III enzymes recognise short 5–6 bp-long asymmetric DNA sequences and cleave 25–27 bp downstream to leave short, single-stranded 5' protrusions. They require the presence of two inversely oriented unmethylated recognition sites for restriction to occur. These enzymes methylate only one strand of the DNA, at the N-6 position of adenosyl residues, so newly replicated DNA will have only one strand methylated, which is sufficient to protect against restriction. Type III enzymes belong to the beta-subfamily of N6 adenine methyltransferases, containing the nine motifs that characterise this family, including motif I, the AdoMet binding pocket (FXGXG), and motif IV, the catalytic region (S/D/N (PP) Y/F).^[34]^[41]

Type IV

Type IV enzymes recognize modified, typically methylated DNA and are exemplified by the McrBC and Mrr systems of E. coli.^[33]

Type V

Type V restriction enzymes (e.g., the Cas9-gRNA complex from CRISPRs^[42]) utilize guide RNAs to target specific non-palindromic sequences found on invading organisms. They can cut DNA of variable length, provided that a suitable guide RNA is supplied. The flexibility and ease of use of these enzymes make them promising for future genetic engineering applications.^[42]^[43]

Artificial restriction enzymes

Artificial restriction enzymes can be generated by fusing a natural or engineered DNA binding domain to a nuclease domain (often the cleavage domain of the type IIS restriction enzyme FokI).^[44] Such artificial restriction enzymes can target large DNA sites (up to 36 bp) and can be engineered to bind to desired DNA sequences.^[45] Zinc finger nucleases are the most commonly used artificial restriction enzymes and are generally used in genetic engineering applications,^[46]^[47]^[48]^[49] but can also be used for more standard gene cloning applications.^[50] Other artificial restriction enzymes are based on the DNA binding domain of TAL effectors.^[51]^[52]

In 2013, a new technology, CRISPR-Cas9, based on a prokaryotic viral defense system, was engineered for editing the genome, and it was quickly adopted in laboratories.^[53] For more detail, read CRISPR (Clustered regularly interspaced short palindromic repeats).

In 2017 a group in Illinois announced using an Argonaute protein taken from Pyrococcus furiosus (PfAgo) along with guide DNA to edit DNA as artificial restriction enzymes.^[54]

Artificial ribonucleases that act as restriction enzymes for RNA are also being developed. A PNA-based system, called PNAzymes, has a Cu(II)-2,9-dimethylphenanthroline group that mimics ribonucleases for specific RNA sequence and cleaves at a non-base-paired region (RNA bulge) of the targeted RNA formed when the enzyme binds the RNA. This enzyme shows selectivity by cleaving only at one site that either does not have a mismatch or is kinetically preferred out of two possible cleavage sites.^[55]

Nomenclature

Derivation of the EcoRI name
Abbreviation	Meaning	Description
E	Escherichia	genus
co	coli	specific species
R	RY13	strain
I	First identified	order of identification in the bacterium

Since their discovery in the 1970s, many restriction enzymes have been identified; for example, more than 3500 different Type II restriction enzymes have been characterized.^[56] Each enzyme is named after the bacterium from which it was isolated, using a naming system based on bacterial genus, species and strain.^[57]^[58] For example, the name of the EcoRI restriction enzyme was derived as shown in the box.
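
The derivation in the box can be expressed as a small parser. This is a rough sketch: the regular expression is an assumption about the common pattern (one capital letter for the genus, two lowercase letters for the species, an optional strain designation, then a Roman numeral) and will not cover every name in use.

```python
import re

# Assumed pattern: Genus letter, two species letters, optional strain, Roman numeral.
NAME = re.compile(r"^([A-Z])([a-z]{2})(.*?)([IVX]+)$")


def parse_enzyme_name(name):
    """Split a restriction enzyme name into the parts listed in the box."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"name does not fit the assumed pattern: {name}")
    genus, species, strain, order = match.groups()
    return {"genus": genus, "species": species, "strain": strain, "order": order}


print(parse_enzyme_name("EcoRI"))
# {'genus': 'E', 'species': 'co', 'strain': 'R', 'order': 'I'}
print(parse_enzyme_name("HindIII"))
# {'genus': 'H', 'species': 'in', 'strain': 'd', 'order': 'III'}
```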

Applications

Isolated restriction enzymes are used to manipulate DNA for different scientific applications.

They are used to assist insertion of genes into plasmid vectors during gene cloning and protein production experiments. For optimal use, plasmids that are commonly used for gene cloning are modified to include a short polylinker sequence (called the multiple cloning site, or MCS) rich in restriction enzyme recognition sequences. This allows flexibility when inserting gene fragments into the plasmid vector; restriction sites contained naturally within genes influence the choice of endonuclease for digesting the DNA, since it is necessary to avoid restriction of wanted DNA while intentionally cutting the ends of the DNA. To clone a gene fragment into a vector, both plasmid DNA and gene insert are typically cut with the same restriction enzymes, and then glued together with the assistance of an enzyme known as a DNA ligase.^[59]^[60]

Restriction enzymes can also be used to distinguish gene alleles by specifically recognizing single base changes in DNA known as single nucleotide polymorphisms (SNPs).^[61]^[62] This is, however, only possible if a SNP alters the restriction site present in the allele. In this method, the restriction enzyme can be used to genotype a DNA sample without the need for expensive gene sequencing. The sample is first digested with the restriction enzyme to generate DNA fragments, and the fragments of different sizes are then separated by gel electrophoresis. In general, alleles with correct restriction sites will generate two visible bands of DNA on the gel, and those with altered restriction sites will not be cut and will generate only a single band. A DNA map by restriction digest can also be generated that can give the relative positions of the genes.^[63] The different lengths of DNA generated by restriction digest also produce a specific pattern of bands after gel electrophoresis, and can be used for DNA fingerprinting.
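
The two-band versus one-band outcome can be reproduced with a simple in-silico digest. This is a minimal sketch: the two allele sequences are invented for illustration, the DNA is treated as a linear molecule, and only the top-strand cut position of EcoRI (after the G of GAATTC) is modelled.

```python
def digest(seq, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the site that lie to the left
    of the top-strand cut (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments


# Hypothetical 56 bp amplicons differing by a single base in the EcoRI site.
allele_with_site = "ACGT" * 5 + "GAATTC" + "TTGCA" * 6
allele_with_snp = "ACGT" * 5 + "GAGTTC" + "TTGCA" * 6

print(digest(allele_with_site, "GAATTC", 1))  # [21, 35] -> two bands
print(digest(allele_with_snp, "GAATTC", 1))   # [56]     -> one band
```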

In a similar manner, restriction enzymes are used to digest genomic DNA for gene analysis by Southern blot. This technique allows researchers to identify how many copies (or paralogues) of a gene are present in the genome of one individual, or how many gene mutations (polymorphisms) have occurred within a population. The latter example is called restriction fragment length polymorphism (RFLP).^[64]

Artificial restriction enzymes created by linking the FokI DNA cleavage domain with an array of DNA binding proteins or zinc finger arrays, denoted zinc finger nucleases (ZFN), are a powerful tool for host genome editing due to their enhanced sequence specificity. ZFN work in pairs, their dimerization being mediated in-situ through the FokI domain. Each zinc finger array (ZFA) is capable of recognizing 9–12 base pairs, making for 18–24 for the pair. A 5–7 bp spacer between the cleavage sites further enhances the specificity of ZFN, making them a safe and more precise tool that can be applied in humans. A recent Phase I clinical trial of ZFN for the targeted abolition of the CCR5 co-receptor for HIV-1 has been undertaken.^[65]

Others have proposed using the bacterial R-M system as a model for devising human anti-viral gene or genomic vaccines and therapies, since the R-M system serves an innate defense role in bacteria by restricting tropism by bacteriophages.^[66] There is research on REases and ZFN that can cleave the DNA of various human viruses, including HSV-2, high-risk HPVs and HIV-1, with the ultimate goal of inducing targeted mutagenesis and aberrations of human-infecting viruses.^[67]^[68]^[69] Interestingly, the human genome already contains remnants of retroviral genomes that have been inactivated and harnessed for self-gain. Indeed, the mechanisms for silencing active L1 genomic retroelements by the three prime repair exonuclease 1 (TREX1) and excision repair cross complementing 1 (ERCC) appear to mimic the action of R-M systems in bacteria, and the non-homologous end-joining (NHEJ) that follows the use of ZFN without a repair template.^[70]^[71]

Examples

Examples of restriction enzymes include:^[72]

Enzyme	Source	Recognition Sequence	Cut
EcoRI	Escherichia coli	5'GAATTC 3'CTTAAG	5'---G AATTC---3' 3'---CTTAA G---5'
EcoRII	Escherichia coli	5'CCWGG 3'GGWCC	5'--- CCWGG---3' 3'---GGWCC ---5'
BamHI	Bacillus amyloliquefaciens	5'GGATCC 3'CCTAGG	5'---G GATCC---3' 3'---CCTAG G---5'
HindIII	Haemophilus influenzae	5'AAGCTT 3'TTCGAA	5'---A AGCTT---3' 3'---TTCGA A---5'
TaqI	Thermus aquaticus	5'TCGA 3'AGCT	5'---T CGA---3' 3'---AGC T---5'
NotI	Nocardia otitidis	5'GCGGCCGC 3'CGCCGGCG	5'---GC GGCCGC---3' 3'---CGCCGG CG---5'
HinFI	Haemophilus influenzae	5'GANTC 3'CTNAG	5'---G ANTC---3' 3'---CTNA G---5'
Sau3AI	Staphylococcus aureus	5'GATC 3'CTAG	5'--- GATC---3' 3'---CTAG ---5'
PvuII*	Proteus vulgaris	5'CAGCTG 3'GTCGAC	5'---CAG CTG---3' 3'---GTC GAC---5'
SmaI*	Serratia marcescens	5'CCCGGG 3'GGGCCC	5'---CCC GGG---3' 3'---GGG CCC---5'
HaeIII*	Haemophilus aegyptius	5'GGCC 3'CCGG	5'---GG CC---3' 3'---CC GG---5'
HgaI^[73]	Haemophilus gallinarum	5'GACGC 3'CTGCG	5'---NN NN---3' 3'---NN NN---5'
AluI*	Arthrobacter luteus	5'AGCT 3'TCGA	5'---AG CT---3' 3'---TC GA---5'
EcoRV*	Escherichia coli	5'GATATC 3'CTATAG	5'---GAT ATC---3' 3'---CTA TAG---5'
EcoP15I	Escherichia coli	5'CAGCAGN₂₅NN 3'GTCGTCN₂₅NN	5'---CAGCAGN₂₅ NN---3' 3'---GTCGTCN₂₅NN ---5'
KpnI^[74]	Klebsiella pneumoniae	5'GGTACC 3'CCATGG	5'---GGTAC C---3' 3'---C CATGG---5'
PstI^[74]	Providencia stuartii	5'CTGCAG 3'GACGTC	5'---CTGCA G---3' 3'---G ACGTC---5'
SacI^[74]	Streptomyces achromogenes	5'GAGCTC 3'CTCGAG	5'---GAGCT C---3' 3'---C TCGAG---5'
SalI^[74]	Streptomyces albus	5'GTCGAC 3'CAGCTG	5'---G TCGAC---3' 3'---CAGCT G---5'
ScaI*^[74]	Streptomyces caespitosus	5'AGTACT 3'TCATGA	5'---AGT ACT---3' 3'---TCA TGA---5'
SpeI	Sphaerotilus natans	5'ACTAGT 3'TGATCA	5'---A CTAGT---3' 3'---TGATC A---5'
SphI^[74]	Streptomyces phaeochromogenes	5'GCATGC 3'CGTACG	5'---GCATG C---3' 3'---C GTACG---5'
StuI*^[75]^[76]	Streptomyces tubercidicus	5'AGGCCT 3'TCCGGA	5'---AGG CCT---3' 3'---TCC GGA---5'
XbaI^[74]	Xanthomonas badrii	5'TCTAGA 3'AGATCT	5'---T CTAGA---3' 3'---AGATC T---5'

Key:
* = blunt ends
N = C or G or T or A
W = A or T
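
The ambiguity codes in the key map naturally onto regular-expression character classes, which gives a simple way to locate degenerate sites such as those of EcoRII (CCWGG) and HinFI (GANTC). The sketch below is illustrative only: the helper names and the test sequence are invented, and only the two codes listed in the key are handled.

```python
import re

# Plain bases plus the two ambiguity codes from the key above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def site_to_regex(site):
    """Compile a recognition sequence into a lookahead pattern so that
    overlapping occurrences are also reported."""
    body = "".join(IUPAC[base] for base in site)
    return re.compile("(?=(" + body + "))")


def find_sites(seq, site):
    """0-based start positions of every occurrence of site in seq."""
    return [m.start() for m in site_to_regex(site).finditer(seq)]


seq = "TTCCAGGAAGACTCTTCCTGGAAGATTC"  # made-up test sequence
print(find_sites(seq, "CCWGG"))  # [2, 16]
print(find_sites(seq, "GANTC"))  # [9, 23]
```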

How the New Science of Computational History Is Changing the Study of the Past

Applying network theory to medieval records suggests that historical events are governed by “laws of history,” just as nature is bound by the laws of physics.

One of the curious features of network science is that the same networks underlie entirely different phenomena. As a result, these phenomena have deep similarities that are far from obvious at first glance. Good examples include the spread of disease, the size of forest fires, and even the distribution of earthquake magnitude, which all follow a similar pattern. This is a direct result of their sharing the same network structure.

So it’s usually no surprise that the same “laws” emerge when physicists find the same networks underlying other phenomena. Exactly this has happened repeatedly in the social sciences. Network science now allows social scientists to model societies, to study the way ideas, gossip, fashions, and so on flow through society—and even to study how this influences opinion.

To do this they’ve used the tools developed to study other disciplines. That’s why the new field of computational social science has become so powerful so quickly.

But there’s another field of endeavor that also stands to benefit: the study of history. Throughout history, humans have formed networks that have played a profound role in the way events have unfolded. Historians have recently begun to reconstruct these networks using historical sources such as correspondence and contemporary records.

Today, Johannes Preiser-Kapeller at the Austrian Academy of Science in Vienna explains how this approach is casting a new light on various historical events. Indeed, the work has uncovered previously unknown patterns in the way history unfolds. In the same way that patterns in nature reveal the laws of physics, these discoveries are revealing the first laws of history.

Preiser-Kapeller has focused on medieval conflicts and particularly those relating to the Byzantine Empire in the 14th century, which was concentrated around Constantinople, a link between European and Asian trade networks. This was a period of significant conflict because of changing political forces, the plague, and climate change caused by a small ice age during the Middle Ages.

Preiser-Kapeller has reconstructed the political networks that existed at the time using surviving correspondence and other historical records. In these networks, each influential individual is a node, and links are drawn between those who share significant relationships. To be registered on the network, these links have to be recorded in correspondence with phrases such as My noble aunt or My imperial cousin. He also records how these change over time.

Using standard algorithms to study various measures of network structure, Preiser-Kapeller found clusters within the network, identified the most important actors in a network, and examined how individuals clustered around others who were similar in some way.
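
As a rough illustration of what such measures look like, the sketch below builds a tiny network and computes two of the simplest ones. Everything in it is an assumption made for illustration: the article does not say which algorithms were used, degree centrality and connected components are stand-ins, and the node labels are placeholders rather than historical figures.

```python
from collections import defaultdict, deque

# Placeholder ties between individuals; not real historical data.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("E", "F")]


def build_graph(edges):
    """Undirected graph stored as node -> set of neighbours."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly tied to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def components(graph):
    """Groups of nodes connected to each other, found by breadth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(group)
    return groups


graph = build_graph(edges)
centrality = degree_centrality(graph)
print(max(centrality, key=centrality.get))  # C, the best-connected node
print(len(components(graph)))               # 2 separate groups
```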

How these measures change over time turns out to have an important link to the major events that unfolded later. For example, Preiser-Kapeller says, the fragmentation of the political network created the conditions for a civil war that permanently weakened the Byzantine Empire. It ultimately collapsed in 1453.

These changes also followed some interesting patterns. “The distribution of frequencies of the number of conflict ties activated in a year tends to follow a power law,” says Preiser-Kapeller. Exactly the same power-law patterns emerge when complexity scientists study the size distribution of wars, epidemics, and religions.

An interesting question is whether the same patterns turn up elsewhere in history. To find out, he compared the Byzantium network with those from five other periods of medieval conflict in Europe, Africa, and Asia.

And the results make for interesting reading. “On average across all five polities, a change of ruler in one year increased the probability for another change in the following year threefold,” says Preiser-Kapeller. So the closer you are to an upheaval, the more likely there is to be another one soon. Or in other words, upheavals tend to cluster together.

That’s a rule that should sound familiar to geophysicists. A similar phenomenon exists in earthquake records: the more recent a big earthquake, the greater the likelihood of another big one soon. This is known as Omori’s law—that earthquakes tend to cluster together.

It’s no surprise that similar effects arise in these systems, since they are both governed by the same network science. Historians would be well within their rights to adopt this and other patterns as “laws of history.”

These laws are ripe for further study. While the complexity that arises from network theory in many areas of science has been studied for decades, there has been almost no such research in the field of history. That suggests there is low-hanging fruit to be had by the first generation of computational historians, like Preiser-Kapeller. Expect to hear more about it in the near future.

Ref: arxiv.org/abs/1606.03433 : Calculating the Middle Ages? The Project “Complexities and Networks in the Medieval Mediterranean and the Near East”
