A Medley of Potpourri

Friday, August 22, 2025

Protein folding

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Protein_folding

Protein folding is the physical process by which a protein, after synthesis by a ribosome as a linear chain of amino acids, changes from an unstable random coil into a more ordered three-dimensional structure. This structure permits the protein to become biologically functional or active.

The folding of many proteins begins even during the translation of the polypeptide chain. The amino acids interact with each other to produce a well-defined three-dimensional structure, known as the protein's native state. This structure is determined by the amino-acid sequence or primary structure.

The correct three-dimensional structure is essential to function, although some parts of functional proteins may remain unfolded, indicating that protein dynamics are important. Failure to fold into a native structure generally produces inactive proteins, but in some instances, misfolded proteins have modified or toxic functionality. Several neurodegenerative and other diseases are believed to result from the accumulation of amyloid fibrils formed by misfolded proteins, the infectious varieties of which are known as prions. Many allergies are caused by the incorrect folding of some proteins because the immune system does not produce the antibodies for certain protein structures.

Denaturation of proteins is a process of transition from a folded to an unfolded state. It happens in cooking, burns, proteinopathies, and other contexts. Residual structure present, if any, in the supposedly unfolded state may form a folding initiation site and guide the subsequent folding reactions.

The duration of the folding process varies dramatically depending on the protein of interest. When studied outside the cell, the slowest folding proteins require many minutes or hours to fold, primarily due to proline isomerization, and must pass through a number of intermediate states, like checkpoints, before the process is complete. On the other hand, very small single-domain proteins with lengths of up to a hundred amino acids typically fold in a single step. Time scales of milliseconds are the norm, and the fastest known protein folding reactions are complete within a few microseconds. The folding time scale of a protein depends on its size, contact order, and circuit topology.

Understanding and simulating the protein folding process has been an important challenge for computational biology since the late 1960s.

Process of protein folding

Primary structure

The primary structure of a protein, its linear amino-acid sequence, determines its native conformation. The specific amino acid residues and their position in the polypeptide chain are the determining factors for which portions of the protein fold closely together and form its three-dimensional conformation. The amino acid composition is not as important as the sequence. The essential fact of folding, however, remains that the amino acid sequence of each protein contains the information that specifies both the native structure and the pathway to attain that state. This is not to say that nearly identical amino acid sequences always fold similarly. Conformations differ based on environmental factors as well; similar proteins fold differently based on where they are found.

Secondary structure

An anti-parallel beta pleated sheet displaying hydrogen bonding within the backbone

Formation of secondary structure is the first step in the folding process that a protein takes to assume its native structure. Characteristic of secondary structure are the alpha helices and beta sheets, which fold rapidly because they are stabilized by intramolecular hydrogen bonds, as was first characterized by Linus Pauling. Formation of intramolecular hydrogen bonds provides another important contribution to protein stability. α-helices are formed by hydrogen bonding of the backbone to form a spiral shape. The β pleated sheet is a structure that forms with the backbone bending over itself to form the hydrogen bonds. The hydrogen bonds are between the amide hydrogen and carbonyl oxygen of the peptide bond. Both anti-parallel and parallel β pleated sheets exist; the hydrogen bonds are more stable in the anti-parallel β sheet, where they form at the ideal 180 degree angle, than in parallel sheets, where they are slanted.

Tertiary structure

α-helices and β-sheets are commonly amphipathic, meaning they have a hydrophilic and a hydrophobic portion. This property helps in forming the tertiary structure of a protein, in which folding occurs so that the hydrophilic sides face the aqueous environment surrounding the protein and the hydrophobic sides face the hydrophobic core of the protein. Secondary structure hierarchically gives way to tertiary structure formation. Once the protein's tertiary structure is formed and stabilized by the hydrophobic interactions, there may also be covalent bonding in the form of disulfide bridges formed between two cysteine residues. These non-covalent and covalent contacts take a specific topological arrangement in the native structure of a protein. The tertiary structure of a protein involves a single polypeptide chain; however, additional interactions of folded polypeptide chains give rise to quaternary structure formation.

Quaternary structure

Tertiary structure may give way to the formation of quaternary structure in some proteins, which usually involves the "assembly" or "coassembly" of subunits that have already folded; in other words, multiple polypeptide chains could interact to form a fully functional quaternary protein.

Driving forces of protein folding

Folding is a spontaneous process that is mainly guided by hydrophobic interactions, the formation of intramolecular hydrogen bonds, and van der Waals forces, and it is opposed by conformational entropy. The folding time scale of an isolated protein depends on its size, contact order, and circuit topology. Inside cells, the process of folding often begins co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion of the protein is still being synthesized by the ribosome; however, a protein molecule may fold spontaneously during or after biosynthesis. While these macromolecules may be regarded as "folding themselves", the process also depends on the solvent (water or lipid bilayer), the concentration of salts, the pH, the temperature, and the possible presence of cofactors and molecular chaperones.

Protein folding is constrained by the restricted set of bending angles, or conformations, that the polypeptide backbone can adopt. These allowable angles are described with a two-dimensional plot known as the Ramachandran plot, whose axes are the psi and phi angles of allowable rotation.
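The phi and psi angles on a Ramachandran plot are backbone torsion (dihedral) angles, each defined by four consecutive backbone atoms. The following is a minimal sketch, not taken from the source, of how such an angle can be computed from atomic coordinates. The function name `dihedral_degrees` and the example coordinates are hypothetical; the atom assignments in the comments (C, N, Cα, C for phi; N, Cα, C, N for psi) are the usual textbook definitions and are stated here as an assumption.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    For phi (assumed convention): p0..p3 = C(i-1), N(i), CA(i), C(i).
    For psi (assumed convention): p0..p3 = N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p1, p0)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _cross(b0, b1)  # normal to the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen only to exercise the function.
cis = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(cis, trans)
```

Computing one (phi, psi) pair per residue and plotting the pairs against each other gives the scatter that a Ramachandran plot displays.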

Hydrophobic effect

Protein folding must be thermodynamically favorable within a cell in order for it to be a spontaneous reaction. Since protein folding is known to be a spontaneous reaction, it must have a negative Gibbs free energy change. The Gibbs free energy change in protein folding is directly related to enthalpy and entropy. For a negative delta G to arise and for protein folding to be thermodynamically favorable, either enthalpy, entropy, or both terms must be favorable.
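The relationship between these quantities is the standard expression ΔG = ΔH − TΔS. The short sketch below evaluates it for folding. The function names and the numerical values are hypothetical, chosen only for illustration; they are not measurements for any real protein.

```python
def gibbs_free_energy(delta_h, temperature, delta_s):
    """Return delta G = delta H - T * delta S.

    Units assumed here: delta_h in kJ/mol, temperature in K,
    delta_s in kJ/(mol*K).
    """
    return delta_h - temperature * delta_s


def is_spontaneous(delta_h, temperature, delta_s):
    """A process is spontaneous when delta G is negative."""
    return gibbs_free_energy(delta_h, temperature, delta_s) < 0


# Illustrative (made-up) folding values: favorable enthalpy,
# unfavorable entropy.
DELTA_H = -200.0  # kJ/mol
DELTA_S = -0.6    # kJ/(mol*K)

for temperature in (298.0, 350.0):
    dg = gibbs_free_energy(DELTA_H, temperature, DELTA_S)
    print(temperature, round(dg, 1), is_spontaneous(DELTA_H, temperature, DELTA_S))
```

With these illustrative numbers the favorable enthalpy outweighs the entropic penalty at 298 K but not at 350 K, which shows how the sign of ΔG can change with temperature.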

Minimizing the number of hydrophobic side-chains exposed to water is an important driving force behind the folding process. The hydrophobic effect is the phenomenon in which the hydrophobic chains of a protein collapse into the core of the protein (away from the hydrophilic environment). In an aqueous environment, the water molecules tend to aggregate around the hydrophobic regions or side chains of the protein, creating water shells of ordered water molecules. An ordering of water molecules around a hydrophobic region increases order in a system and therefore contributes a negative change in entropy (less entropy in the system). The water molecules are fixed in these water cages which drives the hydrophobic collapse, or the inward folding of the hydrophobic groups. The hydrophobic collapse introduces entropy back to the system via the breaking of the water cages which frees the ordered water molecules. The multitude of hydrophobic groups interacting within the core of the globular folded protein contributes a significant amount to protein stability after folding, because of the vastly accumulated van der Waals forces (specifically London Dispersion forces). The hydrophobic effect exists as a driving force in thermodynamics only if there is the presence of an aqueous medium with an amphiphilic molecule containing a large hydrophobic region. The strength of hydrogen bonds depends on their environment; thus, H-bonds enveloped in a hydrophobic core contribute more than H-bonds exposed to the aqueous environment to the stability of the native state.

In proteins with globular folds, hydrophobic amino acids tend to be interspersed along the primary sequence, rather than randomly distributed or clustered together. However, proteins that have recently been born de novo, which tend to be intrinsically disordered, show the opposite pattern of hydrophobic amino acid clustering along the primary sequence.

Chaperones

Molecular chaperones are a class of proteins that aid in the correct folding of other proteins in vivo. Chaperones exist in all cellular compartments and interact with the polypeptide chain in order to allow the native three-dimensional conformation of the protein to form; however, chaperones themselves are not included in the final structure of the protein they are assisting in. Chaperones may assist in folding even when the nascent polypeptide is being synthesized by the ribosome. Molecular chaperones operate by binding to stabilize an otherwise unstable structure of a protein in its folding pathway, but chaperones do not contain the necessary information to know the correct native structure of the protein they are aiding; rather, chaperones work by preventing incorrect folding conformations. In this way, chaperones do not actually increase the rate of individual steps involved in the folding pathway toward the native structure; instead, they work by reducing possible unwanted aggregations of the polypeptide chain that might otherwise slow down the search for the proper intermediate and they provide a more efficient pathway for the polypeptide chain to assume the correct conformations. Chaperones are not to be confused with folding catalyst proteins, which catalyze chemical reactions responsible for slow steps in folding pathways. Examples of folding catalysts are protein disulfide isomerases and peptidyl-prolyl isomerases that may be involved in formation of disulfide bonds or interconversion between cis and trans stereoisomers of peptide group. Chaperones are shown to be critical in the process of protein folding in vivo because they provide the protein with the aid needed to assume its proper alignments and conformations efficiently enough to become "biologically relevant". 
This means that the polypeptide chain could theoretically fold into its native structure without the aid of chaperones, as demonstrated by protein folding experiments conducted in vitro; however, this process proves to be too inefficient or too slow to exist in biological systems; therefore, chaperones are necessary for protein folding in vivo. Along with their role in aiding native structure formation, chaperones are shown to be involved in various roles such as protein transport and degradation, and they even allow denatured proteins exposed to certain external denaturant factors an opportunity to refold into their correct native structures.

A fully denatured protein lacks both tertiary and secondary structure, and exists as a so-called random coil. Under certain conditions some proteins can refold; however, in many cases, denaturation is irreversible. Cells sometimes protect their proteins against the denaturing influence of heat with enzymes known as heat shock proteins (a type of chaperone), which assist other proteins both in folding and in remaining folded. Heat shock proteins have been found in all species examined, from bacteria to humans, suggesting that they evolved very early and have an important function. Some proteins never fold in cells at all except with the assistance of chaperones which either isolate individual proteins so that their folding is not interrupted by interactions with other proteins or help to unfold misfolded proteins, allowing them to refold into the correct native structure. This function is crucial to prevent the risk of precipitation into insoluble amorphous aggregates. The external factors involved in protein denaturation or disruption of the native state include temperature, external fields (electric, magnetic), molecular crowding, and even the limitation of space (i.e. confinement), which can have a big influence on the folding of proteins. High concentrations of solutes, extremes of pH, mechanical forces, and the presence of chemical denaturants can contribute to protein denaturation, as well. These individual factors are categorized together as stresses. Chaperones are shown to exist in increasing concentrations during times of cellular stress and help the proper folding of emerging proteins as well as denatured or misfolded ones.

Under some conditions proteins will not fold into their biochemically functional forms. Temperatures above or below the range that cells tend to live in will cause thermally unstable proteins to unfold or denature (this is why boiling makes an egg white turn opaque). Protein thermal stability is far from constant, however; for example, hyperthermophilic bacteria have been found that grow at temperatures as high as 122 °C, which of course requires that their full complement of vital proteins and protein assemblies be stable at that temperature or above.

The bacterium E. coli is the host for bacteriophage T4, and the phage encoded gp31 protein (P17313) appears to be structurally and functionally homologous to E. coli chaperone protein GroES and able to substitute for it in the assembly of bacteriophage T4 virus particles during infection. Like GroES, gp31 forms a stable complex with GroEL chaperonin that is absolutely necessary for the folding and assembly in vivo of the bacteriophage T4 major capsid protein gp23.

Fold switching

Some proteins have multiple native structures, and change their fold based on some external factors. For example, the KaiB protein switches fold throughout the day, acting as a clock for cyanobacteria. It has been estimated that around 0.5–4% of PDB (Protein Data Bank) proteins switch folds.

Protein misfolding and neurodegenerative disease

A protein is considered to be misfolded if it cannot achieve its normal native state. This can be due to mutations in the amino acid sequence or a disruption of the normal folding process by external factors. The misfolded protein typically contains β-sheets that are organized in a supramolecular arrangement known as a cross-β structure. These β-sheet-rich assemblies are very stable, very insoluble, and generally resistant to proteolysis. The structural stability of these fibrillar assemblies is caused by extensive interactions between the protein monomers, formed by backbone hydrogen bonds between their β-strands. The misfolding of proteins can trigger the further misfolding and accumulation of other proteins into aggregates or oligomers. The increased levels of aggregated proteins in the cell lead to the formation of amyloid-like structures, which can cause degenerative disorders and cell death. Amyloids are highly insoluble fibrillar structures, held together by intermolecular hydrogen bonds and made from converted protein aggregates. Therefore, the proteasome pathway may not be efficient enough to degrade the misfolded proteins prior to aggregation. Misfolded proteins can interact with one another, form structured aggregates, and gain toxicity through intermolecular interactions.

Aggregated proteins are associated with prion-related illnesses such as Creutzfeldt–Jakob disease and bovine spongiform encephalopathy (mad cow disease), amyloid-related illnesses such as Alzheimer's disease and familial amyloid cardiomyopathy or polyneuropathy, as well as intracellular aggregation diseases such as Huntington's and Parkinson's disease. These age-onset degenerative diseases are associated with the aggregation of misfolded proteins into insoluble, extracellular aggregates and/or intracellular inclusions including cross-β amyloid fibrils. It is not completely clear whether the aggregates are the cause or merely a reflection of the loss of protein homeostasis, the balance between synthesis, folding, aggregation and protein turnover. Recently the European Medicines Agency approved the use of Tafamidis or Vyndaqel (a kinetic stabilizer of tetrameric transthyretin) for the treatment of transthyretin amyloid diseases. This suggests that the process of amyloid fibril formation (and not the fibrils themselves) causes the degeneration of post-mitotic tissue in human amyloid diseases. Misfolding and excessive degradation, instead of folding and function, lead to a number of proteopathy diseases such as antitrypsin-associated emphysema, cystic fibrosis and the lysosomal storage diseases, where loss of function is the origin of the disorder. While protein replacement therapy has historically been used to correct the latter disorders, an emerging approach is to use pharmaceutical chaperones to fold mutated proteins to render them functional.

Experimental techniques for studying protein folding

While inferences about protein folding can be made through mutation studies, typically, experimental techniques for studying protein folding rely on the gradual unfolding or folding of proteins and observing conformational changes using standard non-crystallographic techniques.

X-ray crystallography

X-ray crystallography is one of the more efficient and important methods for attempting to decipher the three-dimensional configuration of a folded protein. To be able to conduct X-ray crystallography, the protein under investigation must be located inside a crystal lattice. To place a protein inside a crystal lattice, one must have a suitable solvent for crystallization, obtain a pure protein at supersaturated levels in solution, and precipitate the crystals in solution. Once a protein is crystallized, X-ray beams can be directed through the crystal lattice, which diffracts the beams outwards in various directions. These exiting beams are correlated to the specific three-dimensional configuration of the protein enclosed within. The X-rays specifically interact with the electron clouds surrounding the individual atoms within the protein crystal lattice and produce a discernible diffraction pattern. The pattern records only the amplitudes of the diffracted X-rays; their phases, which are also needed to reconstruct the electron density through a Fourier transform, are not measured, and this "phase problem" complicates the method. Methods like multiple isomorphous replacement use the presence of a heavy metal ion to alter the diffraction in a more predictable manner, reducing the number of variables involved and resolving the phase problem.

Fluorescence spectroscopy

Fluorescence spectroscopy is a highly sensitive method for studying the folding state of proteins. Three amino acids, phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp), have intrinsic fluorescence properties, but only Tyr and Trp are used experimentally because their quantum yields are high enough to give good fluorescence signals. Both Trp and Tyr are excited by a wavelength of 280 nm, whereas only Trp is excited by a wavelength of 295 nm. Because of their aromatic character, Trp and Tyr residues are often found fully or partially buried in the hydrophobic core of proteins, at the interface between two protein domains, or at the interface between subunits of oligomeric proteins. In this apolar environment, they have high quantum yields and therefore high fluorescence intensities. Upon disruption of the protein's tertiary or quaternary structure, these side chains become more exposed to the hydrophilic environment of the solvent, and their quantum yields decrease, leading to low fluorescence intensities. For Trp residues, the wavelength of their maximal fluorescence emission also depends on their environment.

Fluorescence spectroscopy can be used to characterize the equilibrium unfolding of proteins by measuring the variation in the intensity of fluorescence emission or in the wavelength of maximal emission as functions of a denaturant value. The denaturant can be a chemical molecule (urea, guanidinium hydrochloride), temperature, pH, pressure, etc. The equilibrium between the different but discrete protein states, i.e. native state, intermediate states, unfolded state, depends on the denaturant value; therefore, the global fluorescence signal of their equilibrium mixture also depends on this value. One thus obtains a profile relating the global protein signal to the denaturant value. The profile of equilibrium unfolding may enable one to detect and identify intermediates of unfolding. General equations have been developed by Hugues Bedouelle to obtain the thermodynamic parameters that characterize the unfolding equilibria for homomeric or heteromeric proteins, up to trimers and potentially tetramers, from such profiles. Fluorescence spectroscopy can be combined with fast-mixing devices such as stopped flow, to measure protein folding kinetics, generate a chevron plot and derive a Phi value analysis.

Circular dichroism

Circular dichroism is one of the most general and basic tools to study protein folding. Circular dichroism spectroscopy measures the absorption of circularly polarized light. In proteins, structures such as alpha helices and beta sheets are chiral, and thus absorb such light. The absorption of this light acts as a marker of the degree of foldedness of the protein ensemble. This technique has been used to measure equilibrium unfolding of the protein by measuring the change in this absorption as a function of denaturant concentration or temperature. A denaturant melt measures the free energy of unfolding as well as the protein's m value, or denaturant dependence. A temperature melt measures the denaturation temperature (Tm) of the protein. As for fluorescence spectroscopy, circular-dichroism spectroscopy can be combined with fast-mixing devices such as stopped flow to measure protein folding kinetics and to generate chevron plots.
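As a rough illustration of how a denaturant melt relates the free energy of unfolding and the m value to the measured curve, the sketch below assumes a two-state transition and the linear extrapolation model, ΔG([D]) = ΔG(water) − m·[D]. Neither assumption is stated in the text above, and the function names and parameter values are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ/(mol*K)


def fraction_unfolded(denaturant, dg_water, m_value, temperature=298.15):
    """Fraction of unfolded protein at a given denaturant concentration.

    Assumes a two-state (native <-> unfolded) equilibrium and a linear
    dependence of the unfolding free energy on denaturant concentration.
    denaturant in M, dg_water in kJ/mol, m_value in kJ/(mol*M).
    """
    dg_unfolding = dg_water - m_value * denaturant
    equilibrium_constant = math.exp(-dg_unfolding / (GAS_CONSTANT * temperature))
    return equilibrium_constant / (1.0 + equilibrium_constant)


def midpoint_concentration(dg_water, m_value):
    """Denaturant concentration at which half of the protein is unfolded."""
    return dg_water / m_value


# Illustrative (made-up) parameters.
DG_WATER = 20.0  # kJ/mol
M_VALUE = 5.0    # kJ/(mol*M)

for concentration in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(concentration, round(fraction_unfolded(concentration, DG_WATER, M_VALUE), 4))
```

In practice the parameters would be obtained by fitting such a curve to the measured signal, with baselines for the native and unfolded states; that fitting step is omitted here.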

Vibrational circular dichroism of proteins

The more recent developments of vibrational circular dichroism (VCD) techniques for proteins, currently involving Fourier transform (FT) instruments, provide powerful means for determining protein conformations in solution even for very large protein molecules. Such VCD studies of proteins can be combined with X-ray diffraction data for protein crystals, FT-IR data for protein solutions in heavy water (D₂O), or quantum computations.

Protein nuclear magnetic resonance spectroscopy

Protein nuclear magnetic resonance (NMR) collects protein structural data by applying a magnetic field to samples of concentrated protein. In NMR, depending on the chemical environment, certain nuclei will absorb specific radio-frequencies. Because protein structural changes operate on a time scale from ns to ms, NMR is especially equipped to study intermediate structures in timescales of ps to s. Some of the main techniques for studying protein structure and non-folding protein structural changes include COSY, TOCSY, HSQC, time relaxation (T1 & T2), and NOE. NOE is especially useful because magnetization transfers can be observed between spatially proximal hydrogens. Different NMR experiments have varying degrees of timescale sensitivity that are appropriate for different protein structural changes. NOE can pick up bond vibrations or side chain rotations; however, it is sensitive to motions too fast to capture protein folding, which occurs on a longer timescale.

Because protein folding takes place at rates of about 50 to 3000 s⁻¹, CPMG relaxation dispersion and chemical exchange saturation transfer have become some of the primary techniques for NMR analysis of folding. In addition, both techniques are used to uncover excited intermediate states in the protein folding landscape. To do this, CPMG relaxation dispersion takes advantage of the spin echo phenomenon. This technique exposes the target nuclei to a 90° pulse followed by one or more 180° pulses. As the nuclei refocus, a broad distribution indicates that the target nuclei are involved in an intermediate excited state. Relaxation dispersion plots provide information on the thermodynamics and kinetics of exchange between the excited and ground states. Saturation transfer measures changes in signal from the ground state as excited states become perturbed. It uses weak radio frequency irradiation to saturate the excited state of a particular nucleus, which transfers its saturation to the ground state. This signal is amplified by decreasing the magnetization (and the signal) of the ground state.

The main limitations of NMR are that its resolution decreases with proteins larger than 25 kDa and that it is not as detailed as X-ray crystallography. Additionally, protein NMR analysis is quite difficult and can yield multiple solutions from the same NMR spectrum.

In a study focused on the folding of SOD1, a protein involved in amyotrophic lateral sclerosis, excited intermediates were studied with relaxation dispersion and saturation transfer. SOD1 had previously been tied to many disease-causing mutants that were assumed to be involved in protein aggregation; however, the mechanism was still unknown. Relaxation dispersion and saturation transfer experiments uncovered many excited intermediate states that misfold in the SOD1 mutants.

Dual-polarization interferometry

Dual polarisation interferometry is a surface-based technique for measuring the optical properties of molecular layers. When used to characterize protein folding, it measures the conformation by determining the overall size of a monolayer of the protein and its density in real time at sub-Angstrom resolution, although real-time measurement of the kinetics of protein folding is limited to processes that occur slower than ~10 Hz. Similar to circular dichroism, the stimulus for folding can be a denaturant or temperature.

Studies of folding with high time resolution

The study of protein folding has been greatly advanced in recent years by the development of fast, time-resolved techniques. Experimenters rapidly trigger the folding of a sample of unfolded protein and observe the resulting dynamics. Fast techniques in use include neutron scattering, ultrafast mixing of solutions, photochemical methods, and laser temperature jump spectroscopy. Among the many scientists who have contributed to the development of these techniques are Jeremy Cook, Heinrich Roder, Terry Oas, Harry Gray, Martin Gruebele, Brian Dyer, William Eaton, Sheena Radford, Chris Dobson, Alan Fersht, Bengt Nölting and Lars Konermann.

Proteolysis

Proteolysis is routinely used to probe the fraction unfolded under a wide range of solution conditions (e.g. fast parallel proteolysis (FASTpp)).

Single-molecule force spectroscopy

Single-molecule techniques such as optical tweezers and AFM have been used to understand protein folding mechanisms of isolated proteins as well as proteins with chaperones. Optical tweezers have been used to stretch single protein molecules from their C- and N-termini and unfold them to allow study of the subsequent refolding. The technique allows one to measure folding rates at the single-molecule level; for example, optical tweezers have recently been applied to study folding and unfolding of proteins involved in blood coagulation. von Willebrand factor (vWF) is a protein with an essential role in the blood clot formation process. It was discovered, using single-molecule optical tweezers measurements, that calcium-bound vWF acts as a shear force sensor in the blood. Shear force leads to unfolding of the A2 domain of vWF, whose refolding rate is dramatically enhanced in the presence of calcium. Recently, it was also shown that the simple src SH3 domain accesses multiple unfolding pathways under force.

Biotin painting

Biotin painting enables condition-specific cellular snapshots of (un)folded proteins. Biotin 'painting' shows a bias towards predicted intrinsically disordered proteins.

Computational studies of protein folding

Computational studies of protein folding include three main aspects related to the prediction of protein stability, kinetics, and structure. A 2013 review summarizes the available computational methods for protein folding.

Levinthal's paradox

In 1969, Cyrus Levinthal noted that, because of the very large number of degrees of freedom in an unfolded polypeptide chain, the molecule has an astronomical number of possible conformations. An estimate of 3³⁰⁰ or 10¹⁴³ was made in one of his papers. Levinthal's paradox is a thought experiment based on the observation that if a protein were folded by sequential sampling of all possible conformations, it would take an astronomical amount of time to do so, even if the conformations were sampled at a rapid rate (on the nanosecond or picosecond scale). Based upon the observation that proteins fold much faster than this, Levinthal then proposed that a random conformational search does not occur, and the protein must, therefore, fold through a series of meta-stable intermediate states.
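The scale of the numbers in this thought experiment can be checked with a few lines of arithmetic. The sketch below reads 3³⁰⁰ as three possible states for each of 300 degrees of freedom and assumes one conformation is sampled per picosecond. Both readings, along with the function name and the rounded value used for the age of the universe, are assumptions made for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.4e10  # rough value, used only for comparison


def levinthal_estimate(states=3, degrees_of_freedom=300, samples_per_second=1e12):
    """Return (log10 of conformation count, log10 of search time in years).

    Works in log10 space so the huge numbers stay easy to read.
    """
    log10_conformations = degrees_of_freedom * math.log10(states)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
    return log10_conformations, log10_years


log10_conformations, log10_years = levinthal_estimate()
print("conformations: about 10^%.1f" % log10_conformations)
print("exhaustive search: about 10^%.1f years" % log10_years)
print("age of the universe: about 10^%.1f years" % math.log10(AGE_OF_UNIVERSE_YEARS))
```

The first figure reproduces the 10¹⁴³ estimate quoted above, and even at picosecond sampling the exhaustive search time exceeds the age of the universe by more than a hundred orders of magnitude.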

Energy landscape of protein folding

The configuration space of a protein during folding can be visualized as an energy landscape. According to Joseph Bryngelson and Peter Wolynes, proteins follow the principle of minimal frustration, meaning that naturally evolved proteins have optimized their folding energy landscapes, and that nature has chosen amino acid sequences so that the folded state of the protein is sufficiently stable. In addition, the acquisition of the folded state had to become a sufficiently fast process. Even though nature has reduced the level of frustration in proteins, some degree of it remains up to now as can be observed in the presence of local minima in the energy landscape of proteins.

A consequence of these evolutionarily selected sequences is that proteins are generally thought to have globally "funneled energy landscapes" (a term coined by José Onuchic) that are largely directed toward the native state. This "folding funnel" landscape allows the protein to fold to the native state through any of a large number of pathways and intermediates, rather than being restricted to a single mechanism. The theory is supported by both computational simulations of model proteins and experimental studies, and it has been used to improve methods for protein structure prediction and design. The description of protein folding by the leveling free-energy landscape is also consistent with the 2nd law of thermodynamics. Physically, thinking of landscapes in terms of visualizable potential or total energy surfaces simply with maxima, saddle points, minima, and funnels, rather like geographic landscapes, is perhaps a little misleading. The relevant description is really a high-dimensional phase space in which manifolds might take a variety of more complicated topological forms.

The unfolded polypeptide chain begins at the top of the funnel where it may assume the largest number of unfolded variations and is in its highest energy state. Energy landscapes such as these indicate that there are a large number of initial possibilities, but only a single native state is possible; however, they do not reveal the numerous folding pathways that are possible. Different molecules of the exact same protein may be able to follow marginally different folding pathways, seeking different lower energy intermediates, as long as the same native structure is reached. Different pathways may have different frequencies of utilization depending on the thermodynamic favorability of each pathway. This means that if one pathway is found to be more thermodynamically favorable than another, it is likely to be used more frequently in the pursuit of the native structure. As the protein begins to fold and assume its various conformations, it always seeks a more thermodynamically favorable structure than before and thus continues through the energy funnel. Formation of secondary structures is a strong indication of increased stability within the protein, and only one combination of secondary structures assumed by the polypeptide backbone will have the lowest energy and therefore be present in the native state of the protein. Among the first structures to form once the polypeptide begins to fold are alpha helices and beta turns, where alpha helices can form in as little as 100 nanoseconds and beta turns in 1 microsecond.

There exists a saddle point in the energy funnel landscape where the transition state for a particular protein is found. The transition state in the energy funnel diagram is the conformation that must be assumed by every molecule of that protein in order to reach the native structure. No protein may assume the native structure without first passing through the transition state. The transition state can be referred to as a variant or premature form of the native state rather than just another intermediary step. The folding of the transition state is shown to be rate-determining, and even though it exists in a higher energy state than the native fold, it greatly resembles the native structure. Within the transition state, there exists a nucleus around which the protein is able to fold, formed by a process referred to as "nucleation condensation" where the structure begins to collapse onto the nucleus.

Modeling of protein folding

De novo or ab initio techniques for computational protein structure prediction can be used for simulating various aspects of protein folding. Molecular dynamics (MD) has been used to simulate protein folding and dynamics in silico. The first equilibrium folding simulations were done using an implicit solvent model and umbrella sampling. Because of computational cost, ab initio MD folding simulations with explicit water are limited to peptides and small proteins. MD simulations of larger proteins remain restricted to dynamics of the experimental structure or its high-temperature unfolding. Long-time folding processes (beyond about 1 millisecond), such as the folding of larger proteins (>150 residues), can be accessed using coarse-grained models.

Several large-scale computational projects, such as Rosetta@home, Folding@home and Foldit, target protein folding.

Long continuous-trajectory simulations have been performed on Anton, a massively parallel supercomputer designed and built around custom ASICs and interconnects by D. E. Shaw Research. As of 2011, the longest published simulation performed on Anton was a 2.936 millisecond simulation of NTL9 at 355 K. Such simulations are currently able to unfold and refold small proteins (<150 amino acid residues) in equilibrium and predict how mutations affect folding kinetics and stability.

In 2020, a team of researchers using AlphaFold, an artificial intelligence (AI) protein structure prediction program developed by DeepMind, placed first in CASP, a long-standing structure prediction contest. The team achieved a level of accuracy much higher than any other group. It scored above 90% for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures the degree of similarity between the structure predicted by a computational program and the structure determined experimentally in a lab. A score of 100 is considered a complete match, within the distance cutoff used for calculating GDT.
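
To make the GDT score more concrete, here is a simplified sketch of the commonly used GDT_TS variant, which averages the fraction of residues lying within 1, 2, 4 and 8 Å of their experimental positions. It assumes the predicted and reference coordinates (one point per residue, for example Cα atoms) have already been superimposed; the full procedure also searches over superpositions, which this sketch omits. The function name and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

# Distance cutoffs in angstroms used by the GDT_TS variant.
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Simplified GDT_TS on a 0-100 scale for pre-superimposed structures."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and of equal length")
    # Per-residue deviation between predicted and experimental positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS_ANGSTROM
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Made-up coordinates: four residues deviating by 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (21.4, 0.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
print(gdt_ts(reference, reference))  # 100.0
```

A prediction identical to the reference scores 100, while a single badly placed residue lowers the score at every cutoff it misses.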

AlphaFold's protein structure prediction results at CASP were described as "transformational" and "astounding". Some researchers noted that the accuracy is not high enough for a third of its predictions, and that it does not reveal the physical mechanism of protein folding, so the protein folding problem cannot yet be considered solved. Nevertheless, it is considered a significant achievement in computational biology and great progress towards a decades-old grand challenge of biology, predicting the structure of proteins.

Molecular machine

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Molecular_machine

Molecular machines are a class of molecules typically described as an assembly of a discrete number of molecular components intended to produce mechanical movements in response to specific stimuli, mimicking macromolecular devices such as switches and motors. Naturally occurring or biological molecular machines are responsible for vital living processes such as DNA replication and ATP synthesis. Kinesins and ribosomes are examples of molecular machines, and they often take the form of multi-protein complexes. For the last several decades, scientists have attempted, with varying degrees of success, to miniaturize machines found in the macroscopic world. The first example of an artificial molecular machine (AMM) was reported in 1994, featuring a rotaxane with a ring and two different possible binding sites. In 2016 the Nobel Prize in Chemistry was awarded to Jean-Pierre Sauvage, Sir J. Fraser Stoddart, and Bernard L. Feringa for the design and synthesis of molecular machines.

AMMs have diversified rapidly over the past few decades. A major starting point is to exploit existing modes of motion in molecules, such as rotation about single bonds or cis-trans isomerization. Different AMMs are produced by introducing various functionalities, such as the introduction of bistability to create switches. A broad range of AMMs has been designed, featuring different properties and applications; some of these include molecular motors, switches, and logic gates. A wide range of applications have been demonstrated for AMMs, including those integrated into polymeric, liquid crystal, and crystalline systems for varied functions (such as materials research, homogeneous catalysis and surface chemistry).

Terminology

Several definitions describe a "molecular machine" as a class of molecules typically described as an assembly of a discrete number of molecular components intended to produce mechanical movements in response to specific stimuli. The expression is often more generally applied to molecules that simply mimic functions that occur at the macroscopic level. A few prime requirements for a molecule to be considered a "molecular machine" are: the presence of moving parts, the ability to consume energy, and the ability to perform a task. Molecular machines differ from other stimuli-responsive compounds that can produce motion (such as cis-trans isomers) in their relatively larger amplitude of movement (potentially due to chemical reactions) and the presence of a clear external stimulus to regulate the movements (as compared to random thermal motion). Piezoelectric, magnetostrictive, and other materials that produce a movement due to external stimuli on a macro-scale are generally not included, since despite the molecular origin of the motion the effects are not useable on the molecular scale.

This definition generally applies to synthetic molecular machines, which have historically gained inspiration from the naturally occurring biological molecular machines (also referred to as "nanomachines"). Biological machines are considered to be nanoscale devices (such as molecular proteins) in a living system that convert various forms of energy to mechanical work in order to drive crucial biological processes such as intracellular transport, muscle contractions, ATP generation and cell division.

History

What would be the utility of such machines? Who knows? I cannot see exactly what would happen, but I can hardly doubt that when we have some control of the arrangement of things on a molecular scale we will get an enormously greater range of possible properties that substances can have, and of the different things we can do.

— Richard Feynman, There's Plenty of Room at the Bottom

Biological molecular machines have been known and studied for decades given their vital role in sustaining life, and have served as inspiration for synthetically designed systems with similar useful functionality. The advent of conformational analysis, or the study of conformers to analyze complex chemical structures, in the 1950s gave rise to the idea of understanding and controlling relative motion within molecular components for further applications. This led to the design of "proto-molecular machines" featuring conformational changes such as cog-wheeling of the aromatic rings in triptycenes. By 1980, scientists could achieve desired conformations using external stimuli and utilize this for different applications. A major example is the design of a photoresponsive crown ether containing an azobenzene unit, which could switch between cis and trans isomers on exposure to light and hence tune the cation-binding properties of the ether. In his seminal 1959 lecture There's Plenty of Room at the Bottom, Richard Feynman alluded to the idea and applications of molecular devices designed artificially by manipulating matter at the atomic level. This was further substantiated by Eric Drexler during the 1970s, who developed ideas based on molecular nanotechnology such as nanoscale "assemblers", though their feasibility was disputed.

Though these events served as inspiration for the field, the actual breakthrough in practical approaches to synthesize artificial molecular machines (AMMs) took place in 1991 with the invention of a "molecular shuttle" by Sir Fraser Stoddart. Building upon the assembly of mechanically linked molecules such as catenanes and rotaxanes as developed by Jean-Pierre Sauvage in the early 1980s, this shuttle features a rotaxane with a ring that can move across an "axle" between two ends or possible binding sites (hydroquinone units). This design realized the well-defined motion of a molecular unit across the length of the molecule for the first time. In 1994, an improved design allowed control over the motion of the ring by pH variation or electrochemical methods, making it the first example of an AMM. Here the two binding sites are a benzidine and a biphenol unit; the cationic ring typically prefers staying over the benzidine ring, but moves over to the biphenol group when the benzidine gets protonated at low pH or if it gets electrochemically oxidized. In 1998, a study could capture the rotary motion of a decacyclene molecule on a copper-base metallic surface using a scanning tunneling microscope. Over the following decade, a broad variety of AMMs responding to various stimuli were invented for different applications. In 2016, the Nobel Prize in Chemistry was awarded to Sauvage, Stoddart, and Bernard L. Feringa for the design and synthesis of molecular machines.

Artificial molecular machines

Over the past few decades, AMMs have diversified rapidly and their design principles, properties, and characterization methods have been outlined more clearly. A major starting point for the design of AMMs is to exploit the existing modes of motion in molecules. For instance, single bonds can be visualized as axes of rotation, as can metallocene complexes. Bending or V-like shapes can be achieved by incorporating double bonds, which can undergo cis-trans isomerization in response to certain stimuli (typically irradiation with a suitable wavelength), as seen in numerous designs consisting of stilbene and azobenzene units. Similarly, ring-opening and -closing reactions such as those seen for spiropyran and diarylethene can also produce curved shapes. Another common mode of movement is the circumrotation of rings relative to one another as observed in mechanically interlocked molecules (primarily catenanes). While this type of rotation cannot be accessed beyond the molecule itself (because the rings are confined within one another), rotaxanes can overcome this as the rings can undergo translational movements along a dumbbell-like axis. Another line of AMMs incorporates biomolecules such as DNA and proteins as part of their design, making use of phenomena like protein folding and unfolding.

AMM designs have diversified significantly since the early days of the field. A major route is the introduction of bistability to produce molecular switches, featuring two distinct configurations for the molecule to convert between. This has been perceived as a step forward from the original molecular shuttle which consisted of two identical sites for the ring to move between without any preference, in a manner analogous to the ring flip in an unsubstituted cyclohexane. If these two sites are different from each other in terms of features like electron density, this can give rise to weak or strong recognition sites as in biological systems — such AMMs have found applications in catalysis and drug delivery. This switching behavior has been further optimized to acquire useful work that gets lost when a typical switch returns to its original state. Inspired by the use of kinetic control to produce work in natural processes, molecular motors are designed to have a continuous energy influx to keep them away from equilibrium to deliver work.
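
The difference between identical and non-identical binding sites can be illustrated with a two-state Boltzmann estimate. This is only a sketch under stated assumptions: the ring is taken to be at thermal equilibrium between two stations, A and B, and the free-energy difference used in the example is a made-up value, not a measurement for any particular shuttle.

```python
import math
from typing import Tuple

GAS_CONSTANT_KJ_PER_MOL_K = 8.314462618e-3


def station_populations(
    delta_g_kj_per_mol: float, temperature_k: float = 298.15
) -> Tuple[float, float]:
    """Equilibrium fractions of rings on stations A and B.

    delta_g_kj_per_mol is G(B) - G(A); a positive value favours station A.
    """
    thermal_energy = GAS_CONSTANT_KJ_PER_MOL_K * temperature_k
    ratio_b_over_a = math.exp(-delta_g_kj_per_mol / thermal_energy)
    fraction_a = 1.0 / (1.0 + ratio_b_over_a)
    return fraction_a, 1.0 - fraction_a


# Identical stations (original shuttle): no preference, so a 50:50 split.
print(station_populations(0.0))
# Hypothetical 5.7 kJ/mol preference for station A: roughly a 10:1 bias.
print(station_populations(5.7))
```

A stimulus that changes the sign of the free-energy difference (for example, protonating one station) would invert the populations, which is the essence of switching behavior.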

Various energy sources are employed to drive molecular machines today, but this was not the case during the early years of AMM development. Though the movements in AMMs were regulated relative to the random thermal motion generally seen in molecules, they could not be controlled or manipulated as desired. This led to the addition of stimuli-responsive moieties in AMM design, so that externally applied non-thermal sources of energy could drive molecular motion and hence allow control over the properties. Chemical energy (or "chemical fuels") was an attractive option at the beginning, given the broad array of reversible chemical reactions (heavily based on acid-base chemistry) to switch molecules between different states. However, this comes with the issue of practically regulating the delivery of the chemical fuel and the removal of waste generated to maintain the efficiency of the machine as in biological systems. Though some AMMs have found ways to circumvent this, more recently waste-free reactions such as those based on electron transfers or isomerization have gained attention (such as redox-responsive viologens). Eventually, several different forms of energy (electric, magnetic, optical and so on) have become the primary energy sources used to power AMMs, even producing autonomous systems such as light-driven motors.

Types

Various AMMs are tabulated below along with indicative images:

Type	Details	Image
Molecular balance	A molecule that can interconvert between two or more conformational or configurational states in response to the dynamic of multiple intra- and intermolecular driving forces, such as hydrogen bonding, solvophobic or hydrophobic effects, π interactions, and steric and dispersion interactions. The distinct conformers of a molecular balance can show different interactions with the same molecule, such that analyzing the ratio of the conformers and the energies for these interactions can enable quantification of different properties (such as CH-π or arene-arene interactions, see image).
Molecular hinge	A molecular hinge is a molecule that can typically rotate in a crank-like motion around a rigid axis, such as a double bond or aromatic ring, to switch between reversible configurations. Such configurations must have distinguishable geometries; for instance, azobenzene groups in a linear molecule may undergo cis-trans isomerization when irradiated with ultraviolet light, triggering a reversible transition to a bent or V-shaped conformation (see image). Molecular hinges have been adapted for applications such as nucleobase recognition, peptide modifications, and visualizing molecular motion.
Molecular logic gate	A molecule that performs a logical operation on one or more logic inputs and produces a single logic output. Modelled on logic gates, these molecules have been explored as alternatives to conventional silicon-based machinery. Several applications have come forth, such as water quality examination, food safety examination, metal ion detection, and pharmaceutical studies. The first example of a molecular logic gate was reported in 1993, featuring a receptor (see image) where the emission intensity could be treated as a tunable output if the concentrations of protons and sodium ions were to be considered as inputs.
Molecular motor	A molecule that is capable of directional rotary motion around a single or double bond and of producing useful work as a result (as depicted in the image). Carbon nanotube nanomotors have also been produced. Single bond rotary motors are generally activated by chemical reactions whereas double bond rotary motors are generally fueled by light. The rotation speed of the motor can also be tuned by careful molecular design.
Molecular necklace	A class of mechanically interlocked molecules derived from catenanes where a large macrocycle backbone connects at least three small rings in the shape of a necklace (see image for example). A molecular necklace consisting of a large macrocycle threaded by n-1 rings (hence comprising n rings) is represented as [n]MN. The first molecular necklace was synthesized in 1992, featuring several α-cyclodextrins on a single polyethylene glycol chain backbone; the authors connected this to the idea of a "molecular abacus" proposed by Stoddart and coworkers around the same time. Several interesting applications have emerged for these molecules, such as antibacterial activity, desulfurization of fuels, and piezoelectricity.
Molecular propeller	A molecule that can propel fluids when rotated, due to its special shape that is designed in analogy to macroscopic propellers (see schematic image on right). It has several molecular-scale blades attached at a certain pitch angle around the circumference of a nanoscale shaft. Propellers have been shown to have interesting properties, such as variations in pumping rates for hydrophilic and hydrophobic fluids.
Molecular shuttle	A molecule capable of shuttling molecules or ions from one location to another. This is schematically depicted in the image on the right, where a ring (in green) can bind to either one of the yellow sites on the blue macrocyclic backbone. A common molecular shuttle consists of a rotaxane where the macrocycle can move between two sites or stations along the dumbbell backbone; controlling the properties of either site and regulating conditions like pH can enable control over which site is selected for binding. This has led to novel applications in catalysis and drug delivery.
Molecular switch	A molecule that can be reversibly shifted between two or more stable states in response to certain stimuli. This change of states influences the properties of the molecule according to the state it occupies at the moment. Unlike a molecular motor, any mechanical work done due to the motion in a switch is generally undone once the molecule returns to its original state unless it is part of a larger motor-like system. The image on the right shows a hydrazone-based switch that switches in response to pH changes.
Molecular tweezers	Host molecules capable of holding items between their two arms. The open cavity of the molecular tweezers binds items using non-covalent bonding including hydrogen bonding, metal coordination, hydrophobic forces, van der Waals forces, π interactions, or electrostatic effects. For instance, the image on the right depicts tweezers formed by corannulene pincers clasping a C60 fullerene molecule, termed "buckycatcher". Examples of molecular tweezers have been reported that are constructed from DNA and are considered DNA machines.
Nanocar	Single-molecule vehicles that resemble macroscopic automobiles and are important for understanding how to control molecular diffusion on surfaces. The image on the right shows an example with wheels made of fullerene molecules. The first nanocars were synthesized by James M. Tour in 2005. They had an H-shaped chassis and 4 molecular wheels (fullerenes) attached to the four corners. In 2011, Feringa and co-workers synthesized the first motorized nanocar which had molecular motors attached to the chassis as rotating wheels. The authors were able to demonstrate directional motion of the nanocar on a copper surface by providing energy from a scanning tunneling microscope tip. Later, in 2017, the world's first-ever nanocar race took place in Toulouse.
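
The molecular logic gate row above describes proton and sodium ion concentrations as inputs and emission intensity as the output. A minimal sketch of how such a gate could be read as a Boolean operation follows. It assumes an AND-type response (strong emission only when both inputs are high), and the concentration thresholds are hypothetical values chosen purely for illustration.

```python
# Hypothetical thresholds (mol/L) above which each chemical input counts as "1".
PROTON_THRESHOLD_M = 1e-3
SODIUM_THRESHOLD_M = 1e-2


def emission_is_high(proton_conc_m: float, sodium_conc_m: float) -> bool:
    """Assumed AND-type response: emission is high only if both inputs are high."""
    proton_input = proton_conc_m >= PROTON_THRESHOLD_M
    sodium_input = sodium_conc_m >= SODIUM_THRESHOLD_M
    return proton_input and sodium_input


# Truth table over low/high concentrations of each input.
LOW_M, HIGH_M = 1e-7, 1e-1
for protons in (LOW_M, HIGH_M):
    for sodium in (LOW_M, HIGH_M):
        print(protons, sodium, emission_is_high(protons, sodium))
```

Other gate types (OR, XOR and so on) would correspond to different mappings from the same chemical inputs to the optical output.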

Biological molecular machines

A ribosome performing the elongation and membrane targeting stages of protein translation. The ribosome is green and yellow, the tRNAs are dark blue, and the other proteins involved are light blue. The produced peptide is released into the endoplasmic reticulum. Protein domain dynamics can now be seen by neutron spin echo spectroscopy.

Many macromolecular machines are found within cells, often in the form of multi-protein complexes. Examples of biological machines include motor proteins such as myosin, which is responsible for muscle contraction, kinesin, which moves cargo inside cells away from the nucleus along microtubules, and dynein, which moves cargo inside cells towards the nucleus and produces the axonemal beating of motile cilia and flagella. "[I]n effect, the [motile cilium] is a nanomachine composed of perhaps over 600 proteins in molecular complexes, many of which also function independently as nanomachines ... Flexible linkers allow the mobile protein domains connected by them to recruit their binding partners and induce long-range allostery via protein domain dynamics." Other biological machines are responsible for energy production, for example ATP synthase which harnesses energy from proton gradients across membranes to drive a turbine-like motion used to synthesise ATP, the energy currency of a cell. Still other machines are responsible for gene expression, including DNA polymerases for replicating DNA, RNA polymerases for producing mRNA, the spliceosome for removing introns, and the ribosome for synthesising proteins. These machines and their nanoscale dynamics are far more complex than any molecular machines that have yet been artificially constructed.

Biological machines have potential applications in nanomedicine. For example, they could be used to identify and destroy cancer cells. Molecular nanotechnology is a speculative subfield of nanotechnology regarding the possibility of engineering molecular assemblers, biological machines which could re-order matter at a molecular or atomic scale. Nanomedicine would make use of these nanorobots, introduced into the body, to repair or detect damage and infections, but these are considered to be far beyond current capabilities.

Research and applications

Advances in this area are inhibited by the lack of synthetic methods. In this context, theoretical modeling has emerged as a pivotal tool to understand the self-assembly or -disassembly processes in these systems.

Possible applications have been demonstrated for AMMs, including those integrated into polymeric, liquid crystal, and crystalline systems for varied functions. Homogeneous catalysis is a prominent example, especially in areas like asymmetric synthesis, utilizing noncovalent interactions and biomimetic allosteric catalysis. AMMs have been pivotal in the design of several stimuli-responsive smart materials, such as 2D and 3D self-assembled materials and nanoparticle-based systems, for versatile applications ranging from 3D printing to drug delivery.

AMMs are gradually moving from the conventional solution-phase chemistry to surfaces and interfaces. For instance, AMM-immobilized surfaces (AMMISs) are a novel class of functional materials consisting of AMMs attached to inorganic surfaces forming features like self-assembled monolayers; this gives rise to tunable properties such as fluorescence, aggregation and drug-release activity.

Most of these "applications" remain at the proof-of-concept level. Challenges in streamlining macroscale applications include autonomous operation, the complexity of the machines, stability in the synthesis of the machines and the working conditions.