A Medley of Potpourri: Nov 25, 2021

Thursday, November 25, 2021

Protein engineering

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Protein_engineering

Protein engineering is the process of developing useful or valuable proteins. It is a young discipline, and much research is directed at understanding protein folding and molecular recognition in order to derive protein design principles. It is also a product and services market, with an estimated value of $168 billion by 2017.

There are two general strategies for protein engineering: rational protein design and directed evolution. These methods are not mutually exclusive; researchers often apply both. In the future, more detailed knowledge of protein structure and function, together with advances in high-throughput screening, may greatly expand the abilities of protein engineering. Eventually, even unnatural amino acids may be incorporated through newer methods, such as expanded genetic codes, that allow novel amino acids to be encoded.

Approaches

Rational design

In rational protein design, a scientist uses detailed knowledge of the structure and function of a protein to make desired changes. In general, this approach has the advantage of being inexpensive and technically easy, since site-directed mutagenesis methods are well developed. Its major drawback, however, is that detailed structural knowledge of a protein is often unavailable, and even when it is available, the effects of various mutations can be very difficult to predict, because structural information most often provides only a static picture of a protein. Programs such as Folding@home and Foldit have used crowdsourcing to gain insight into the folding motifs of proteins.

Computational protein design algorithms seek to identify novel amino acid sequences that are low in energy when folded to the pre-specified target structure. While the sequence-conformation space that needs to be searched is large, the most challenging requirement for computational protein design is a fast, yet accurate, energy function that can distinguish optimal sequences from similar suboptimal ones.

Multiple sequence alignment

Without structural information about a protein, sequence analysis is often useful for learning about the protein. These techniques involve aligning the target protein sequence with other related protein sequences. The alignment can show which amino acids are conserved between species and are important for the function of the protein. Such analyses can help identify hot-spot amino acids that can serve as target sites for mutation. Multiple sequence alignment draws on databases such as PREFAB, SABmark, OXBench, IRMbase, and BAliBASE to cross-reference target protein sequences with known sequences. Multiple sequence alignment techniques are listed below.
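To make "conserved between species" concrete, here is a minimal Python sketch that scores each column of an already aligned set of sequences by its Shannon entropy, where low entropy means a strongly conserved position. Entropy is only one of several possible conservation measures; the 0.5-bit cutoff, the helper names, and the toy alignment are illustrative assumptions, not part of any tool or database named above.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column; gap characters are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return float("inf")  # an all-gap column carries no conservation signal
    n = len(residues)
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def conserved_positions(alignment: list[str], max_entropy: float = 0.5) -> list[int]:
    """Return 0-based column indices whose entropy is at or below max_entropy (assumed cutoff)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must already be aligned to the same length")
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    return [i for i, col in enumerate(columns) if column_entropy(col) <= max_entropy]


# Toy alignment of four hypothetical homologs.
alignment = ["MKV-LA", "MKVHLG", "MRVHLA", "MKIHLS"]
print(conserved_positions(alignment))  # → [0, 3, 4]
```

Depending on the engineering goal, such conserved columns are either left alone (to preserve function) or examined as candidate hot spots.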

Clustal W

This method begins by performing pairwise alignment of sequences using k-tuple or Needleman–Wunsch methods. These methods calculate a matrix of pairwise similarity scores for all sequence pairs. The similarity scores are then transformed into distance scores, which are used to build a guide tree with the neighbor-joining method. This guide tree then directs the construction of the multiple sequence alignment.
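The pairwise step can be illustrated with a textbook Needleman–Wunsch global alignment, followed by turning percent identity into a distance of the kind a guide tree is built from. This is a sketch with assumed unit scores (match +1, mismatch −1, gap −1); Clustal W itself uses substitution matrices and affine gap penalties, so its scores will differ. The function names are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_distance(a: str, b: str) -> float:
    """1 - fraction of identical residues among aligned (non-gap) positions."""
    _, x, y = needleman_wunsch(a, b)
    aligned = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not aligned:
        return 1.0
    return 1.0 - sum(p == q for p, q in aligned) / len(aligned)


print(needleman_wunsch("MKVLA", "MKVLG"))  # → (3, 'MKVLA', 'MKVLG')
print(identity_distance("MKVLA", "MKVLG"))
```

Computing this distance for every pair of sequences yields the distance matrix that the neighbor-joining step consumes.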

Clustal Omega

This method can align up to 190,000 sequences. It begins by comparing the sequences with the k-tuple method. Next, the sequences are clustered using the mBed and k-means methods. A guide tree is then constructed using the UPGMA method, and the HHalign package aligns the sequences along this guide tree to generate the multiple sequence alignment.
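The guide-tree step can be pictured with plain UPGMA (unweighted pair group method with arithmetic mean): repeatedly merge the two closest clusters, then set the new cluster's distance to every other cluster to the size-weighted average of its members' distances. The sketch below is the textbook algorithm run on a small hypothetical distance matrix; it is not Clustal Omega's code and omits the mBed and k-means preclustering.

```python
from itertools import combinations


def upgma(labels: list[str], matrix: list[list[float]]):
    """Build a UPGMA guide tree and return it as nested tuples of labels."""
    size = {label: 1 for label in labels}
    dist = {
        frozenset((labels[i], labels[j])): matrix[i][j]
        for i, j in combinations(range(len(labels)), 2)
    }
    clusters = list(labels)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        size[merged] = size[a] + size[b]
        for c in clusters:
            if c not in (a, b):
                # Size-weighted average distance from the new cluster to cluster c.
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))] + size[b] * dist[frozenset((b, c))]
                ) / size[merged]
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return clusters[0]


labels = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(labels, matrix))  # → ('D', ('C', ('A', 'B')))
```

In a progressive aligner, the innermost pair (here A and B) would be aligned first, and the tree then dictates the order in which further sequences or profiles are added.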

MAFFT

This method converts each amino acid sequence into numerical sequences made up of the volume and polarity values of its residues. The fast Fourier transform (FFT) is then used to compute correlations between these numerical sequences rapidly, which reveals homologous regions.
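The FFT trick can be sketched with NumPy: encode each sequence as numbers, then compute the cross-correlation at every relative offset in one pass, and read off the offset with the highest correlation as a candidate homologous region. Everything specific here is an assumption for illustration: `TOY_SCALE` is a made-up single property scale (MAFFT combines real volume and polarity scales), and `encode`/`best_offset` are hypothetical helpers.

```python
import numpy as np

# Hypothetical per-residue property values for illustration only; MAFFT itself
# uses published residue volume and polarity scales.
TOY_SCALE = {"A": 1.0, "G": 0.5, "L": 2.0, "K": -1.5, "E": -1.0, "W": 3.0, "S": 0.0}


def encode(seq: str) -> np.ndarray:
    """Map residues to numbers and subtract the mean so composition alone does not dominate."""
    values = np.array([TOY_SCALE[r] for r in seq], dtype=float)
    return values - values.mean()


def best_offset(seq1: str, seq2: str) -> int:
    """Return the lag k maximizing sum_i x[i + k] * y[i], computed with the FFT."""
    x, y = encode(seq1), encode(seq2)
    n = len(x) + len(y) - 1  # zero-pad so the circular correlation equals the linear one
    corr = np.fft.ifft(np.fft.fft(x, n) * np.conj(np.fft.fft(y, n))).real
    k = int(np.argmax(corr))
    # Indices >= len(x) hold negative lags (seq2 overhanging the start of seq1).
    return k if k < len(x) else k - n


print(best_offset("SSSWLKWSS", "WLKW"))  # → 3
```

The FFT computes all offsets in O(n log n) instead of O(n²), which is the speed-up that makes this useful as a fast pre-filter before a detailed alignment.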

K-Align

This method utilizes the Wu-Manber approximate string matching algorithm to generate multiple sequence alignments.
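For readers unfamiliar with Wu-Manber matching, the sketch below shows its bit-parallel core in the "shift-and" form: one bit vector per allowed error count tracks which pattern prefixes currently match, allowing up to k insertions, deletions, or substitutions. This illustrates the string-matching technique only; how Kalign turns such matches into sequence distances is not shown, and the function name is hypothetical.

```python
def wu_manber_search(text: str, pattern: str, k: int) -> list[int]:
    """Return 0-based text positions where an approximate match of `pattern` ends.

    Bit i of R[d] is set when pattern[:i + 1] matches a suffix of the text read so
    far with at most d insertions, deletions, or substitutions.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # Before reading any text, prefixes of length <= d match with d deletions.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    hits = []
    for pos, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev_old = R[0]  # value of R[d - 1] before this character
        R[0] = ((R[0] << 1) | 1) & b  # exact matching (plain shift-and)
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = (
                (((cur_old << 1) | 1) & b)  # match: same error count
                | prev_old                  # insertion: extra text character
                | (prev_old << 1)           # substitution
                | (R[d - 1] << 1)           # deletion: skip a pattern character (updated R[d - 1])
                | ((1 << d) - 1)            # prefixes of length <= d are always reachable
            ) & full
            prev_old = cur_old
        if R[k] & accept:
            hits.append(pos)
    return hits


print(wu_manber_search("ACGTTGCA", "TAG", 1))  # → [5]
```

Because the state is updated with a handful of shifts and bitwise operations per text character, the scan is fast, which is the property that makes this family of algorithms attractive for comparing many sequences.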

Multiple sequence comparison by log expectation (MUSCLE)

This method utilizes k-mer and Kimura distances to generate multiple sequence alignments.
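Both distance measures can be written down in a few lines. The sketch below uses a k-mer similarity of the kind MUSCLE uses (shared k-mer counts divided by the number of k-mers in the shorter sequence) and Kimura's empirical correction for protein distances, d = −ln(1 − p − 0.2p²), where p is the observed fraction of differing aligned residues. MUSCLE additionally works on a compressed amino acid alphabet, which is omitted here; k = 3 and the function names are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """1 - (shared k-mers / k-mers in the shorter sequence); no alignment needed."""
    denominator = min(len(a), len(b)) - k + 1
    if denominator <= 0:
        raise ValueError("sequences must be at least k residues long")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())  # & keeps the minimum count
    return 1.0 - shared / denominator


def kimura_distance(p: float) -> float:
    """Kimura's protein distance for an observed fraction p of differing aligned residues."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("p is too large for Kimura's approximation")
    return -math.log(inner)


print(kmer_distance("MKVLA", "MKVLG"))  # fast, alignment-free estimate
print(kimura_distance(0.2))             # corrects for multiple substitutions at a site
```

The k-mer distance is cheap because it needs no alignment, whereas the Kimura correction is applied once aligned sequences are available.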

T-Coffee

This method utilizes tree-based consistency objective functions for alignment evaluation. It has been shown to be 5-10% more accurate than Clustal W.

Coevolutionary analysis

Coevolutionary analysis is also known as correlated mutation, covariation, or co-substitution analysis. This type of rational design is based on reciprocal evolutionary changes at evolutionarily interacting loci. Generally, the method begins with the generation of a curated multiple sequence alignment for the target sequence. This alignment is then manually refined by removing highly gapped sequences as well as sequences with low sequence identity, which increases the quality of the alignment. Next, the refined alignment is used for coevolutionary measurements with distinct correlated mutation algorithms. These algorithms produce a coevolution scoring matrix, which is filtered by applying various significance tests to extract significant coevolution values and remove background noise. The coevolutionary measurements are further evaluated to assess their performance and stringency. Finally, the results of the coevolutionary analysis are validated experimentally.
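One of the simplest correlated mutation scores is the mutual information (MI) between two alignment columns, which is high when residues at the two positions change together. The sketch below builds the raw scoring matrix for a toy alignment; practical tools add corrections (for example the average product correction) and the significance filtering described above, which are omitted. The helper names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i: str, col_j: str) -> float:
    """MI (bits) between two alignment columns; rows with a gap in either column are skipped."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum over residue pairs of p(a, b) * log2(p(a, b) / (p(a) * p(b))).
    return sum((c / n) * math.log2(c * n / (left[a] * right[b])) for (a, b), c in joint.items())


def coevolution_matrix(alignment: list[str]) -> dict[tuple[int, int], float]:
    """Score every pair of columns; higher values suggest correlated substitutions."""
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i in range(length)
        for j in range(i + 1, length)
    }


# Toy alignment: columns 1 and 2 swap K and E together, as a compensating pair might.
scores = coevolution_matrix(["AKE", "AKE", "AEK", "AEK"])
print(max(scores, key=scores.get))  # → (1, 2)
```

Raw MI also picks up shared phylogenetic history and column variability, which is why the filtering and validation steps in the workflow above are needed.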

Structural prediction

De novo protein design benefits from knowledge of existing protein structures, which assists with the prediction of new protein structures. Methods for protein structure prediction fall into one of four classes: ab initio, fragment-based methods, homology modeling, and protein threading.

Ab initio

These methods involve free modeling without using any structural information from a template. Ab initio methods aim to predict the native structure of a protein, which corresponds to the global minimum of its free energy. Examples of force fields and simulation packages used in ab initio methods are AMBER, GROMOS, GROMACS, CHARMM, OPLS, and ECEPP/2. Ab initio methods generally begin with a geometric representation of the protein of interest. Next, a potential energy function for the protein is developed, using either molecular mechanics potentials or potential functions derived from protein structures. Once the potential model is in place, energy search techniques, including molecular dynamics simulations, Monte Carlo simulations, and genetic algorithms, are applied to the protein.
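The energy-search step can be illustrated with a Metropolis Monte Carlo search under a cooling schedule (simulated annealing). In this sketch, `toy_energy` is a hypothetical stand-in for a molecular mechanics potential (one cosine well per torsion angle), and the step size, temperatures, and number of steps are assumed values rather than settings from any of the packages listed above.

```python
import math
import random


def toy_energy(angles: list[float]) -> float:
    """Hypothetical potential: one cosine well per torsion angle, minimum 0.0 at -pi/3."""
    return sum(1.0 - math.cos(a + math.pi / 3) for a in angles)


def metropolis_minimize(energy, start, steps=5000, step_size=0.3,
                        t_start=1.0, t_end=0.01, seed=0):
    """Monte Carlo search with the Metropolis criterion and geometric cooling."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)  # perturb one angle
        e_trial = energy(trial)
        # Accept downhill moves always, uphill moves with probability exp(-dE / T).
        if e_trial <= e_current or rng.random() < math.exp((e_current - e_trial) / t):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


best, e_best = metropolis_minimize(toy_energy, [0.0, 0.0, 0.0, 0.0])
print(round(e_best, 3), [round(a, 2) for a in best])
```

Occasionally accepting uphill moves at high temperature lets the search escape local minima, which matters for real protein energy landscapes far more than for this single-well toy.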

Fragment based

These methods use database information about known structures to match homologous structural fragments to the target protein sequence. These fragments are assembled into compact structures using scoring and optimization procedures, with the goal of achieving the lowest potential energy score. Web servers and programs for fragment-based prediction include I-TASSER, ROSETTA, Rosetta@home, FRAGFOLD, CABS-fold, PROFESY, CREF, QUARK, UNDERTAKER, HMM, and ANGLOR.

Homology modeling

These methods, also known as comparative modeling, are based on the homology of proteins. The first step in homology modeling is generally the identification of template sequences of known structure that are homologous to the query sequence. Next, the query sequence is aligned to the template sequence. Following the alignment, the structurally conserved regions are modeled using the template structure. This is followed by modeling of the side chains and loops that differ from the template. Finally, the modeled structure undergoes refinement and quality assessment. Servers and programs available for homology modeling include SWISS-MODEL, MODELLER, ReformAlign, PyMOD, TIP-STRUCTFAST, COMPASS, 3D-PSSM, SAMT02, SAMT99, HHpred, FUGUE, 3D-JIGSAW, META-PP, ROSETTA, and I-TASSER.

Protein threading

Protein threading can be used when a reliable homologue for the query sequence cannot be found. The method begins with a query sequence and a library of template structures. Next, the query sequence is threaded over the known template structures, and the resulting candidate models are scored with scoring functions based on potential energy models of both the query and template sequences. The match with the lowest potential energy is then selected. Methods and servers for retrieving threading data and performing the calculations include GenTHREADER, pGenTHREADER, pDomTHREADER, ORFEUS, PROSPECT, BioShell-Threading, FFAS03, RaptorX, HHpred, LOOPP server, Sparks-X, SEGMER, THREADER2, ESYPRED3D, LIBRA, TOPITS, RAPTOR, COTH, and MUSTER.

For more information on rational design see site-directed mutagenesis.

Directed evolution

In directed evolution, random mutagenesis, e.g. by error-prone PCR or sequence saturation mutagenesis, is applied to a protein, and a selection regime is used to select variants having desired traits. Further rounds of mutation and selection are then applied. This method mimics natural evolution and, in general, produces superior results to rational design. An added process, termed DNA shuffling, mixes and matches pieces of successful variants to produce better results; this mimics the recombination that occurs naturally during sexual reproduction. The advantages of directed evolution are that it requires no prior structural knowledge of a protein, and that it is not necessary to predict what effect a given mutation will have. Indeed, the results of directed evolution experiments are often surprising, in that desired changes are often caused by mutations that were not expected to have any effect. The drawback is that directed evolution requires high-throughput screening, which is not feasible for all proteins. Large amounts of recombinant DNA must be mutated and the products screened for desired traits, and the large number of variants often requires expensive robotic equipment to automate the process. Further, not all desired activities can be screened for easily.

Natural Darwinian evolution can be effectively imitated in the lab toward tailoring protein properties for diverse applications, including catalysis. Many experimental technologies exist to produce large and diverse protein libraries and for screening or selecting folded, functional variants. Folded proteins arise surprisingly frequently in random sequence space, an occurrence exploitable in evolving selective binders and catalysts. While more conservative than direct selection from deep sequence space, redesign of existing proteins by random mutagenesis and selection/screening is a particularly robust method for optimizing or altering extant properties. It also represents an excellent starting point for achieving more ambitious engineering goals. Allying experimental evolution with modern computational methods is likely the broadest, most fruitful strategy for generating functional macromolecules unknown to nature.

The design of high-quality mutant libraries has seen significant progress in the recent past. This progress has come in the form of better descriptions of the effects of mutational loads on protein traits. Computational approaches have also made large advances in reducing the innumerably large sequence space to more manageable, screenable sizes, thus creating smart libraries of mutants. Library size has also been reduced to more screenable sizes by identifying key beneficial residues with algorithms for systematic recombination. Finally, a significant step toward efficient re-engineering of enzymes has been made with the development of more accurate statistical models and algorithms for quantifying and predicting coupled mutational effects on protein function.

Generally, directed evolution may be summarized as an iterative two-step process: generation of protein mutant libraries, followed by high-throughput screening to select variants with improved traits. This technique does not require prior knowledge of the relationship between protein structure and function. Directed evolution uses random or focused mutagenesis to generate libraries of mutant proteins. Random mutations can be introduced using either error-prone PCR or site saturation mutagenesis. Mutants may also be generated by recombination of multiple homologous genes. Nature has evolved a limited number of beneficial sequences, and directed evolution makes it possible to identify undiscovered protein sequences with novel functions. This ability depends on the protein's ability to tolerate amino acid substitutions without compromising folding or stability.
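The iterative two-step loop can be sketched in a few lines of Python. Everything specific here is a hypothetical stand-in so the loop can run: `screen` scores identity to an arbitrary target sequence (a real screen measures an activity, not closeness to a known answer), and the library size, mutation rate, and number of rounds are assumed values.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Random point mutagenesis: each position is replaced with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def directed_evolution(parent, fitness, rounds=15, library_size=200, rate=0.03, seed=1):
    """Alternate library generation and screening, carrying the best variant forward."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best)]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]  # step 1: mutant library
        top = max(library, key=fitness)                                  # step 2: screen
        if fitness(top) > fitness(best):
            best = top
        history.append(fitness(best))
    return best, history


# Hypothetical screen: number of positions matching an arbitrary "ideal" sequence.
TARGET = "MKTAYIAKQRQISFVKSHFS"


def screen(seq: str) -> int:
    return sum(a == b for a, b in zip(seq, TARGET))


best, history = directed_evolution("M" + "A" * 19, screen)
print(history)
```

Keeping only improved variants makes the fitness history non-decreasing; real campaigns often carry several top variants forward, or recombine them, to avoid getting stuck.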

Directed evolution methods can be broadly categorized into two strategies, asexual and sexual methods.

Asexual methods

Asexual methods do not generate any crossovers between parental genes. Single genes are used to create mutant libraries using various mutagenic techniques. These asexual methods can produce either random or focused mutagenesis.

Random mutagenesis

Random mutagenic methods produce mutations at random throughout the gene of interest. Random mutagenesis can introduce the following types of mutations: transitions, transversions, insertions, deletions, inversions, missense mutations, and nonsense mutations. Examples of methods for producing random mutagenesis are below.

Error prone PCR

Error-prone PCR exploits the fact that Taq DNA polymerase lacks 3' to 5' exonuclease (proofreading) activity. This results in an error rate of 0.001-0.002% per nucleotide per replication. The method begins with choosing the gene, or the region within a gene, to be mutated. Next, the required extent of error is calculated based on the type and extent of activity one wishes to generate, and this determines the error-prone PCR strategy to be employed. Following PCR, the genes are cloned into a plasmid and introduced into competent cells. These cells are then screened for desired traits. Plasmids are then isolated from colonies that show improved traits and are used as templates for the next round of mutagenesis. Error-prone PCR is biased toward certain mutations relative to others, for example transitions over transversions.
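The quoted error rate translates into a per-gene mutation load with simple arithmetic. The sketch assumes that errors accumulate linearly with the number of template doublings and that mutations per copy follow a Poisson distribution; the 1,000 bp gene length and 20 doublings are illustrative values, not figures from the text.

```python
import math


def expected_mutations(gene_length_bp: int, error_rate_per_nt: float, doublings: int) -> float:
    """Mean mutations per gene copy, assuming errors accumulate linearly over doublings."""
    return gene_length_bp * error_rate_per_nt * doublings


def fraction_unmutated(mean_mutations: float) -> float:
    """Fraction of copies carrying zero mutations under a Poisson model."""
    return math.exp(-mean_mutations)


# 0.001-0.002% per nucleotide is 1e-5 to 2e-5 per nucleotide per replication.
for rate in (1e-5, 2e-5):
    lam = expected_mutations(1000, rate, 20)
    print(f"rate={rate:g}: {lam:.2f} mutations/gene, {fraction_unmutated(lam):.0%} unmutated")
```

Under these assumptions, standard Taq gives only about 0.2 to 0.4 mutations per 1 kb gene, so most clones carry no mutation at all; this is why the reaction conditions are deliberately altered to raise the error rate.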

Rates of error in PCR can be increased in the following ways:

Increase the concentration of magnesium chloride, which stabilizes non-complementary base pairing.
Add manganese chloride to reduce base-pairing specificity.
Add dNTPs at increased and unbalanced concentrations.
Add base analogs such as dITP, 8-oxo-dGTP, and dPTP.
Increase the concentration of Taq polymerase.
Increase the extension time.
Increase the cycle time.
Use a less accurate Taq polymerase.

Also see polymerase chain reaction for more information.

Rolling circle error-prone PCR

This PCR method is based on rolling circle amplification, which is modeled on the method that bacteria use to amplify circular DNA. The method produces linear DNA duplexes. These fragments contain tandem repeats of the circular DNA, called concatemers, which can be transformed into bacterial strains. Mutations are introduced by first cloning the target sequence into an appropriate plasmid. Next, amplification begins using random hexamer primers and Φ29 DNA polymerase under error-prone rolling circle amplification conditions. Additional conditions for error-prone rolling circle amplification are 1.5 pM of template DNA, 1.5 mM MnCl₂, and a 24-hour reaction time. MnCl₂ is added to the reaction mixture to promote random point mutations in the DNA strands. Mutation rates can be increased by increasing the concentration of MnCl₂ or by decreasing the concentration of template DNA. Error-prone rolling circle amplification is advantageous relative to error-prone PCR because it uses universal random hexamer primers rather than specific primers. Also, the reaction products of this amplification do not need to be treated with ligases or endonucleases. The reaction is isothermal.

Chemical mutagenesis

Chemical mutagenesis involves the use of chemical agents to introduce mutations into genetic sequences. Examples of chemical mutagens follow.

Sodium bisulfite is effective at mutating G/C-rich genomic sequences, because it catalyzes the deamination of unmethylated cytosine to uracil.

Ethyl methanesulfonate (EMS) alkylates guanine residues. This alteration causes errors during DNA replication.

Nitrous acid causes transitions by deaminating adenine and cytosine.

The dual approach to random chemical mutagenesis is an iterative two-step process. First, the gene of interest is chemically mutagenized in vivo with EMS. Next, the treated gene is isolated and cloned into an untreated expression vector to prevent mutations in the plasmid backbone. This technique preserves the plasmid's genetic properties.

Targeting glycosylases to embedded arrays for mutagenesis (TaGTEAM)

This method has been used for targeted in vivo mutagenesis in yeast. It involves fusing a 3-methyladenine DNA glycosylase to the tetR DNA-binding domain. This has been shown to increase mutation rates more than 800-fold in regions of the genome containing tetO sites.

Mutagenesis by random insertion and deletion

This method alters the length of the sequence by simultaneously deleting and inserting chunks of bases of arbitrary length. It has been shown to produce proteins with new functionalities through the introduction of new restriction sites, specific codons, and four-base codons for non-natural amino acids.

Transposon based random mutagenesis

Many methods for transposon-based random mutagenesis have been reported recently. These methods include, but are not limited to, PERMUTE (random circular permutation), random protein truncation, random nucleotide triplet substitution, random domain/tag/multiple amino acid insertion, codon scanning mutagenesis, and multicodon scanning mutagenesis. All of these techniques require the design of mini-Mu transposons. Thermo Scientific manufactures kits for the design of these transposons.

Random mutagenesis methods altering the target DNA length

These methods involve altering gene length via insertion and deletion mutations. An example is the tandem repeat insertion (TRINS) method. This technique results in the generation of tandem repeats of random fragments of the target gene via rolling circle amplification and concurrent incorporation of these repeats into the target gene.

Mutator strains

Mutator strains are bacterial cell lines that are deficient in one or more DNA repair mechanisms. An example of a mutator strain is E. coli XL1-Red. This E. coli strain is deficient in the MutS, MutD, and MutT DNA repair pathways. Mutator strains are useful for introducing many types of mutation; however, they show progressive sickness in culture because of the accumulation of mutations in the strain's own genome.

Focused mutagenesis

Focused mutagenic methods produce mutations at predetermined amino acid residues. These techniques require an understanding of the sequence-function relationship for the protein of interest. This understanding allows the identification of residues that are important for stability, stereoselectivity, and catalytic efficiency. Examples of methods that produce focused mutagenesis are below.

Site saturation mutagenesis

Site saturation mutagenesis is a PCR based method used to target amino acids with significant roles in protein function. The two most common techniques for performing this are whole plasmid single PCR, and overlap extension PCR.

Whole plasmid single PCR is also referred to as site-directed mutagenesis (SDM). SDM products are subjected to DpnI endonuclease digestion. This digestion cleaves only the parental DNA, because the parental plasmid contains GATC sites methylated at N6 of adenine (Gm6ATC), whereas the newly synthesized DNA is unmethylated. SDM does not work well for large plasmids of over ten kilobases. Also, this method is only capable of replacing two nucleotides at a time.

Overlap extension PCR requires the use of two pairs of primers. One primer in each set contains a mutation. A first round of PCR using these primer sets is performed and two double stranded DNA duplexes are formed. A second round of PCR is then performed in which these duplexes are denatured and annealed with the primer sets again to produce heteroduplexes, in which each strand has a mutation. Any gaps in these newly formed heteroduplexes are filled with DNA polymerases and further amplified.

Sequence saturation mutagenesis (SeSaM)

Sequence saturation mutagenesis results in randomization of the target sequence at every nucleotide position. The method begins with the generation of variable-length DNA fragments tailed with universal bases at their 3' termini using terminal transferase. Next, these fragments are extended to full length using a single-stranded template. The universal bases are then replaced with random standard bases, causing mutations. There are several modified versions of this method, such as SeSaM-Tv-II, SeSaM-Tv+, and SeSaM-III.

Single primer reactions in parallel (SPRINP)

This site saturation mutagenesis method involves two separate PCR reactions: the first uses only forward primers, while the second uses only reverse primers. This avoids primer-dimer formation.

Mega primed and ligase free focused mutagenesis

This site saturation mutagenic technique begins with one mutagenic oligonucleotide and one universal flanking primer. These two reactants are used for an initial PCR cycle. Products from this first PCR cycle are used as mega primers for the next PCR.

Ω-PCR

This site saturation mutagenic method is based on overlap extension PCR. It is used to introduce mutations at any site in a circular plasmid.

PFunkel-Omnichange-OSCARR

This method allows user-defined site-directed mutagenesis at single or multiple sites simultaneously. OSCARR is an acronym for one-pot simple methodology for cassette randomization and recombination, which results in randomization of desired fragments of a protein. Omnichange is a sequence-independent, multisite saturation mutagenesis method that can saturate up to five independent codons on a gene.

Trimer-dimer mutagenesis

This method removes redundant codons and stop codons.
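Why removing redundant and stop codons matters can be seen by counting. The sketch compares the degenerate codons NNN and NNK with an idealized set of one codon per amino acid (an assumption standing in for a trimer mixture; real trimer sets may use different codons), and estimates how many clones must be screened for about 95% library coverage using the Poisson approximation T = −V ln(1 − F), assuming all V codon variants are equally represented. All helper names are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand(degenerate_codon: str) -> list[str]:
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def one_codon_per_amino_acid() -> list[str]:
    """Idealized trimer-style set: one codon for each of the 20 amino acids, no stops."""
    chosen = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*" and aa not in chosen:
            chosen[aa] = codon
    return list(chosen.values())


def describe(codons: list[str]) -> tuple[int, int, int]:
    """(number of codons, number of stop codons, number of distinct amino acids)."""
    encoded = [CODON_TABLE[c] for c in codons]
    return len(codons), encoded.count("*"), len(set(encoded) - {"*"})


def clones_for_coverage(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so that, on average, `coverage` of equally likely variants is sampled."""
    return math.ceil(-n_variants * math.log(1 - coverage))


for name, codons in [("NNN", expand("NNN")), ("NNK", expand("NNK")),
                     ("one codon per amino acid", one_codon_per_amino_acid())]:
    n, stops, aas = describe(codons)
    print(name, n, stops, aas, clones_for_coverage(n))
# → NNN 64 3 20 192
# → NNK 32 1 20 96
# → one codon per amino acid 20 0 20 60
```

All three schemes cover the same 20 amino acids, but removing redundancy and stop codons cuts the screening effort per randomized position by more than threefold compared with NNN, and the savings multiply when several positions are randomized at once.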

Cassette mutagenesis

This is a PCR-based method. Cassette mutagenesis begins with the synthesis of a DNA cassette containing the gene of interest, flanked on either side by restriction sites. The endonuclease that cleaves these restriction sites also cleaves sites in the target plasmid. The DNA cassette and the target plasmid are both treated with the endonuclease to cleave these restriction sites and create sticky ends. Next, the cleavage products are ligated together, inserting the gene into the target plasmid. An alternative form called combinatorial cassette mutagenesis is used to identify the functions of individual amino acid residues in the protein of interest. Recursive ensemble mutagenesis then uses information from previous rounds of combinatorial cassette mutagenesis. Codon cassette mutagenesis allows a single codon to be inserted or replaced at a particular site in double-stranded DNA.

Sexual methods

Sexual methods of directed evolution involve in vitro recombination that mimics natural in vivo recombination. Generally, these techniques require high sequence homology between parental sequences. They are often used to recombine two different parental genes, and, unlike asexual methods, they create crossovers between these genes.

In vitro homologous recombination

Homologous recombination can be categorized as either in vivo or in vitro. In vitro homologous recombination mimics natural in vivo recombination. These in vitro recombination methods require high sequence homology between parental sequences. These techniques exploit the natural diversity in parental genes by recombining them to yield chimeric genes. The resulting chimeras show a blend of parental characteristics.

DNA shuffling

This in vitro technique was one of the first techniques in the era of recombination. It begins with the digestion of homologous parental genes into small fragments by DNase1. These small fragments are then purified from undigested parental genes. Purified fragments are then reassembled using primer-less PCR. This PCR involves homologous fragments from different parental genes priming for each other, resulting in chimeric DNA. The chimeric DNA of parental size is then amplified using end terminal primers in regular PCR.
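An idealized in silico picture of what shuffling produces: if fragments of aligned, equal-length parents are reassembled in order and each fragment comes from a randomly chosen parent, every chimera is a mosaic of parental segments. In a real experiment, crossovers occur only where the parents share enough local homology for fragments to prime each other, and fragment sizes vary; the fixed fragment length, the parent sequences, and the function name are assumptions for illustration.

```python
import random


def shuffle_parents(parents: list[str], fragment_len: int = 5, n_chimeras: int = 4,
                    seed: int = 0) -> list[str]:
    """Rebuild each chimera fragment by fragment, taking each fragment from a
    randomly chosen parent at the same aligned position."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    rng = random.Random(seed)
    chimeras = []
    for _ in range(n_chimeras):
        pieces = [rng.choice(parents)[start:start + fragment_len]
                  for start in range(0, length, fragment_len)]
        chimeras.append("".join(pieces))
    return chimeras


# Two hypothetical homologous parents (21 nt, aligned without gaps).
parents = ["ATGAAAGCTCTGGTTACCGGT", "ATGAAGGCACTCGTGACTGGC"]
for chimera in shuffle_parents(parents):
    print(chimera)
```

Each output position is inherited from one of the parents, which is the sense in which shuffling recombines existing diversity rather than creating new point mutations (those come from the low-fidelity reassembly in the real protocol).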

Random priming in vitro recombination (RPR)

This in vitro homologous recombination method begins with the synthesis of many short gene fragments exhibiting point mutations using random sequence primers. These fragments are reassembled to full length parental genes using primer-less PCR. These reassembled sequences are then amplified using PCR and subjected to further selection processes. This method is advantageous relative to DNA shuffling because there is no use of DNase1, thus there is no bias for recombination next to a pyrimidine nucleotide. This method is also advantageous due to its use of synthetic random primers which are uniform in length, and lack biases. Finally this method is independent of the length of DNA template sequence, and requires a small amount of parental DNA.

Truncated metagenomic gene-specific PCR

This method generates chimeric genes directly from metagenomic samples. It begins with isolation of the desired gene by functional screening of a metagenomic DNA sample. Next, specific primers are designed and used to amplify the homologous genes from different environmental samples. Finally, chimeric libraries are generated by shuffling these amplified homologous genes, and the desired functional clones are retrieved from them.

Staggered extension process (StEP)

This in vitro method is based on template switching to generate chimeric genes. This PCR-based method begins with an initial denaturation of the template, followed by annealing of primers and a short extension time. In all subsequent cycles, the short fragments generated in previous cycles anneal to different parts of the template, based on sequence complementarity. This process of fragments annealing to template DNA is known as template switching. The annealed fragments then serve as primers for further extension. The cycles are repeated until the parental-length chimeric gene sequence is obtained. The method requires only flanking primers to begin, and there is no need for DNase1.

Random chimeragenesis on transient templates (RACHITT)

This method has been shown to generate chimeric gene libraries with an average of 14 crossovers per chimeric gene. It begins by aligning fragments from a parental top strand onto the bottom strand of a uracil-containing template from a homologous gene. The 5' and 3' overhang flaps are cleaved by the exonuclease and endonuclease activities of Pfu and Taq DNA polymerases, and the gaps are filled in. The uracil-containing template is then removed from the heteroduplex by treatment with uracil DNA glycosylase, followed by further amplification using PCR. The method is advantageous because it generates chimeras with a relatively high crossover frequency. However, it is somewhat limited by its complexity and by the need to generate single-stranded DNA and uracil-containing single-stranded template DNA.

Synthetic shuffling

Shuffling of synthetic degenerate oligonucleotides adds flexibility to shuffling methods, since oligonucleotides containing optimal codons and beneficial mutations can be included.

In vivo homologous recombination

Cloning performed in yeast involves PCR dependent reassembly of fragmented expression vectors. These reassembled vectors are then introduced to, and cloned in yeast. Using yeast to clone the vector avoids toxicity and counter-selection that would be introduced by ligation and propagation in E. coli.

Mutagenic organized recombination process by homologous in vivo grouping (MORPHING)

This method introduces mutations into specific regions of genes while leaving other parts intact by utilizing the high frequency of homologous recombination in yeast.

Phage-assisted continuous evolution (PACE)

This method utilizes a bacteriophage with a modified life cycle to transfer evolving genes from host to host. The phage's life cycle is designed in such a way that the transfer is correlated with the activity of interest from the enzyme. This method is advantageous because it requires minimal human intervention for the continuous evolution of the gene.

In vitro non-homologous recombination methods

These methods are based on the fact that proteins can share similar structures while lacking sequence homology.

Exon shuffling

Exon shuffling is the combination of exons from different proteins by recombination events occurring at introns. Orthologous exon shuffling involves combining exons from orthologous genes from different species. Orthologous domain shuffling involves shuffling entire protein domains from orthologous genes from different species. Paralogous exon shuffling involves shuffling exons from different genes from the same species. Paralogous domain shuffling involves shuffling entire protein domains from paralogous proteins from the same species. Functional homolog shuffling involves shuffling non-homologous domains that are functionally related. All of these processes begin with amplification of the desired exons from different genes using chimeric synthetic oligonucleotides. These amplification products are then reassembled into full-length genes using primer-less PCR, during which the fragments act as both templates and primers. This results in chimeric full-length genes, which are then subjected to screening.

Incremental truncation for the creation of hybrid enzymes (ITCHY)

Fragments of parental genes are created by controlled digestion with exonuclease III. These fragments are blunted using an endonuclease and ligated to produce hybrid genes. THIOITCHY is a modified ITCHY technique that uses nucleotide triphosphate analogs such as α-phosphothioate dNTPs. Incorporation of these nucleotides blocks digestion by exonuclease III; this inhibition is called spiking. Spiking can be accomplished by first truncating genes with exonuclease to create fragments with short single-stranded overhangs. These fragments then serve as templates for amplification by DNA polymerase in the presence of small amounts of phosphothioate dNTPs, and the resulting fragments are ligated together to form full-length genes. Alternatively, the intact parental genes can be amplified by PCR in the presence of normal dNTPs and phosphothioate dNTPs. These full-length amplification products are then digested with an exonuclease. Digestion continues until the exonuclease encounters an α-pdNTP, resulting in fragments of different lengths. These fragments are then ligated together to generate chimeric genes.

SCRATCHY

This method generates libraries of hybrid genes exhibiting multiple crossovers by combining DNA shuffling and ITCHY. It begins with the construction of two independent ITCHY libraries: the first with gene A at the N-terminus, and the other with gene B at the N-terminus. The hybrid gene fragments are isolated by restriction enzyme digestion or by PCR with terminal primers, followed by agarose gel electrophoresis. These isolated fragments are then mixed together and further digested with DNase1. The digested fragments are then reassembled by primerless PCR with template switching.

Recombined extension on truncated templates (RETT)

This method generates libraries of hybrid genes by template switching of uni-directionally growing polynucleotides in the presence of single stranded DNA fragments as templates for chimeras. This method begins with the preparation of single stranded DNA fragments by reverse transcription from target mRNA. Gene specific primers are then annealed to the single stranded DNA. These genes are then extended during a PCR cycle. This cycle is followed by template switching and annealing of the short fragments obtained from the earlier primer extension to other single stranded DNA fragments. This process is repeated until full length single stranded DNA is obtained.

Sequence homology-independent protein recombination (SHIPREC)

This method generates recombination between genes with little to no sequence homology. The parental genes are fused via a linker sequence containing several restriction sites. This construct is then digested using DNase I, and the fragments are made blunt-ended using S1 nuclease. These blunt-ended fragments are circularized by ligation. The circular construct is then linearized using restriction enzymes whose sites are present in the linker region. This results in a library of chimeric genes in which the contributions of the two genes to the 5' and 3' ends are reversed compared with the starting construct.
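The reversal of parental order in SHIPREC can be traced on strings. A fragment of the fusion that still spans the linker is circularized and then reopened inside the linker, so the piece of gene B ends up in front of the piece of gene A. This is a minimal sketch under simplifying assumptions:

- the DNase I fragment is modelled as a slice `construct[start:end]`;
- cutting in the linker is modelled as discarding the linker entirely;
- any size selection of fragments is omitted.

`shiprec_product` is a hypothetical name.

```python
from __future__ import annotations


def shiprec_product(gene_a: str, gene_b: str, linker: str,
                    start: int, end: int) -> str | None:
    """Toy SHIPREC event on the construct gene_a + linker + gene_b.

    Returns the linearised chimera, or None if the fragment lost the
    linker (it could then not be reopened at the linker's restriction sites).
    """
    construct = gene_a + linker + gene_b
    link_start = len(gene_a)
    link_end = link_start + len(linker)
    if not (start <= link_start and end >= link_end):
        return None  # fragment does not span the whole linker
    a_part = construct[start:link_start]  # 3' portion of gene_a
    b_part = construct[link_end:end]      # 5' portion of gene_b
    # Circularise, then cut inside the linker: gene_b's piece now comes first.
    return b_part + a_part


if __name__ == "__main__":
    print(shiprec_product("aaaaaa", "BBBBBB", "--", start=2, end=11))
```

Here the fragment keeps the last four letters of gene A and the first three of gene B. The product starts with gene B's N-terminal part, which is the reversed arrangement the text describes.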

Sequence independent site directed chimeragenesis (SISDC)

This method results in a library of genes with multiple crossovers from several parental genes. It does not require sequence identity among the parental genes, but it does require one or two conserved amino acids at every crossover position. It begins with alignment of the parental sequences and identification of consensus regions, which serve as crossover sites. Specific tags containing restriction sites are then incorporated and subsequently removed by digestion with BaeI, resulting in genes with cohesive ends. These gene fragments are mixed and ligated in the appropriate order to form chimeric libraries.
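Once the parents are aligned and cut at shared crossover sites, the library is every ordered combination of blocks, taking one block per position from any parent. P parents split into K blocks therefore give up to P^K chimeras. This is a minimal sketch that assumes the blocks are already defined and that ligation in the correct order is perfect. `sisdc_library` is a hypothetical helper.

```python
from __future__ import annotations

from itertools import product


def sisdc_library(parents: list[list[str]]) -> list[str]:
    """parents[p][k] is block k of parent p. All parents share the same
    crossover sites, so each must have the same number of blocks."""
    n_blocks = len(parents[0])
    if any(len(p) != n_blocks for p in parents):
        raise ValueError("parents must be split at the same crossover sites")
    # For each block position, the candidates are that block from every parent.
    choices = [[p[k] for p in parents] for k in range(n_blocks)]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    lib = sisdc_library([["A1", "A2", "A3"], ["B1", "B2", "B3"]])
    print(len(lib))  # 2 parents, 3 blocks -> 2**3 = 8 chimeras
```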

Degenerate homo-duplex recombination (DHR)

This method begins with alignment of homologous genes, followed by identification of regions of polymorphism. Next, the top strand of the gene is divided into small degenerate oligonucleotides. The bottom strand is also divided into oligonucleotides that serve as scaffolds. These fragments are combined in solution, and the top-strand oligonucleotides are assembled onto the bottom-strand oligonucleotides. Gaps between these fragments are filled in with polymerase and ligated.

Random multi-recombinant PCR (RM-PCR)

This method involves the shuffling of multiple DNA fragments without homology in a single PCR. This results in the reconstruction of complete proteins by assembly of modules encoding different structural units.

User friendly DNA recombination (USERec)

This method begins with the amplification of the gene fragments to be recombined, using uracil dNTPs. The amplification mixture also contains primers and PfuTurbo Cx Hotstart DNA polymerase. The amplified products are next incubated with USER enzyme, which catalyzes the removal of uracil residues from DNA, creating single-base gaps. The USER-treated fragments are mixed, ligated using T4 DNA ligase, and subjected to DpnI digestion to remove the template DNA. The resulting single-stranded fragments are amplified by PCR and transformed into E. coli.

Golden Gate shuffling (GGS) recombination

This method allows at least 9 different fragments to be recombined in an acceptor vector by using a type IIS restriction enzyme, which cuts outside its recognition site. It begins with subcloning of the fragments into separate vectors so that each is flanked on both sides by BsaI sites. These vectors are then cleaved with the type IIS enzyme BsaI, which generates four-nucleotide single-stranded overhangs. Fragments with complementary overhangs are hybridized and ligated using T4 DNA ligase. Finally, these constructs are transformed into E. coli cells, which are screened for expression levels.
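In Golden Gate shuffling, fragment order is dictated by which 4-nt overhangs are complementary, not by the order in which fragments are mixed. That ordering can be sketched as a chain walk. This is a simplified model under these assumptions:

- overhangs are written as matching top-strand strings (real sticky ends pair with their reverse complement);
- ligation is perfect;
- each overhang is unique.

`golden_gate_assemble` and the example overhangs are hypothetical.

```python
from __future__ import annotations


def golden_gate_assemble(fragments: list[tuple[str, str, str]],
                         start: str, end: str) -> str:
    """Chain BsaI-cut fragments by matching 4-nt overhangs.

    Each fragment is (left_overhang, body, right_overhang). The acceptor
    vector exposes `start` and `end`. Overhangs are compared as plain
    top-strand strings, a simplification of real sticky-end pairing.
    """
    by_left: dict[str, tuple[str, str]] = {}
    for left, body, right in fragments:
        if left in by_left:
            raise ValueError(f"two fragments share left overhang {left}")
        by_left[left] = (body, right)

    parts = [start]
    current = start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no fragment ligates to overhang {current}")
        body, current = by_left.pop(current)  # each fragment is used once
        parts.extend([body, current])         # insert, then its right overhang
    return "".join(parts)


if __name__ == "__main__":
    # Fragments listed in arbitrary order; the overhangs fix the final order.
    pot = [("CCTA", "bbb", "GGAT"),
           ("AATG", "aaa", "CCTA"),
           ("GGAT", "ccc", "GCTT")]
    print(golden_gate_assemble(pot, start="AATG", end="GCTT"))
```

Because the overhangs alone determine adjacency, every fragment can go into a single reaction. Designing a set of distinct overhangs is what allows many fragments to assemble in one pot.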

Phosphorothioate-based DNA recombination method (PRTec)

This method can be used to recombine structural elements or entire protein domains. It is based on phosphorothioate chemistry, which allows the specific cleavage of phosphorothiodiester bonds. The first step is amplification of the fragments to be recombined, along with the vector backbone, using primers with phosphorothioated nucleotides at their 5' ends. The amplified PCR products are cleaved in an ethanol-iodine solution at high temperature. Next, these fragments are hybridized at room temperature and transformed into E. coli, which repairs any nicks.

Integron

This method is based upon the integron system, a natural site-specific recombination system in E. coli that produces natural gene shuffling. It was used to construct and optimize a functional tryptophan biosynthetic operon in trp-deficient E. coli by delivering individual recombination cassettes of trpA-E genes, along with regulatory elements, with the integron system.

Y-Ligation based shuffling (YLBS)

This method generates single-stranded DNA strands that encompass a single block sequence at either the 5' or 3' end, complementary sequences in a stem-loop region, and a D branch region serving as a primer binding site for PCR. Equivalent amounts of the 5' and 3' half strands are mixed and form hybrids owing to the complementarity in the stem region. Hybrids with a free phosphorylated 5' end in the 3' half strands are then ligated to the free 3' ends in the 5' half strands using T4 DNA ligase in the presence of 0.1 mM ATP. Ligated products are then amplified by two types of PCR to generate pre-5' half and pre-3' half PCR products. These PCR products are converted to single strands via avidin-biotin binding to the biotin-labeled 5' ends of the primers containing stem sequences. Next, biotinylated 5' half strands and non-biotinylated 3' half strands are used as the 5' and 3' half strands for the next Y-ligation cycle.

Semi-rational design

Semi-rational design uses information about a protein's sequence, structure and function, in tandem with predictive algorithms. Together these are used to identify the target amino acid residues that are most likely to influence protein function. Mutations of these key residues create libraries of mutant proteins that are more likely to have enhanced properties.

Advances in semi-rational enzyme engineering and de novo enzyme design provide researchers with powerful and effective new strategies to manipulate biocatalysts. Integration of sequence- and structure-based approaches in library design has proven to be a great guide for enzyme redesign. Generally, current computational de novo and redesign methods do not match evolved variants in catalytic performance. Although experimental optimization may be achieved using directed evolution, further improvements in the accuracy of structure predictions and greater catalytic ability will come from improvements in design algorithms. Further functional enhancements may be included in future simulations by integrating protein dynamics.

Biochemical and biophysical studies, along with fine-tuning of predictive frameworks, will be useful for experimentally evaluating the functional significance of individual design features. A better understanding of these functional contributions will then provide feedback for improving future designs.

Directed evolution will likely not be replaced as the method of choice for protein engineering, although computational protein design has fundamentally changed the way protein engineering can manipulate bio-macromolecules. Smaller, more focused and functionally rich libraries may be generated by using methods that incorporate predictive frameworks for hypothesis-driven protein engineering. New design strategies and technical advances have begun a departure from traditional protocols, such as directed evolution, which represents the most effective strategy for identifying top-performing candidates in focused libraries. Whole-gene library synthesis is replacing shuffling and mutagenesis protocols for library preparation. Also, highly specific low-throughput screening assays are increasingly applied in place of monumental screening and selection efforts involving millions of candidates. Together, these developments are poised to take protein engineering beyond directed evolution and towards practical, more efficient strategies for tailoring biocatalysts.

Screening and selection techniques

Once a protein has undergone directed evolution, rational design or semi-rational design, the libraries of mutant proteins must be screened to determine which mutants show enhanced properties. Phage display is one option for screening proteins. This method involves the fusion of genes encoding the variant polypeptides with phage coat protein genes. Protein variants expressed on phage surfaces are selected by binding with immobilized targets in vitro. Phages with selected protein variants are then amplified in bacteria, followed by identification of positive clones by enzyme-linked immunosorbent assay (ELISA). The selected phages are then subjected to DNA sequencing.

Cell surface display systems can also be utilized to screen mutant polypeptide libraries. The library mutant genes are incorporated into expression vectors which are then transformed into appropriate host cells. These host cells are subjected to further high throughput screening methods to identify the cells with desired phenotypes.

Cell-free display systems have been developed to exploit in vitro protein translation, or cell-free translation. These methods include mRNA display, ribosome display, covalent and non-covalent DNA display, and in vitro compartmentalization.

Enzyme engineering

Enzyme engineering is the application of modifying an enzyme's structure (and, thus, its function) or modifying the catalytic activity of isolated enzymes to produce new metabolites, to allow new (catalyzed) pathways for reactions to occur, or to convert certain compounds into others (biotransformation). These products are useful as chemicals, pharmaceuticals, fuel, food, or agricultural additives.

An enzyme reactor consists of a vessel containing a reaction medium that is used to perform a desired conversion by enzymatic means. The enzymes used in this process are free in solution.

Examples of engineered proteins

Computing methods have been used to design a protein with a novel fold, named Top7, and sensors for unnatural molecules. The engineering of fusion proteins has yielded rilonacept, a pharmaceutical that has secured Food and Drug Administration (FDA) approval for treating cryopyrin-associated periodic syndrome.

Another computing method, IPRO, successfully engineered the switching of cofactor specificity of Candida boidinii xylose reductase. Iterative Protein Redesign and Optimization (IPRO) redesigns proteins to increase or give specificity to native or novel substrates and cofactors. This is done by repeatedly randomly perturbing the structure of the proteins around specified design positions, identifying the lowest energy combination of rotamers, and determining whether the new design has a lower binding energy than prior ones.
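The three-step IPRO loop can be mimicked in a toy model:

1. Perturb the structure near the design positions.
2. Find the lowest-energy combination.
3. Keep the new design only if its binding energy improves.

Everything quantitative here is an assumption. The residue energies, the rotamer penalty, and the one-number-per-position "backbone" are invented for illustration, whereas the real method works on atomic structures with a molecular energy function. All names (`binding_energy`, `best_combination`, `iterative_redesign`) are hypothetical.

```python
import random
from itertools import product

# Hypothetical residue contributions to binding energy (lower = tighter binding).
RESIDUE_ENERGY = {"A": 0.0, "L": -0.6, "F": -0.9, "S": 0.2}
ROTAMERS = (0, 1, 2)  # hypothetical discrete side-chain conformations


def binding_energy(choices, backbone):
    """Toy energy: residue term plus a penalty for rotamers that do not fit
    the local backbone coordinate at each design position."""
    return sum(RESIDUE_ENERGY[res] + 0.3 * abs(rot - x)
               for (res, rot), x in zip(choices, backbone))


def best_combination(backbone):
    """Exhaustively pick the lowest-energy (residue, rotamer) at every position."""
    options = list(product(RESIDUE_ENERGY, ROTAMERS))
    return min(product(options, repeat=len(backbone)),
               key=lambda choices: binding_energy(choices, backbone))


def iterative_redesign(n_positions=3, n_iter=50, seed=0):
    rng = random.Random(seed)
    backbone = [0.5] * n_positions  # arbitrary, deliberately non-ideal start
    best = best_combination(backbone)
    best_e = binding_energy(best, backbone)
    for _ in range(n_iter):
        # 1. Randomly perturb the structure around the design positions.
        trial_bb = [x + rng.uniform(-0.5, 0.5) for x in backbone]
        # 2. Find the lowest-energy combination for the perturbed structure.
        trial = best_combination(trial_bb)
        trial_e = binding_energy(trial, trial_bb)
        # 3. Keep the new design only if its binding energy is lower.
        if trial_e < best_e:
            backbone, best, best_e = trial_bb, trial, trial_e
    return "".join(res for res, _ in best), best_e


if __name__ == "__main__":
    print(iterative_redesign())
```

Because the toy energy is separable by position, the chosen residues are trivial here. The point of the sketch is the accept-only-if-better control flow, not the chemistry.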

Computation-aided design has also been used to engineer complex properties of a highly ordered nano-protein assembly. A protein cage, E. coli bacterioferritin (EcBfr), served as the model protein in this study. It naturally shows structural instability and incomplete self-assembly behavior, populating two oligomerization states. Through computational analysis and comparison with its homologs, this protein was found to have a smaller-than-average dimeric interface on its two-fold symmetry axis. This is due mainly to an interfacial water pocket centered on two water-bridged asparagine residues. To investigate the possibility of engineering EcBfr for modified structural stability, a semi-empirical computational method was used to virtually explore the energy differences of the 480 possible mutants at the dimeric interface relative to wild-type EcBfr. This computational study also converged on the water-bridged asparagines. Replacing these two asparagines with hydrophobic amino acids resulted in proteins that fold into alpha-helical monomers and assemble into cages, as evidenced by circular dichroism and transmission electron microscopy. Both thermal and chemical denaturation confirmed that all redesigned proteins, in agreement with the calculations, possess increased stability. One of the three mutations shifts the population in favor of the higher-order oligomerization state in solution, as shown by both size exclusion chromatography and native gel electrophoresis.

An in silico method, PoreDesigner, was developed to redesign the bacterial channel protein OmpF, reducing its 1 nm pore size to any desired sub-nm dimension. Transport experiments on the narrowest designed pores revealed complete salt rejection when assembled in biomimetic block-polymer matrices.

Directed evolution

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Directed_evolution

An example of directed evolution with comparison to natural evolution. The inner cycle indicates the 3 stages of the directed evolution cycle with the natural process being mimicked in brackets. The outer circle demonstrates steps in a typical experiment. The red symbols indicate functional variants, the pale symbols indicate variants with reduced function.

Directed evolution (DE) is a method used in protein engineering that mimics the process of natural selection to steer proteins or nucleic acids toward a user-defined goal. It consists of subjecting a gene to iterative rounds of mutagenesis (creating a library of variants), selection (expressing those variants and isolating members with the desired function) and amplification (generating a template for the next round). It can be performed in vivo (in living organisms) or in vitro (without living cells, free in solution or in artificial compartments). Directed evolution is used both for protein engineering, as an alternative to rationally designing modified proteins, and for experimental evolution studies of fundamental evolutionary principles in a controlled laboratory environment.

History

Directed evolution has its origins in the 1960s with the evolution of RNA molecules in the "Spiegelman's Monster" experiment. The concept was extended to protein evolution via evolution of bacteria under selection pressures that favoured the evolution of a single gene in its genome.

Early phage display techniques in the 1980s allowed targeting of mutations and selection to a single protein. This enabled selection of enhanced binding proteins, but was not yet compatible with selection for catalytic activity of enzymes. Methods to evolve enzymes were developed in the 1990s and brought the technique to a wider scientific audience. The field rapidly expanded with new methods for making libraries of gene variants and for screening their activity. The development of directed evolution methods was honored in 2018 with the awarding of the Nobel Prize in Chemistry to Frances Arnold for evolution of enzymes, and George Smith and Gregory Winter for phage display.

Principles

Directed evolution is analogous to climbing a hill on a 'fitness landscape' where elevation represents the desired property. Each round of selection samples mutants on all sides of the starting template (1) and selects the mutant with the highest elevation, thereby climbing the hill. This is repeated until a local summit is reached (2).

Directed evolution is a mimic of the natural evolution cycle in a laboratory setting. Evolution requires three things to happen: variation between replicators, that the variation causes fitness differences upon which selection acts, and that this variation is heritable. In DE, a single gene is evolved by iterative rounds of mutagenesis, selection or screening, and amplification. Rounds of these steps are typically repeated, using the best variant from one round as the template for the next to achieve stepwise improvements.
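The iterative cycle of mutagenesis, screening and amplification behaves like greedy hill climbing on a fitness landscape. Below is a toy model, assuming a hypothetical assay that simply counts matches to an arbitrary target sequence. Real fitness landscapes are far more rugged, which is why directed evolution can stall at a local summit. All function names and parameter values are illustrative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids


def assay(seq: str, target: str) -> int:
    """Hypothetical assay: number of positions that match a target sequence."""
    return sum(a == b for a, b in zip(seq, target))


def mutagenise(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in seq)


def directed_evolution(start: str, target: str, rounds: int = 30,
                       library_size: int = 200, rate: float = 0.05,
                       seed: int = 1):
    rng = random.Random(seed)
    template = start
    history = [assay(template, target)]
    for _ in range(rounds):
        # 1. Diversify: build a library of mutants from the current template.
        library = [mutagenise(template, rate, rng) for _ in range(library_size)]
        # 2. Screen: find the variant with the highest assayed activity.
        best = max(library, key=lambda s: assay(s, target))
        # 3. Amplify: the best variant becomes the next template, unless it
        #    is worse than the current one (then the template is kept).
        if assay(best, target) >= assay(template, target):
            template = best
        history.append(assay(template, target))
    return template, history


if __name__ == "__main__":
    final, history = directed_evolution("A" * 20, "MKVLAAGIRTWEHSPQNDCF")
    print(history)
```

Keeping the current template when no variant beats it makes the recorded activity non-decreasing. This mirrors "using the best variant from one round as the template for the next".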

The likelihood of success in a directed evolution experiment is directly related to the total library size, as evaluating more mutants increases the chances of finding one with the desired properties.
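The link between library size and the chance of success can be made concrete with a simple model. Assume each screened variant is independently improved with a fixed probability p. Real libraries contain duplicates and biases, so this is a simplification. Under that assumption, the chance that a library of N variants contains at least one hit is 1 - (1 - p)^N. The helper names are hypothetical.

```python
import math


def p_at_least_one_hit(freq: float, library_size: int) -> float:
    """P(library holds >= 1 improved variant) if each variant is improved
    independently with probability `freq`."""
    return 1.0 - (1.0 - freq) ** library_size


def library_size_for(target_prob: float, freq: float) -> int:
    """Smallest library size reaching `target_prob` under the same model."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - freq))


if __name__ == "__main__":
    n = library_size_for(0.95, 1e-3)
    print(n, round(p_at_least_one_hit(1e-3, n), 3))
```

With one improved variant per thousand, roughly 3,000 variants must be evaluated for a 95% chance of finding at least one.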

Generating variation

Starting gene (left) and library of variants (right). Point mutations change single nucleotides. Insertions and deletions add or remove sections of DNA. Shuffling recombines segments of two (or more) similar genes.

How DNA libraries generated by random mutagenesis sample sequence space. The amino acid substituted into a given position is shown. Each dot or set of connected dots is one member of the library. Error-prone PCR randomly mutates some residues to other amino acids. Alanine scanning replaces each residue of the protein with alanine, one-by-one. Site saturation substitutes each of the 20 possible amino acids (or some subset of them) at a single position, one-by-one.
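The two systematic strategies named in the caption, alanine scanning and site saturation, can be written as simple library generators at the protein-sequence level. This is a minimal sketch that assumes one-letter amino acid codes and 0-based positions. The function names are hypothetical, and codon-level details are ignored.

```python
from __future__ import annotations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def alanine_scan(seq: str) -> list[str]:
    """One variant per position with that residue replaced by alanine;
    positions that are already alanine are skipped."""
    return [seq[:i] + "A" + seq[i + 1:] for i, aa in enumerate(seq) if aa != "A"]


def site_saturation(seq: str, position: int) -> list[str]:
    """Every other canonical amino acid at one (0-based) position."""
    return [seq[:position] + aa + seq[position + 1:]
            for aa in AMINO_ACIDS if aa != seq[position]]


if __name__ == "__main__":
    print(alanine_scan("MKAW"))
    print(len(site_saturation("MKAW", 1)))
```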

The first step in performing a cycle of directed evolution is the generation of a library of variant genes. The sequence space for random sequences is vast (10¹³⁰ possible sequences for a 100-amino-acid protein) and extremely sparsely populated by functional proteins. Neither experimental nor natural evolution can ever get close to sampling so many sequences. Of course, natural evolution samples variant sequences close to functional protein sequences, and this is imitated in DE by mutagenising an already functional gene. Some calculations suggest it is entirely feasible that, for all practical (i.e. functional and structural) purposes, protein sequence space has been fully explored during the course of evolution of life on Earth.
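The figure of 10¹³⁰ follows from 20 possible amino acids at each of 100 positions, giving 20^100, or about 1.3 × 10^130. A short calculation makes the scale explicit. It assumes only the 20 canonical amino acids, and it compares sequence space with a library of 10^15 variants, the upper end quoted for in vitro evolution.

```python
import math

N_RESIDUES = 100
N_AMINO_ACIDS = 20

n_sequences = N_AMINO_ACIDS ** N_RESIDUES  # exact integer arithmetic
print(f"{N_AMINO_ACIDS}^{N_RESIDUES} is about 10^{math.log10(n_sequences):.1f}")

# Fraction of sequence space covered by a library of 10^15 variants.
largest_library = 10 ** 15
print(f"fraction sampled is about 10^{math.log10(largest_library / n_sequences):.1f}")
```

The first line prints an exponent of about 130.1. Even the largest libraries cover roughly 10^-115 of the space, which is why DE starts from an already functional gene.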

The starting gene can be mutagenised by random point mutations (by chemical mutagens or error-prone PCR) and insertions and deletions (by transposons). Gene recombination can be mimicked by DNA shuffling of several sequences (usually of more than 70% sequence identity) to jump into regions of sequence space between the shuffled parent genes. Finally, specific regions of a gene can be systematically randomised for a more focused approach based on structure and function knowledge. Depending on the method, the library generated will vary in the proportion of functional variants it contains. Even if an organism is used to express the gene of interest, only that gene is mutagenised. The rest of the organism's genome therefore remains the same and can be ignored for the evolution experiment (to the extent of providing a constant genetic environment).
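Random point mutagenesis such as error-prone PCR can be approximated as independent substitutions at a fixed per-base error rate. Under that approximation each variant carries on average rate × length mutations. Below is a toy sketch under that assumption. Real error-prone PCR has a biased mutational spectrum, which this ignores. The function names and rate are hypothetical.

```python
from __future__ import annotations

import random

BASES = "ACGT"


def error_prone_copy(gene: str, rate: float, rng: random.Random) -> str:
    """Copy a DNA sequence, substituting each base with probability `rate`
    by one of the three other bases (unbiased, unlike real error-prone PCR)."""
    out = []
    for base in gene:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def make_library(gene: str, size: int, rate: float, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    return [error_prone_copy(gene, rate, rng) for _ in range(size)]


def n_mutations(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    gene = "ACGT" * 225  # hypothetical 900-bp gene
    library = make_library(gene, size=500, rate=0.002)
    mean = sum(n_mutations(gene, v) for v in library) / len(library)
    print(f"mean mutations per variant: {mean:.2f}")  # expected about 0.002 * 900 = 1.8
```

Tuning the rate trades off exploration against the fraction of variants destroyed by too many mutations.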

Detecting fitness differences

The majority of mutations are deleterious and so libraries of mutants tend to mostly have variants with reduced activity. Therefore, a high-throughput assay is vital for measuring activity to find the rare variants with beneficial mutations that improve the desired properties. Two main categories of method exist for isolating functional variants. Selection systems directly couple protein function to survival of the gene, whereas screening systems individually assay each variant and allow a quantitative threshold to be set for sorting a variant or population of variants of a desired activity. Both selection and screening can be performed in living cells (in vivo evolution) or performed directly on the protein or RNA without any cells (in vitro evolution).

During in vivo evolution, each cell (usually a bacterium or yeast cell) is transformed with a plasmid containing a different member of the variant library. In this way, only the gene of interest differs between the cells, with all other genes kept the same. The cells express the protein either in their cytoplasm or on their surface, where its function can be tested. This format has the advantage of selecting for properties in a cellular environment, which is useful when the evolved protein or RNA is to be used in living organisms. When performed without cells, DE involves using in vitro transcription-translation to produce proteins or RNA free in solution or compartmentalised in artificial microdroplets. This method has the benefit of being more versatile in the selection conditions (e.g. temperature, solvent), and it can express proteins that would be toxic to cells. Furthermore, in vitro evolution experiments can generate far larger libraries (up to 10¹⁵) because the library DNA need not be inserted into cells (often a limiting step).

Selection

Selection for binding activity is conceptually simple. The target molecule is immobilised on a solid support, and a library of variant proteins is flowed over it. Poor binders are washed away, and the remaining bound variants are recovered to isolate their genes. Binding of an enzyme to an immobilised covalent inhibitor has also been used in an attempt to isolate active catalysts. This approach, however, only selects for a single catalytic turnover and is not a good model of substrate binding or true substrate reactivity. An enzyme activity can sometimes be made necessary for cell survival, either by synthesizing a vital metabolite or by destroying a toxin. Cell survival is then a function of enzyme activity. Such systems are generally limited in throughput only by the transformation efficiency of cells. They are also less expensive and labour-intensive than screening; however, they are typically difficult to engineer, prone to artefacts and give no information on the range of activities present in the library.

Screening

An alternative to selection is a screening system. Each variant gene is individually expressed and assayed to quantitatively measure its activity (most often via a chromogenic or fluorogenic product). The variants are then ranked, and the experimenter decides which variants to use as templates for the next round of DE. Even the highest-throughput assays usually have lower coverage than selection methods, but they have the advantage of producing detailed information on each of the screened variants. This disaggregated data can also be used to characterise the distribution of activities in libraries, which is not possible in simple selection systems. Screening systems therefore have advantages when it comes to experimentally characterising adaptive evolution and fitness landscapes.
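A screen returns a number for every variant, so ranking, thresholding and library-wide statistics are all simple to compute. This is a minimal sketch that assumes assay readings are already normalised to the parent (parent = 1.0). The variant names, values and function names are hypothetical.

```python
from __future__ import annotations


def rank_variants(activities: dict[str, float], top_k: int,
                  threshold: float | None = None) -> list[tuple[str, float]]:
    """Rank assayed variants by activity, highest first, optionally
    dropping any below a quantitative threshold, and keep the top_k."""
    ranked = sorted(activities.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, act) for name, act in ranked if act >= threshold]
    return ranked[:top_k]


def fraction_improved(activities: dict[str, float], parent_activity: float) -> float:
    """Share of screened variants that beat the parent: a library-wide
    statistic that a pure selection cannot report."""
    return sum(act > parent_activity for act in activities.values()) / len(activities)


if __name__ == "__main__":
    # Hypothetical assay readings, normalised to the parent (= 1.0).
    assay = {"v1": 0.2, "v2": 1.4, "v3": 0.0, "v4": 1.1, "v5": 0.9}
    print(rank_variants(assay, top_k=2, threshold=1.0))
    print(fraction_improved(assay, parent_activity=1.0))
```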

Ensuring heredity

An expressed protein can either be covalently linked to its gene (as in mRNA, left) or compartmentalized with it (cells or artificial compartments, right). Either way ensures that the gene can be isolated based on the activity of the encoded protein.

When functional proteins have been isolated, it is necessary that their genes are too, therefore a genotype–phenotype link is required. This can be covalent, such as mRNA display where the mRNA gene is linked to the protein at the end of translation by puromycin. Alternatively the protein and its gene can be co-localised by compartmentalisation in living cells or emulsion droplets. The gene sequences isolated are then amplified by PCR or by transformed host bacteria. Either the single best sequence, or a pool of sequences can be used as the template for the next round of mutagenesis. The repeated cycles of Diversification-Selection-Amplification generate protein variants adapted to the applied selection pressures.

Comparison to rational protein design

Advantages of directed evolution

Rational design of a protein relies on an in-depth knowledge of the protein structure, as well as its catalytic mechanism. Specific changes are then made by site-directed mutagenesis in an attempt to change the function of the protein. A drawback of this is that even when the structure and mechanism of action of the protein are well known, the change due to mutation is still difficult to predict. Therefore, an advantage of DE is that there is no need to understand the mechanism of the desired activity or how mutations would affect it.

Limitations of directed evolution

A restriction of directed evolution is that a high-throughput assay is required in order to measure the effects of a large number of different random mutations. This can require extensive research and development before it can be used for directed evolution. Additionally, such assays are often highly specific to monitoring a particular activity and so are not transferable to new DE experiments.

Additionally, selecting for improvement in the assayed function simply generates improvements in the assayed function. To understand how these improvements are achieved, the properties of the evolving enzyme have to be measured. Improvement of the assayed activity can be due to improvements in enzyme catalytic activity or enzyme concentration. There is also no guarantee that improvement on one substrate will improve activity on another. This is particularly important when the desired activity cannot be directly screened or selected for and so a ‘proxy’ substrate is used. DE can lead to evolutionary specialisation to the proxy without improving the desired activity. Consequently, choosing appropriate screening or selection conditions is vital for successful DE.

The speed of evolution in an experiment also poses a limitation on the utility of directed evolution. For instance, evolution of a particular phenotype, while theoretically feasible, may occur on time-scales that are not practically feasible. Recent theoretical approaches have aimed to overcome the limitation of speed through an application of counter-diabatic driving techniques from statistical physics, though this has yet to be implemented in a directed evolution experiment.

Combinatorial approaches

Combined, 'semi-rational' approaches are being investigated to address the limitations of both rational design and directed evolution. Beneficial mutations are rare, so large numbers of random mutants have to be screened to find improved variants. 'Focused libraries' concentrate on randomising regions thought to be richer in beneficial mutations for the mutagenesis step of DE. A focused library contains fewer variants than a traditional random mutagenesis library and so does not require such high-throughput screening.
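The reduced screening burden of a focused library can be estimated with a common rule of thumb. Assume residues are randomised with NNK codons (N = any base, K = G or T), giving 32 codons that encode all 20 amino acids plus one stop codon. Then n randomised positions give 32^n codon combinations. If all combinations are equally likely, screening about -V ln(1 - c) clones samples any given variant with probability c, roughly 3V clones for 95% coverage. NNK and this oversampling estimate are illustrative assumptions, not something the text prescribes. The function names are hypothetical.

```python
import math

# NNK: N = A/C/G/T, K = G/T, so 4 * 4 * 2 = 32 codons covering
# all 20 amino acids plus a single stop codon (TAG).
NNK_CODONS = 4 * 4 * 2


def nnk_library_size(n_positions: int) -> int:
    """Distinct codon combinations when n positions are randomised with NNK."""
    return NNK_CODONS ** n_positions


def clones_for_coverage(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so a given variant is seen at least once with
    probability `coverage`, assuming all variants are equally likely
    (uses 1 - exp(-T/V) as an approximation of 1 - (1 - 1/V)**T)."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for n in (1, 2, 3):
        v = nnk_library_size(n)
        print(n, v, clones_for_coverage(v))
```

Randomising three positions already calls for on the order of 10^5 clones. This is why focused libraries usually target only a handful of residues at a time.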

Creating a focused library requires some knowledge of which residues in the structure to mutate. For example, knowledge of the active site of an enzyme may allow just the residues known to interact with the substrate to be randomised. Alternatively, knowledge of which protein regions are variable in nature can guide mutagenesis in just those regions.

Applications

Directed evolution is frequently used for protein engineering as an alternative to rational design, but can also be used to investigate fundamental questions of enzyme evolution.

Protein engineering

As a protein engineering tool, DE has been most successful in three areas:

Improving protein stability for biotechnological use at high temperatures or in harsh solvents
Improving binding affinity of therapeutic antibodies (affinity maturation) and the activity of de novo designed enzymes
Altering substrate specificity of existing enzymes (often for use in industry)

Evolution studies

The study of natural evolution is traditionally based on extant organisms and their genes. However, research is fundamentally limited by the lack of fossils (and particularly the lack of ancient DNA sequences) and incomplete knowledge of ancient environmental conditions. Directed evolution investigates evolution in a controlled system of genes for individual enzymes, ribozymes and replicators (similar to experimental evolution of eukaryotes, prokaryotes and viruses).

DE allows control of selection pressure, mutation rate and environment (both the abiotic environment, such as temperature, and the biotic environment, such as other genes in the organism). Additionally, there is a complete record of all evolutionary intermediate genes. This allows for detailed measurements of evolutionary processes, for example epistasis, evolvability, adaptive constraint, fitness landscapes, and neutral networks.
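Because every intermediate is recorded, quantities such as epistasis can be computed directly from measured fitness values. One common convention measures pairwise epistasis on a log scale relative to the wild type; this is an assumption here, not the only definition in use. The function name and the example fitness values are hypothetical.

```python
import math


def log_epistasis(w_wt: float, w_a: float, w_b: float, w_ab: float) -> float:
    """Pairwise epistasis between mutations A and B on a log scale.

    eps = ln(w_ab) - ln(w_a) - ln(w_b) + ln(w_wt), which is 0 when the
    double mutant's fitness equals the product of the single-mutant
    effects (each measured relative to the wild type).
    """
    return math.log(w_ab) - math.log(w_a) - math.log(w_b) + math.log(w_wt)


if __name__ == "__main__":
    # Hypothetical fitness values: wild type, two single mutants, double mutant.
    print(log_epistasis(1.0, 2.0, 3.0, 6.0))  # multiplicative: no epistasis
    print(log_epistasis(1.0, 2.0, 3.0, 3.0))  # double mutant worse than expected
```

A negative value means the mutations combine less well than independent effects would predict. A positive value means they reinforce each other.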

Adaptive laboratory evolution of microbial proteomes

The natural amino acid composition of proteomes can be changed by global substitution of canonical amino acids with suitable noncanonical counterparts under experimentally imposed selective pressure. For example, global proteome-wide substitutions of natural amino acids with fluorinated analogs have been attempted in Escherichia coli and Bacillus subtilis. A complete substitution of tryptophan with thienopyrrole-alanine in response to 20,899 UGG codons in Escherichia coli was reported in 2015 by Budisa and Söll. The experimental evolution of microbial strains with a clear-cut accommodation of an additional amino acid is expected to be instrumental for widening the genetic code experimentally. Directed evolution typically targets a particular gene for mutagenesis and then screens the resulting variants for a phenotype of interest, often independent of fitness effects. Adaptive laboratory evolution, by contrast, selects many genome-wide mutations that contribute to the fitness of actively growing cultures.

Xenobiology

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Xenobiology

Xenobiology (XB) is a subfield of synthetic biology, the study of synthesizing and manipulating biological devices and systems. The name "xenobiology" derives from the Greek word xenos, which means "stranger, alien". Xenobiology is a form of biology that is not (yet) familiar to science and is not found in nature. In practice, it describes novel biological systems and biochemistries that differ from the canonical DNA–RNA-20 amino acid system (see central dogma of molecular biology). For example, instead of DNA or RNA, XB explores nucleic acid analogues, termed xeno nucleic acid (XNA) as information carriers. It also focuses on an expanded genetic code and the incorporation of non-proteinogenic amino acids into proteins.

Difference between xeno-, exo-, and astro-biology

"Astro" means "star" and "exo" means "outside". Both exo- and astrobiology deal with the search for naturally evolved life in the Universe, mostly on other planets in the circumstellar habitable zone. (These are also occasionally referred to as xenobiology.) Whereas astrobiologists are concerned with the detection and analysis of life elsewhere in the Universe, xenobiology attempts to design forms of life with a different biochemistry or different genetic code than on planet Earth.

Aims

Xenobiology has the potential to reveal fundamental knowledge about biology and the origin of life. In order to better understand the origin of life, it is necessary to know why life evolved seemingly via an early RNA world to the DNA-RNA-protein system and its nearly universal genetic code. Was it an evolutionary "accident" or were there constraints that ruled out other types of chemistries? By testing alternative biochemical "primordial soups", researchers expect to better understand the principles that gave rise to life as we know it.
Xenobiology is an approach to develop industrial production systems with novel capabilities by means of enhanced biopolymer engineering and pathogen resistance. In all organisms, the genetic code encodes 20 canonical amino acids that are used for protein biosynthesis. In rare cases, special amino acids such as selenocysteine, pyrrolysine or formylmethionine can be incorporated into proteins of some organisms by the translational apparatus. By using additional amino acids from among the over 700 known to biochemistry, the capabilities of proteins may be altered to give rise to more efficient catalytic or material functions. The EC-funded project Metacode, for example, aims to incorporate metathesis (a useful catalytic function so far not known in living organisms) into bacterial cells. XB could also improve production processes by reducing the risk of virus or bacteriophage contamination in cultivations. XB cells would no longer provide suitable host cells, rendering them more resistant (an approach called semantic containment).
Xenobiology offers the option to design a "genetic firewall", a novel biocontainment system, which may help to strengthen and diversify current bio-containment approaches. One concern with traditional genetic engineering and biotechnology is horizontal gene transfer to the environment and possible risks to human health. One major idea in XB is to design alternative genetic codes and biochemistries so that horizontal gene transfer is no longer possible. Alternative biochemistry also allows for new synthetic auxotrophies. The idea is to create an orthogonal biological system that would be incompatible with natural genetic systems.

Scientific approach

In xenobiology, the aim is to design and construct biological systems that differ from their natural counterparts on one or more fundamental levels. Ideally, these new-to-nature organisms would differ in every possible biochemical aspect and exhibit a very different genetic code. The long-term goal is to construct a cell that would store its genetic information not in DNA but in an alternative informational polymer consisting of xeno nucleic acids (XNA), and that would use different base pairs, non-canonical amino acids and an altered genetic code. So far, cells have been constructed that incorporate only one or two of these features.

Xeno nucleic acids (XNA)

Originally, this research on alternative forms of DNA was driven by the question of how life evolved on Earth and why RNA and DNA were selected by (chemical) evolution over other possible nucleic acid structures. Two hypotheses have been proposed for the selection of RNA and DNA as life's backbone: either they are favored under the conditions of life on Earth, or they were coincidentally present in pre-life chemistry and have continued to be used ever since. Systematic experimental studies aiming to diversify the chemical structure of nucleic acids have resulted in completely novel informational biopolymers. So far, a number of XNAs with new chemical backbones or leaving groups of the DNA have been synthesized, e.g. hexose nucleic acid (HNA), threose nucleic acid (TNA), glycol nucleic acid (GNA) and cyclohexenyl nucleic acid (CeNA). The incorporation of XNA into a plasmid, involving three HNA codons, was accomplished as early as 2003. This XNA was used in vivo (in E. coli) as a template for DNA synthesis. This study, which used a binary (G/T) genetic cassette and two non-DNA bases (Hx/U), was extended to CeNA, while GNA seems to be too alien at this moment for the natural biological system to use as a template for DNA synthesis. Extended bases using a natural DNA backbone could likewise be transliterated into natural DNA, although to a more limited extent.

Aside from being used as extensions of template DNA strands, XNAs have been tested for activity as genetic catalysts. Although proteins are the most common components of cellular enzymatic activity, nucleic acids are also used in the cell to catalyze reactions. A 2015 study found that several kinds of XNA, most notably FANA (2'-fluoroarabino nucleic acids), as well as HNA, CeNA and ANA (arabino nucleic acids), could be used to cleave RNA during post-transcriptional RNA processing, acting as XNA enzymes, hence the name XNAzymes. FANA XNAzymes also showed the ability to ligate DNA, RNA and XNA substrates. Although XNAzyme studies are still preliminary, this study was a step toward synthetic circuit components that are more efficient than their DNA and RNA counterparts and that can regulate DNA, RNA and their own XNA substrates.

Expanding the genetic alphabet

While XNAs have modified backbones, other experiments target the replacement or enlargement of the genetic alphabet of DNA with unnatural base pairs. For example, DNA has been designed that has, instead of the four standard bases A, T, G and C, six bases: A, T, G, C and the two new ones P and Z (where Z stands for 6-amino-5-nitro-3-(1'-β-D-2'-deoxyribofuranosyl)-2(1H)-pyridone, and P stands for 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)imidazo[1,2-a]-1,3,5-triazin-4(8H)-one). In a systematic study, Leconte et al. tested the viability of 60 candidate bases (yielding potentially 3600 base pairs) for possible incorporation into DNA.
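The figure of 3600 follows from pairing each of the 60 candidates with every candidate, including itself, and counting the two orientations of a mixed pair separately (60 × 60); that counting convention is our assumption. The minimal Python sketch below checks the arithmetic and, as an illustration not taken from the source, shows how a six-letter alphabet enlarges the space of possible triplet codons and the information per base.

```python
import math

# Number of candidate nucleotides screened, as stated in the text.
N_CANDIDATES = 60

# Assumption: ordered pairings, i.e. X on the template with Y incoming is
# distinct from Y on the template with X incoming; self-pairs are included.
ordered_pairs = N_CANDIDATES ** 2

# For comparison: unordered pairings, each {X, Y} counted once, self-pairs included.
unordered_pairs = math.comb(N_CANDIDATES, 2) + N_CANDIDATES


def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of the given length over an alphabet."""
    return alphabet_size ** codon_length


def bits_per_base(alphabet_size: int) -> float:
    """Maximum information carried by one position, in bits."""
    return math.log2(alphabet_size)


print(ordered_pairs)      # 3600
print(unordered_pairs)    # 1830
print(codon_count(4), codon_count(6))  # 64 216
print(round(bits_per_base(4), 3), round(bits_per_base(6), 3))  # 2.0 2.585
```

Under the unordered convention the same screen would cover 1830 distinct pairs, so the quoted 3600 only makes sense if orientation matters.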

In 2002, Hirao et al. developed an unnatural base pair between 2-amino-8-(2-thienyl)purine (s) and pyridine-2-one (y) that functions in vitro in transcription and translation, toward a genetic code for protein synthesis containing a non-standard amino acid. In 2006, they created 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa) as a third base pair for replication and transcription; afterward, Ds and 4-[3-(6-aminohexanamido)-1-propynyl]-2-nitropyrrole (Px) were found to form a high-fidelity pair in PCR amplification. In 2013, they applied the Ds-Px pair to DNA aptamer generation by in vitro selection (SELEX) and demonstrated that genetic alphabet expansion significantly augments DNA aptamer affinities for target proteins.
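High fidelity matters in PCR because small per-cycle losses compound over many cycles. As a simplified sketch (our assumption, not a model from the cited work), suppose each cycle keeps the unnatural pair with the same independent probability f; the expected fraction of product still carrying it after n cycles is then f^n. The fidelity values below are hypothetical.

```python
def retention_after_cycles(fidelity_per_cycle: float, cycles: int) -> float:
    """Expected fraction of amplicons still carrying the unnatural pair.

    Simplifying assumption: every cycle independently retains the pair
    with the same probability, so retention decays geometrically.
    """
    if not 0.0 <= fidelity_per_cycle <= 1.0:
        raise ValueError("fidelity must be a probability between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return fidelity_per_cycle ** cycles


# Hypothetical per-cycle fidelities, for illustration only.
for f in (0.99, 0.999, 0.9999):
    print(f, round(retention_after_cycles(f, 30), 3))
```

Under this model, a pair retained 99% of the time per cycle is lost from roughly a quarter of the product after 30 cycles, while 99.9% per cycle still keeps about 97%.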

In May 2014, researchers announced that they had successfully introduced two new artificial nucleotides into bacterial DNA, alongside the four naturally occurring nucleotides, and by including individual artificial nucleotides in the culture media, were able to passage the bacteria 24 times; they did not create mRNA or proteins able to use the artificial nucleotides.

Novel polymerases

Neither XNAs nor unnatural bases are recognized by natural polymerases. One of the major challenges is to find or create novel types of polymerases that can replicate these new-to-nature constructs. In one case, a modified variant of HIV reverse transcriptase was found to be able to PCR-amplify an oligonucleotide containing a third type of base pair. Pinheiro et al. (2012) demonstrated that polymerase evolution and design successfully led to the storage and recovery of genetic information (less than 100 bp long) from six alternative genetic polymers based on simple nucleic acid architectures not found in nature, xeno nucleic acids.

Genetic code engineering

One of the goals of xenobiology is to rewrite the genetic code. The most promising approach to changing the code is the reassignment of seldom-used or even unused codons. In an ideal scenario, the genetic code is expanded by one codon that has been liberated from its old function and fully reassigned to a non-canonical amino acid (ncAA) ("code expansion"). As these methods are laborious to implement, some shortcuts can be applied ("code engineering"), for example in bacteria that are auxotrophic for specific amino acids and at some point in the experiment are fed isostructural analogues instead of the canonical amino acids for which they are auxotrophic. In that situation, the canonical amino acid residues in native proteins are substituted with the ncAAs. Even the insertion of multiple different ncAAs into the same protein is possible. Finally, the repertoire of 20 canonical amino acids can be not only expanded but also reduced to 19. By reassigning transfer RNA (tRNA)/aminoacyl-tRNA synthetase pairs, the codon specificity can be changed. Cells endowed with such aminoacyl-tRNA synthetases are thus able to read mRNA sequences that make no sense to the existing gene expression machinery. Altering codon:tRNA synthetase pairs may lead to the in vivo incorporation of non-canonical amino acids into proteins.

In the past, codon reassignment was mainly done on a limited scale. In 2013, however, Farren Isaacs and George Church at Harvard University reported the replacement of all 321 TAG stop codons present in the genome of E. coli with synonymous TAA codons, thereby demonstrating that massive substitutions can be combined into higher-order strains without lethal effects. Following the success of this genome-wide codon replacement, the authors went on to reprogram 13 codons throughout the genome, directly affecting 42 essential genes.
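At the level of a single gene, the genome-wide TAG-to-TAA replacement is a synonymous swap of the terminal stop codon. A toy Python sketch of that per-gene step, assuming each gene is given as an in-frame coding sequence ending in its stop codon (the gene names and sequences are invented, and the function names are our own), might look like this:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def recode_terminal_tag(cds: str) -> str:
    """Replace a terminal TAG stop codon with the synonymous TAA.

    Assumes `cds` is an in-frame coding sequence (length a multiple of 3)
    whose last codon is its stop codon.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    if cds.endswith("TAG"):
        return cds[:-3] + "TAA"
    return cds


def recode_genome(genes: dict[str, str]) -> tuple[dict[str, str], int]:
    """Recode every gene; return the new sequences and how many changed."""
    recoded = {name: recode_terminal_tag(seq) for name, seq in genes.items()}
    changed = sum(1 for name in genes if recoded[name] != genes[name].upper())
    return recoded, changed


# Hypothetical toy "genome": only geneA ends in TAG.
genes = {"geneA": "ATGAAATAG", "geneB": "ATGGCTTAA", "geneC": "ATGTGGTGA"}
recoded, changed = recode_genome(genes)
print(changed)  # 1
print(recoded["geneA"])  # ATGAAATAA
```

Real genome recoding has to handle complications this sketch ignores, for example stop codons that overlap a neighbouring gene.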

An even more radical change to the genetic code is the change from a triplet codon to a quadruplet or even quintuplet codon, pioneered by Sisido in cell-free systems and by Schultz in bacteria. Finally, non-natural base pairs can be used to introduce novel amino acids into proteins.
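Lengthening the codon enlarges the space of possible codons from 4^3 = 64 to 4^4 = 256 (quadruplets) or 4^5 = 1024 (quintuplets), which is what frees up extra codons for new amino acids. The Python sketch below checks those numbers and adds a deliberately simplified reader in which one designated quadruplet is always consumed as a single codon. The choice of AGGA as that quadruplet, the test sequence, and the function names are assumptions for illustration.

```python
def codon_space(length: int, alphabet_size: int = 4) -> int:
    """Number of possible codons of a given length."""
    return alphabet_size ** length


def split_codons(seq: str, quadruplets: set[str]) -> list[str]:
    """Toy reader: at each position, consume a designated quadruplet codon
    if one starts there; otherwise read an ordinary triplet.
    Trailing bases that do not form a full codon are ignored."""
    seq = seq.upper()
    codons = []
    i = 0
    while i + 3 <= len(seq):
        four = seq[i:i + 4]
        if len(four) == 4 and four in quadruplets:
            codons.append(four)
            i += 4
        else:
            codons.append(seq[i:i + 3])
            i += 3
    return codons


for k in (3, 4, 5):
    print(k, codon_space(k))

# Hypothetical quadruplet "AGGA" embedded in a short reading frame.
print(split_codons("ATGAGGAGCTTAA", {"AGGA"}))  # ['ATG', 'AGGA', 'GCT', 'TAA']
```

In living systems, a quadruplet is decoded through competition between an engineered tRNA and the cell's own tRNAs, not by the deterministic rule used here.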

Directed evolution

The goal of replacing DNA with XNA may also be reached by another route, namely by engineering the environment instead of the genetic modules. This approach was successfully demonstrated by Marlière and Mutzel, who produced an E. coli strain whose DNA is composed of the standard A, C and G nucleotides but has the synthetic thymine analogue 5-chlorouracil instead of thymine (T) at the corresponding positions of the sequence. These cells depend on externally supplied 5-chlorouracil for growth, but otherwise they look and behave like normal E. coli. These cells, however, are not yet fully auxotrophic for the xeno-base, since they still grow on thymine when it is supplied to the medium.

Biosafety

Xenobiological systems are designed to be orthogonal to natural biological systems. A (still hypothetical) organism that uses XNA, different base pairs and polymerases, and an altered genetic code would hardly be able to interact with natural forms of life on the genetic level. Thus, these xenobiological organisms represent a genetic enclave that cannot exchange information with natural cells. Altering the genetic machinery of the cell leads to semantic containment. By analogy to information processing in IT, this safety concept is termed a “genetic firewall”. The concept of the genetic firewall seems to overcome a number of limitations of previous safety systems. The first experimental evidence for the theoretical concept of the genetic firewall was obtained in 2013 with the construction of a genomically recoded organism (GRO). In this GRO, all known UAG stop codons in E. coli were replaced by UAA codons, which allowed the deletion of release factor 1 and the reassignment of UAG translation function. The GRO exhibited increased resistance to T7 bacteriophage, showing that alternative genetic codes do reduce genetic compatibility. This GRO, however, is still very similar to its natural “parent” and cannot be regarded as a genetic firewall. The possibility of reassigning the function of a large number of triplets opens the prospect of strains that combine XNA, novel base pairs, new genetic codes, etc. and cannot exchange any information with the natural biological world. Regardless of changes leading to a semantic containment mechanism in new organisms, any novel biochemical system still has to undergo toxicological screening. XNA, novel proteins, etc. might represent novel toxins or have allergenic potential that needs to be assessed.
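One way to picture how codon reassignment reduces genetic compatibility: once UAG (TAG in DNA) no longer signals "stop", a foreign gene that relies on TAG to terminate is translated into a different product. The Python sketch below translates one invented gene with the standard code and with a code in which TAG is reassigned. "X" is a hypothetical placeholder for the non-canonical amino acid, and the model ignores everything except codon lookup.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Recoded table: TAG no longer means stop; "X" is a hypothetical
# placeholder for whatever non-canonical amino acid is assigned.
RECODED_CODE = dict(STANDARD_CODE, TAG="X")


def translate(orf: str, code: dict[str, str]) -> str:
    """Translate an in-frame ORF codon by codon, stopping at the first '*'."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - len(orf) % 3, 3):
        aa = code[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented "foreign" gene with an in-frame TAG after two codons.
foreign_gene = "ATGAAATAGGGCTGGTAA"
print(translate(foreign_gene, STANDARD_CODE))  # MK
print(translate(foreign_gene, RECODED_CODE))   # MKXGW
```

The same DNA yields a short product in one host and an extended, altered one in the other, which is the kind of mismatch a genetic firewall aims to exploit at much larger scale.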

Governance and regulatory issues

Xenobiology might challenge the regulatory framework, as current laws and directives deal with genetically modified organisms and do not directly mention chemically or genomically modified organisms. Because true xenobiological organisms are not expected within the next few years, policymakers have some time to prepare for the coming governance challenge. Since 2012, the following groups have picked up the topic as a developing governance issue: policy advisers in the US, four National Biosafety Boards in Europe, the European Molecular Biology Organisation, and the European Commission's Scientific Committee on Emerging and Newly Identified Health Risks (SCENIHR) in three opinions (Definition; Risk assessment methodologies and safety aspects; and Risks to the environment and biodiversity related to synthetic biology and research priorities in the field of synthetic biology).
