Sunday, April 20, 2025

Protein complex

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Protein_complex
Kinesin is a protein functioning as a molecular biological machine. It uses protein domain dynamics on nanoscales.

A protein complex or multiprotein complex is a group of two or more associated polypeptide chains. Protein complexes are distinct from multidomain enzymes, in which multiple catalytic domains are found in a single polypeptide chain.

Protein complexes are a form of quaternary structure. Proteins in a protein complex are linked by non-covalent protein–protein interactions. These complexes are a cornerstone of many (if not most) biological processes. The cell is seen to be composed of modular supramolecular complexes, each of which performs an independent, discrete biological function.

Through proximity, the speed and selectivity of binding interactions between enzymatic complex and substrates can be vastly improved, leading to higher cellular efficiency. Many of the techniques used to enter cells and isolate proteins are inherently disruptive to such large complexes, complicating the task of determining the components of a complex.

Examples of protein complexes include the proteasome for molecular degradation and most RNA polymerases. In stable complexes, large hydrophobic interfaces between proteins typically bury surface areas larger than 2500 Å².

Function

The Bacillus amyloliquefaciens ribonuclease barnase (colored) and its inhibitor (blue) in a complex

Protein complex formation can activate or inhibit one or more of the complex members and in this way, protein complex formation can be similar to phosphorylation. Individual proteins can participate in a variety of protein complexes. Different complexes perform different functions, and the same complex can perform multiple functions depending on various factors. Factors include:

  • Cell compartment location
  • Cell cycle stage
  • Cell nutritional status

Many protein complexes are well understood, particularly in the model organism Saccharomyces cerevisiae (yeast). For this relatively simple organism, the study of protein complexes is now genome-wide and the elucidation of most of its protein complexes is ongoing. In 2021, researchers used the deep learning software RoseTTAFold along with AlphaFold to solve the structures of 712 eukaryote complexes. They compared 6000 yeast proteins to those from 2026 other fungi and 4325 other eukaryotes.

Types of protein complexes

Obligate vs non-obligate protein complex

If a protein can form a stable, well-folded structure on its own (without any other associated protein) in vivo, then the complexes formed by such proteins are termed "non-obligate protein complexes". However, some proteins cannot form a stable, well-folded structure alone, but are found as part of a protein complex that stabilizes the constituent proteins. Such protein complexes are called "obligate protein complexes".

Transient vs permanent/stable protein complex

Transient protein complexes form and break down transiently in vivo, whereas permanent complexes have a relatively long half-life. Typically, the obligate interactions (protein–protein interactions in an obligate complex) are permanent, whereas non-obligate interactions have been found to be either permanent or transient. Note that there is no clear distinction between obligate and non-obligate interactions; rather, there exists a continuum between them that depends on conditions such as pH and protein concentration. However, there are important distinctions between the properties of transient and permanent/stable interactions: stable interactions are highly conserved, whereas transient interactions are far less conserved; interacting proteins on the two sides of a stable interaction are more likely to be co-expressed than those of a transient interaction (in fact, the co-expression probability between two transiently interacting proteins is no higher than that between two random proteins); and transient interactions are much less co-localized than stable interactions. Although transient by nature, these interactions are very important for cell biology: the human interactome is enriched in such interactions, they are the dominant players in gene regulation and signal transduction, and proteins with intrinsically disordered regions (IDRs: regions in a protein that show dynamic, inter-converting structures in the native state) are enriched in transient regulatory and signaling interactions.

Fuzzy complex

Fuzzy protein complexes have more than one structural form or dynamic structural disorder in the bound state. This means that proteins may not fold completely in either transient or permanent complexes. Consequently, specific complexes can have ambiguous interactions, which vary according to the environmental signals. Hence different ensembles of structures result in different (even opposite) biological functions. Post-translational modifications, protein interactions or alternative splicing modulate the conformational ensembles of fuzzy complexes, to fine-tune affinity or specificity of interactions. These mechanisms are often used for regulation within the eukaryotic transcription machinery.

Essential proteins in protein complexes

Essential proteins in yeast complexes occur much less randomly than expected by chance. Modified after Ryan et al. 2013

Although some early studies suggested a strong correlation between essentiality and protein interaction degree (the "centrality-lethality" rule), subsequent analyses have shown that this correlation is weak for binary or transient interactions (e.g., yeast two-hybrid). However, the correlation is robust for networks of stable co-complex interactions. In fact, a disproportionate number of essential genes belong to protein complexes. This led to the conclusion that essentiality is a property of molecular machines (i.e. complexes) rather than individual components. Wang et al. (2009) noted that larger protein complexes are more likely to be essential, explaining why essential genes are more likely to have high co-complex interaction degree. Ryan et al. (2013) referred to the observation that entire complexes appear essential as "modular essentiality". These authors also showed that complexes tend to be composed of either essential or non-essential proteins rather than showing a random distribution (see Figure). However, this is not an all-or-nothing phenomenon: only about 26% (105/401) of yeast complexes consist of solely essential or solely nonessential subunits.
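
The all-or-nothing count quoted above can be illustrated with a short Python sketch. This is not the analysis of Ryan et al.; the complex names and essentiality flags below are invented for illustration, and only the 105/401 figure comes from the text.

```python
def classify_complex(subunit_essential):
    """Label a complex by the essentiality of its subunits.

    subunit_essential: list of booleans, one per subunit (True = essential).
    """
    if all(subunit_essential):
        return "all essential"
    if not any(subunit_essential):
        return "all non-essential"
    return "mixed"

# Hypothetical toy complexes (illustrative data, not from any database).
toy_complexes = {
    "complex_A": [True, True, True],
    "complex_B": [False, False],
    "complex_C": [True, False, True],
}
labels = {name: classify_complex(flags) for name, flags in toy_complexes.items()}

# Fraction of homogeneous yeast complexes reported in the text: 105 of 401.
homogeneous_fraction = 105 / 401

print(labels)
print(f"{homogeneous_fraction:.1%}")
```

Under this labelling, the remaining roughly three quarters of yeast complexes would fall into the "mixed" class.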

In humans, genes whose protein products belong to the same complex are more likely to result in the same disease phenotype.

Homomultimeric and heteromultimeric proteins

The subunits of a multimeric protein may be identical, as in a homomultimeric (homooligomeric) protein, or different, as in a heteromultimeric protein. Many soluble and membrane proteins form homomultimeric complexes in a cell, and the majority of proteins in the Protein Data Bank are homomultimeric. Homooligomers are responsible for the diversity and specificity of many pathways, and may mediate and regulate gene expression, the activity of enzymes, ion channels, receptors, and cell adhesion processes.

The voltage-gated potassium channels in the plasma membrane of a neuron are heteromultimeric proteins composed of four of forty known alpha subunits. Subunits must be of the same subfamily to form the multimeric protein channel. The tertiary structure of the channel allows ions to flow through the hydrophobic plasma membrane. Connexons are an example of a homomultimeric protein composed of six identical connexins. A cluster of connexons forms the gap-junction in two neurons that transmit signals through an electrical synapse.

Intragenic complementation

When multiple copies of a polypeptide encoded by a gene form a complex, this protein structure is referred to as a multimer. When a multimer is formed from polypeptides produced by two different mutant alleles of a particular gene, the mixed multimer may exhibit greater functional activity than the unmixed multimers formed by each of the mutants alone. In such a case, the phenomenon is referred to as intragenic complementation (also called inter-allelic complementation). Intragenic complementation has been demonstrated in many different genes in a variety of organisms including the fungi Neurospora crassa, Saccharomyces cerevisiae and Schizosaccharomyces pombe; the bacterium Salmonella typhimurium; the virus bacteriophage T4, an RNA virus and humans. In such studies, numerous mutations defective in the same gene were often isolated and mapped in a linear order on the basis of recombination frequencies to form a genetic map of the gene. Separately, the mutants were tested in pairwise combinations to measure complementation. An analysis of the results from such studies led to the conclusion that intragenic complementation, in general, arises from the interaction of differently defective polypeptide monomers to form a multimer. Genes that encode multimer-forming polypeptides appear to be common. One interpretation of the data is that polypeptide monomers are often aligned in the multimer in such a way that mutant polypeptides defective at nearby sites in the genetic map tend to form a mixed multimer that functions poorly, whereas mutant polypeptides defective at distant sites tend to form a mixed multimer that functions more effectively. The intermolecular forces likely responsible for self-recognition and multimer formation were discussed by Jehle.

Structure determination

The molecular structure of protein complexes can be determined by experimental techniques such as X-ray crystallography, single-particle analysis, or nuclear magnetic resonance. Increasingly, the theoretical option of protein–protein docking is also becoming available. One method that is commonly used for identifying the members of complexes is immunoprecipitation. Recently, Raicu and coworkers developed a method to determine the quaternary structure of protein complexes in living cells. This method is based on the determination of pixel-level Förster resonance energy transfer (FRET) efficiency in conjunction with a spectrally resolved two-photon microscope. The distribution of FRET efficiencies is simulated against different models to obtain the geometry and stoichiometry of the complexes.

Assembly

Proper assembly of multiprotein complexes is important, since misassembly can lead to disastrous consequences. In order to study pathway assembly, researchers look at intermediate steps in the pathway. One such technique that allows one to do that is electrospray mass spectrometry, which can identify different intermediate states simultaneously. This has led to the discovery that most complexes follow an ordered assembly pathway. In the cases where disordered assembly is possible, the change from an ordered to a disordered state leads to a transition from function to dysfunction of the complex, since disordered assembly leads to aggregation.

The structure of proteins plays a role in how the multiprotein complex assembles. The interfaces between proteins can be used to predict assembly pathways. The intrinsic flexibility of proteins also plays a role: more flexible proteins allow for a greater surface area available for interaction.

While assembly is a different process from disassembly, the two are reversible in both homomeric and heteromeric complexes. Thus, the overall process can be referred to as (dis)assembly.

Evolutionary significance of multiprotein complex assembly

In homomultimeric complexes, the homomeric proteins assemble in a way that mimics evolution. That is, an intermediate in the assembly process is present in the complex's evolutionary history. The opposite phenomenon is observed in heteromultimeric complexes, where gene fusion occurs in a manner that preserves the original assembly pathway.

Essential gene

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Essential_gene

Essential genes are genes that are indispensable for an organism to grow and reproduce in a given environment. However, being essential is highly dependent on the circumstances in which an organism lives. For instance, a gene required to digest starch is only essential if starch is the only source of energy. Recently, systematic attempts have been made to identify those genes that are absolutely required to maintain life, provided that all nutrients are available. Such experiments have led to the conclusion that the absolutely required number of genes for bacteria is on the order of 250–300. Essential genes of single-celled organisms encode proteins for three basic functions: genetic information processing, cell envelopes, and energy production. These gene functions are used to maintain a central metabolism, replicate DNA, translate genes into proteins, maintain a basic cellular structure, and mediate transport processes into and out of the cell. Compared with single-celled organisms, multicellular organisms have more essential genes related to communication and development. Most of the essential genes in viruses are related to the processing and maintenance of genetic information. In contrast to most single-celled organisms, viruses lack many essential genes for metabolism, which forces them to hijack the host's metabolism. The vast majority of genes are not essential but convey selective advantages and increased fitness; many can be deleted without consequences, at least under most circumstances.

Bacteria: genome-wide studies

Two main strategies have been employed to identify essential genes on a genome-wide basis: directed deletion of genes and random mutagenesis using transposons. In the first case, annotated individual genes (or ORFs) are completely deleted from the genome in a systematic way. In transposon-mediated mutagenesis, transposons are randomly inserted in as many positions in a genome as possible, aiming to disrupt the function of the targeted genes (see figure below). Insertion mutants that can still survive or grow indicate that the transposon inserted in a gene that is not essential for survival. The location of the transposon insertions can be determined through hybridization to microarrays or through transposon sequencing. With the development of CRISPR, gene essentiality has also been determined by inhibiting gene expression with CRISPR interference.

Essential genes in Mycobacterium tuberculosis H37Rv as found by using transposons which insert in random positions in the genome. If no transposons are found in a gene, the gene is most likely essential as it cannot tolerate any insertion. In this example, essential heme biosynthetic genes hemA, hemB, hemC, hemD are devoid of insertions. The number of sequence reads ("reads/TA") is shown for the indicated region of the H37Rv chromosome. Potential TA dinucleotide insertion sites are indicated. Image from Griffin et al. 2011.
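
The reasoning in the caption above (no insertions at any TA site suggests an essential gene) can be sketched in Python. This is a deliberately naive illustration, not the statistical procedure used by Griffin et al.; the read counts, the names geneX and geneY, and the min_ta_sites cutoff are all assumptions made for the example.

```python
def call_candidate_essential(ta_reads_by_gene, min_ta_sites=5):
    """Return genes whose TA sites carry no insertion reads.

    ta_reads_by_gene: dict mapping gene name -> list of read counts,
                      one entry per potential TA insertion site.
    min_ta_sites: hypothetical cutoff; a gene with very few TA sites can
                  lack insertions by chance, so it is left uncalled.
    """
    candidates = []
    for gene, reads in ta_reads_by_gene.items():
        if len(reads) >= min_ta_sites and sum(reads) == 0:
            candidates.append(gene)
    return sorted(candidates)

# Invented read counts, for illustration only.
example = {
    "hemA": [0, 0, 0, 0, 0, 0],
    "hemB": [0, 0, 0, 0, 0],
    "geneX": [12, 0, 7, 33, 0, 4],  # tolerates insertions
    "geneY": [0, 0],                # too few TA sites to judge
}

print(call_candidate_essential(example))
```

Real analyses typically model insertion density statistically rather than requiring exactly zero reads, so this sketch only captures the basic idea.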

On the basis of genome-wide experimental studies and systems biology analysis, an essential gene database has been developed by Kong et al. (2019) for predicting essential genes in more than 4,000 bacterial species.

Eukaryotes

In Saccharomyces cerevisiae (budding yeast), 15–20% of all genes are essential. In Schizosaccharomyces pombe (fission yeast), 4,836 heterozygous deletions covering 98.4% of the 4,914 protein-coding open reading frames have been constructed. 1,260 of the deleted genes turned out to be essential.

Similar screens are more difficult to carry out in multicellular organisms, including mammals (as a model for humans), for technical reasons, and their results are less clear. However, various methods have been developed for the nematode worm C. elegans, the fruit fly, and zebrafish (see table). A recent study of 900 mouse genes concluded that 42% of them were essential, although the selected genes were not representative.

Gene knockout experiments are not possible or at least not ethical in humans. However, natural mutations have led to the identification of mutations that lead to early embryonic or later death. Note that many genes in humans are not absolutely essential for survival but can cause severe disease when mutated. Such mutations are catalogued in the Online Mendelian Inheritance in Man (OMIM) database. In a computational analysis of genetic variation and mutations in 2,472 human orthologs of known essential genes in the mouse, Georgi et al. found strong, purifying selection and comparatively reduced levels of sequence variation, indicating that these human genes are essential too.

While it may be difficult to prove that a gene is essential in humans, it can be demonstrated that a gene is not essential or does not even cause disease. For instance, sequencing the genomes of 2,636 Icelandic citizens and genotyping 101,584 additional subjects found 8,041 individuals who had 1 gene completely knocked out (i.e. these people were homozygous for a non-functional gene). Of the 8,041 individuals with complete knock-outs, 6,885 were estimated to be homozygotes and 1,249 were estimated to be compound heterozygotes (i.e. they had both alleles of a gene knocked out but the two alleles had different mutations). In these individuals, a total of 1,171 of the 19,135 human (RefSeq) genes (6.1%) were completely knocked out. It was concluded that these 1,171 genes are non-essential in humans; at least, no associated diseases were reported. Similarly, the exome sequences of 3,222 British Pakistani-heritage adults with high parental relatedness revealed 1,111 rare-variant homozygous genotypes with predicted loss of gene function (LOF = knockouts) in 781 genes. This study found an average of 140 predicted LOF genotypes per subject, including 16 rare (minor allele frequency <1%) heterozygotes, 0.34 rare homozygotes, 83.2 common heterozygotes and 40.6 common homozygotes. Nearly all rare homozygous LOF genotypes (94.9%) were found within autozygous segments. Even though most of these individuals had no obvious health issue arising from their defective genes, it is possible that minor health issues may be found upon more detailed examination.

A summary of essentiality screens is shown in the table below (mostly based on the Database of Essential Genes).

Organism | Method | Essential genes
Arabidopsis thaliana | T-DNA insertion | 777
Caenorhabditis elegans (worm) | RNA interference | 294
Danio rerio (zebrafish) | Insertion mutagenesis | 288
Drosophila melanogaster (fruit fly) | P-element insertion mutagenesis | 339
Homo sapiens (human) | Literature search | 118
Homo sapiens (human) | CRISPR/Cas9-based screen | 1,878
Homo sapiens (human) | Haploid gene-trap screen | ~2,000
Homo sapiens (human) | Mouse orthologs | 2,472
Mus musculus (mouse) | Literature search | 2,114
Saccharomyces cerevisiae (yeast) | Single-gene deletions | 878
Saccharomyces cerevisiae (yeast) | Single-gene deletions | 1,105
Schizosaccharomyces pombe (yeast) | Single-gene deletions | 1,260

Viruses

Viruses lack many genes necessary for metabolism, forcing them to hijack the host's metabolism. Screens for essential genes have been carried out in a few viruses. For instance, human cytomegalovirus (CMV) was found to have 41 essential, 88 nonessential, and 27 augmenting ORFs (150 total ORFs). Most essential and augmenting genes are located in the central region, and nonessential genes generally cluster near the ends of the viral genome.

Tscharke and Dobson (2015) compiled a comprehensive survey of essential genes in Vaccinia Virus and assigned roles to each of the 223 ORFs of the Western Reserve (WR) strain and 207 ORFs of the Copenhagen strain, assessing their role in replication in cell culture. According to their definition, a gene is considered essential (i.e. has a role in cell culture) if its deletion results in a decrease in virus titre of greater than 10-fold in either a single or multiple step growth curve. All genes involved in wrapped virion production, actin tail formation, and extracellular virion release were also considered as essential. Genes that influence plaque size, but not replication were defined as non-essential. By this definition 93 genes are required for Vaccinia Virus replication in cell culture, while 108 and 94 ORFs, from WR and Copenhagen respectively, are non-essential. Vaccinia viruses with deletions at either end of the genome behaved as expected, exhibiting only mild or host range defects. In contrast, combining deletions at both ends of the genome for VACV strain WR caused a devastating growth defect on all cell lines tested. This demonstrates that single gene deletions are not sufficient to assess the essentiality of genes and that more genes are essential in Vaccinia virus than originally thought.
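
The tenfold titre criterion described above can be written as a small Python check. This is only a sketch of that one rule, not the full classification by Tscharke and Dobson (which also counts genes for wrapped virion production, actin tail formation and extracellular virion release as essential); the function name, the titre values and the handling of a zero titre are assumptions.

```python
def is_essential_by_titre(wild_type_titre, mutant_titre, fold_cutoff=10.0):
    """Apply the 'greater than 10-fold decrease in titre' rule.

    Titres are in the same units (e.g. plaque-forming units per ml).
    A mutant with no detectable virus is treated as essential; this
    edge-case choice is an assumption of the sketch.
    """
    if mutant_titre <= 0:
        return True
    return wild_type_titre / mutant_titre > fold_cutoff

# Hypothetical titres for illustration.
print(is_essential_by_titre(1e8, 5e6))  # 20-fold drop
print(is_essential_by_titre(1e8, 2e7))  # 5-fold drop
```

Because the rule applies to either a single-step or a multiple-step growth curve, a gene would count as essential if the check returns True for at least one of the two curves.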

One of the bacteriophages screened for essential genes is mycobacteriophage Giles. At least 35 of the 78 predicted Giles genes (45%) are non-essential for lytic growth. 20 genes were found to be essential. A major problem with phage genes is that a majority of them remain functionally unknown, hence their role is difficult to assess. A screen of Salmonella enterica phage SPN3US revealed 13 essential genes, although it remains unclear how many genes were actually tested.

Quantitative gene essentiality analysis

In theory, essentiality is a qualitative property: a gene either is essential or is not. However, depending on the surrounding environment, certain essential gene mutants may retain partial function, which can be quantified in some studies. For instance, a particular gene deletion may reduce growth rate (or fertility rate or other characters) to 90% of the wild-type. If there are isozymes or alternative pathways for the essential genes, they can be deleted completely. Using CRISPR interference, the expression of essential genes can be modulated or "tuned", leading to quantitative (or continuous) relationships between the level of gene expression and the magnitude of fitness cost exhibited by a given mutant.

Synthetic lethality

Two genes are synthetic lethal if neither one is essential but when both are mutated the double-mutant is lethal. Some studies have estimated that the number of synthetic lethal genes may be on the order of 45% of all genes.
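
The definition above translates directly into a three-way viability check. The following Python sketch assumes viability has already been measured for each single mutant and for the double mutant; the gene names and the viability table are hypothetical.

```python
def is_synthetic_lethal(viability, gene_a, gene_b):
    """Return True if each single mutant is viable but the double mutant is not.

    viability: dict mapping a frozenset of deleted genes -> bool (True = viable).
    """
    single_a = viability[frozenset([gene_a])]
    single_b = viability[frozenset([gene_b])]
    double = viability[frozenset([gene_a, gene_b])]
    return single_a and single_b and not double

# Hypothetical measurements for illustration.
viability = {
    frozenset(["geneA"]): True,
    frozenset(["geneB"]): True,
    frozenset(["geneC"]): False,           # geneC is essential on its own
    frozenset(["geneA", "geneB"]): False,  # lethal only in combination
    frozenset(["geneA", "geneC"]): False,
}

print(is_synthetic_lethal(viability, "geneA", "geneB"))
print(is_synthetic_lethal(viability, "geneA", "geneC"))
```

The second pair is not synthetic lethal under this definition, because geneC is already essential by itself.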

Conditionally essential genes

A schematic view of essential genes (or proteins) in lysine biosynthesis of different bacteria. The same protein may be essential in one species but not another.

Many genes are essential only under certain circumstances. For instance, if the amino acid lysine is supplied to a cell any gene that is required to make lysine is non-essential. However, when there is no lysine supplied, genes encoding enzymes for lysine biosynthesis become essential, as no protein synthesis is possible without lysine.

Streptococcus pneumoniae appears to require 147 genes for growth and survival in saliva, more than the 113-133 that have been found in previous studies.

The deletion of a gene may result in death or in a block of cell division. While the latter case may imply "survival" for some time, without cell division the cell may still die eventually. Similarly, instead of blocked cell division, a cell may have reduced growth or metabolism ranging from nearly undetectable to almost normal. Thus, there is a gradient from "essential" to completely non-essential, again depending on the condition. Some authors have thus distinguished between genes "essential for survival" and "essential for fitness".

The role of genetic background. Similar to environmental conditions, the genetic background can determine the essentiality of a gene: a gene may be essential in one individual but not another, given his or her genetic background. Gene duplications are one possible explanation (see below).

Metabolic dependency. Genes involved in certain biosynthetic pathways, such as amino acid synthesis, can become non-essential if one or more amino acids are supplied by the culture medium or by another organism. This is the main reason why many parasites (e.g. Cryptosporidium hominis) and endosymbiotic bacteria (e.g. Chlamydia) have lost many genes. Such genes may be essential but only present in the host organism. For instance, Chlamydia trachomatis cannot synthesize purine and pyrimidine nucleotides de novo, so these bacteria are dependent on the nucleotide biosynthetic genes of the host.

Another kind of metabolic dependency, unrelated to cross-species interactions, can be found when bacteria are grown under specific nutrient conditions. For example, more than 100 genes become essential when Escherichia coli is grown on nutrient-limited media. Specifically, isocitrate dehydrogenase (icd) and citrate synthase (gltA) are two enzymes of the tricarboxylic acid (TCA) cycle. The genes encoding both are essential in M9 minimal medium (which provides only the most basic nutrients). However, when the medium is supplemented with 2-oxoglutarate or glutamate, these genes are no longer essential.
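
The icd and gltA example can be expressed as a lookup of which supplements relieve the requirement for a gene. This Python sketch encodes only the two genes and two supplements named in the text; the table name and function are illustrative assumptions, and no other genes are modelled.

```python
# Supplements that, per the text, make icd and gltA dispensable in M9 medium.
RESCUING_SUPPLEMENTS = {
    "icd": {"2-oxoglutarate", "glutamate"},
    "gltA": {"2-oxoglutarate", "glutamate"},
}

def is_essential_in_medium(gene, supplements):
    """Return True if the gene is required in M9 minimal medium plus supplements.

    Only genes listed in RESCUING_SUPPLEMENTS are modelled; any other
    gene raises KeyError rather than being guessed.
    """
    return not (RESCUING_SUPPLEMENTS[gene] & set(supplements))

print(is_essential_in_medium("icd", []))              # plain M9
print(is_essential_in_medium("gltA", ["glutamate"]))  # M9 + glutamate
```

The same gene therefore gets a different essentiality call depending on the medium, which is the point of the term "conditionally essential".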

Gene duplications and alternative metabolic pathways

Many genes are duplicated within a genome, and many organisms have different metabolic pathways (alternative metabolic pathways) to synthesize the same products. Such duplications (paralogs) and alternative metabolic pathways often render essential genes non-essential because the duplicate can replace the original copy. For instance, the gene encoding the enzyme aspartokinase is essential in E. coli. By contrast, the Bacillus subtilis genome contains three copies of this gene, none of which is essential on its own. However, a triple deletion of all three genes is lethal. In such cases, the essentiality of a gene or a group of paralogs can often be predicted based on the essentiality of a single gene in a different species. In yeast, few of the essential genes are duplicated within the genome: 8.5% of the non-essential genes, but only 1% of the essential genes, have a homologue in the yeast genome.

In the worm C. elegans, non-essential genes are highly over-represented among duplicates, possibly because duplication of essential genes causes overexpression of these genes. Woods et al. found that non-essential genes are more often successfully duplicated (fixed) and lost compared to essential genes. By contrast, essential genes are less often duplicated but upon successful duplication are maintained over longer periods.

Conservation

In bacteria, essential genes appear to be more conserved than nonessential genes but the correlation is not very strong. For instance, only 34% of the B. subtilis essential genes have reliable orthologs in all Bacillota and 61% of the E. coli essential genes have reliable orthologs in all Gamma-proteobacteria. Fang et al. (2005) defined persistent genes as the genes present in more than 85% of the genomes of the clade. They found 475 and 611 of such genes for B. subtilis and E. coli, respectively. Furthermore, they classified genes into five classes according to persistence and essentiality: persistent genes, essential genes, persistent nonessential (PNE) genes (276 in B. subtilis, 409 in E. coli), essential nonpersistent (ENP) genes (73 in B. subtilis, 33 in E. coli), and nonpersistent nonessential (NPNE) genes (3,558 in B. subtilis, 3,525 in E. coli). Fang et al. found 257 persistent genes, which exist both in B. subtilis (for the Bacillota) and E. coli (for the Gamma-proteobacteria). Among these, 144 (respectively 139) were previously identified as essential in B. subtilis (respectively E. coli) and 25 (respectively 18) of the 257 genes are not present in the 475 B. subtilis (respectively 611 E. coli) persistent genes. All the other members of the pool are PNE genes.
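
The persistence threshold and the resulting gene classes can be sketched in Python. The 85% cutoff and the class abbreviations follow the text; the function names, the label "persistent essential" for genes that are both, and the toy genome counts are assumptions. The final two lines derive the number of persistent genes that are also essential by subtracting the PNE counts from the persistent counts quoted above.

```python
def is_persistent(genomes_with_gene, total_genomes, threshold=0.85):
    """Persistent = present in more than 85% of the genomes of the clade."""
    return genomes_with_gene / total_genomes > threshold

def classify_gene(persistent, essential):
    """Combine persistence and essentiality into one label."""
    if persistent and essential:
        return "persistent essential"
    if persistent:
        return "PNE"   # persistent nonessential
    if essential:
        return "ENP"   # essential nonpersistent
    return "NPNE"      # nonpersistent nonessential

# Hypothetical gene found in 90 of 100 genomes and not essential.
print(classify_gene(is_persistent(90, 100), essential=False))

# Derived from the counts in the text: persistent genes minus PNE genes.
persistent_essential_bsub = 475 - 276
persistent_essential_ecoli = 611 - 409
print(persistent_essential_bsub, persistent_essential_ecoli)
```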

In eukaryotes, 83% of the one-to-one orthologs between Schizosaccharomyces pombe and Saccharomyces cerevisiae have conserved essentiality, that is, they are nonessential in both species or essential in both species. The remaining 17% of genes are nonessential in one species and essential in the other. This is quite remarkable, given that S. pombe is separated from S. cerevisiae by approximately 400 million years of evolution.
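
The "conserved essentiality" figure is simply the share of one-to-one ortholog pairs whose essentiality call agrees in both species. A minimal Python sketch, using invented calls for four hypothetical ortholog pairs rather than the real S. pombe and S. cerevisiae data:

```python
def conserved_fraction(ortholog_pairs):
    """Fraction of ortholog pairs with the same essentiality in both species.

    ortholog_pairs: list of (essential_in_species_1, essential_in_species_2)
                    booleans, one tuple per one-to-one ortholog pair.
    """
    same = sum(1 for first, second in ortholog_pairs if first == second)
    return same / len(ortholog_pairs)

# Hypothetical calls for illustration only.
pairs = [
    (True, True),    # essential in both
    (False, False),  # nonessential in both
    (False, False),  # nonessential in both
    (True, False),   # essentiality differs
]

print(conserved_fraction(pairs))
```

Applied to the real ortholog set, this quantity is the 83% reported in the text.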

In general, highly conserved and thus older genes (i.e. genes with earlier phylogenetic origin) are more likely to be essential than younger genes, even if they have been duplicated.

Study

The experimental study of essential genes is limited by the fact that, by definition, inactivation of an essential gene is lethal to the organism. Therefore, they cannot be simply deleted or mutated to analyze the resulting phenotypes (a common technique in genetics).

There are, however, some circumstances in which essential genes can be manipulated. In diploid organisms, only a single functional copy of some essential genes may be needed (haplosufficiency), with the heterozygote displaying an instructive phenotype. Some essential genes can tolerate mutations that are deleterious, but not wholly lethal, since they do not completely abolish the gene's function.

Computational analysis can reveal many properties of proteins without analyzing them experimentally, e.g. by looking at homologous proteins, function, structure etc. (see also below, Predicting essential genes). The products of essential genes can also be studied when expressed in other organisms, or when purified and studied in vitro.

Conditionally essential genes are easier to study. Temperature-sensitive variants of essential genes have been identified which encode products that lose function at high temperatures, and so only show a phenotype at increased temperature.

Reproducibility

If screens for essential genes are repeated in independent laboratories, they often result in different gene lists. For instance, screens in E. coli have yielded from ~300 to ~600 essential genes (see Table 1). Such differences are even more pronounced when different bacterial strains are used (see Figure 2). A common explanation is that the experimental conditions are different or that the nature of the mutation may be different (e.g. a complete gene deletion vs. a transposon mutant). Transposon screens in particular are hard to reproduce, given that a transposon can insert at many positions within a gene. Insertions towards the 3' end of an essential gene may not have a lethal phenotype (or no phenotype at all) and thus may not be recognized as such. This can lead to erroneous annotations (here: false negatives).

Comparison of CRISPR/Cas9 and RNAi screens. Screens to identify essential genes in the human chronic myelogenous leukemia cell line K562 with these two methods showed only limited overlap. At a 10% false positive rate there were ~4,500 genes identified in the Cas9 screen versus ~3,100 in the shRNA screen, with only ~1,200 genes identified in both.
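
To put the limited overlap into numbers, the approximate hit counts above can be turned into overlap fractions. The Python sketch below treats the rounded figures from the text as exact, which is an assumption; the function name and the choice of the Jaccard index as a summary are illustrative.

```python
def overlap_stats(hits_a, hits_b, shared):
    """Summarise the overlap between two screens from their hit counts."""
    union = hits_a + hits_b - shared
    return {
        "union": union,
        "jaccard": shared / union,         # shared hits / all distinct hits
        "shared_of_a": shared / hits_a,    # share of screen A hits confirmed by B
        "shared_of_b": shared / hits_b,    # share of screen B hits confirmed by A
    }

# Approximate counts quoted in the text (Cas9, shRNA, both).
stats = overlap_stats(4500, 3100, 1200)
print(stats)
```

With these rounded inputs, fewer than one in five of the genes flagged by either screen is flagged by both.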

Different essential genes in different organisms

Different organisms may have different essential genes. For instance, Bacillus subtilis has 271 essential genes. About one-half (150) of the orthologous genes in E. coli are also essential. Another 67 genes that are essential in E. coli are not essential in B. subtilis, while 86 E. coli essential genes have no B. subtilis ortholog. In Mycoplasma genitalium at least 18 genes are essential that are not essential in M. bovis. Many of these different essential genes are caused by paralogs or alternative metabolic pathways.

Such differences in essential genes between bacteria can be used to develop targeted antibacterial therapies against specific pathogens and so reduce antibiotic resistance in the microbiome era. Stone et al. (2015) used the difference in essential genes in bacteria to develop selective drugs against the oral pathogen Porphyromonas gingivalis, rather than the beneficial bacterium Streptococcus sanguis.

Prediction

Essential genes can be predicted computationally. However, most methods use experimental data ("training sets") to some extent. Chen et al. determined four criteria to select training sets for such predictions: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. They also found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Some approaches for predicting essential genes are:

Comparative genomics. Shortly after the first genomes (of Haemophilus influenzae and Mycoplasma genitalium) became available, Mushegian et al. tried to predict the number of essential genes based on common genes in these two species. It was surmised that only essential genes should be conserved over the long evolutionary distance that separated the two bacteria. This study identified approximately 250 candidate essential genes. As more genomes became available, the number of predicted essential genes kept shrinking because larger sets of genomes share fewer and fewer genes. As a consequence, it was concluded that the universal conserved core consists of fewer than 40 genes. However, this set of conserved genes is not identical to the set of essential genes, as different species rely on different essential genes.

A similar approach has been used to infer essential genes from the pan-genome of Brucella species. 42 complete Brucella genomes and a total of 132,143 protein-coding genes were used to predict 1252 potential essential genes, derived from the core genome by comparison with a prokaryote database of essential genes.
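The shrinking core described above, and the Brucella-style comparison of a core genome with a list of known essential genes, can be sketched with plain set operations. The species names, gene sets and the `known_essential` list below are invented toy data, not taken from any of the cited studies.

```python
def running_core(genomes):
    """Return the shared (core) gene set after each genome is added."""
    cores = []
    core = None
    for genes in genomes.values():
        core = set(genes) if core is None else core & set(genes)
        cores.append(set(core))
    return cores

# Hypothetical gene content of three species (toy data).
genomes = {
    "species_1": {"dnaA", "rpoB", "gyrA", "ftsZ", "lacZ"},
    "species_2": {"dnaA", "rpoB", "gyrA", "ftsZ", "nifH"},
    "species_3": {"dnaA", "rpoB", "gyrA", "araC"},
}
# Hypothetical stand-in for a database of experimentally essential genes.
known_essential = {"dnaA", "rpoB", "ftsZ"}

cores = running_core(genomes)
core_sizes = [len(c) for c in cores]          # shrinks as genomes are added
candidates = sorted(cores[-1] & known_essential)
print(core_sizes, candidates)
```

Note that `ftsZ` drops out of the candidates only because the toy third species lacks it, which mirrors the caveat that the conserved core is not the same as the essential set.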

Network analysis. After the first protein interaction networks of yeast had been published, it was found that highly connected proteins (e.g. by protein–protein interactions) are more likely to be essential. However, highly connected proteins may be experimental artifacts, and high connectivity may reflect pleiotropy rather than essentiality. Nevertheless, network methods have been improved by adding other criteria and therefore do have some value in predicting essential genes.
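As a minimal sketch of the connectivity idea, proteins can be ranked by their number of interaction partners (degree). The interaction list is invented, and degree alone is only the simplest of the criteria mentioned; it is not a validated predictor.

```python
from collections import Counter

def degree_ranking(interactions):
    """Rank proteins by number of interaction partners, highest first."""
    degree = Counter()
    for a, b in interactions:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical undirected protein-protein interactions (toy data).
interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
ranking = degree_ranking(interactions)
print(ranking)
```

In this toy network P1 is the hub and would be the first candidate, subject to the caveats about artifacts and pleiotropy noted above.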

Machine learning. Hua et al. used machine learning to predict essential genes in 25 bacterial species.

Hurst index. Liu et al. (2015) used the Hurst exponent, a characteristic parameter that describes long-range correlation in DNA, to predict essential genes. In 31 out of 33 bacterial genomes the significance levels of the Hurst exponents of the essential genes were significantly higher than for the corresponding full gene set, whereas the significance levels of the Hurst exponents of the nonessential genes remained unchanged or increased only slightly.

Minimal genomes. It was also thought that essential genes could be inferred from minimal genomes, which supposedly contain only essential genes. The problem here is that the smallest genomes belong to parasitic (or symbiotic) species, which can survive with a reduced gene set because they obtain many nutrients from their hosts. For instance, one of the smallest genomes is that of Hodgkinia cicadicola, a symbiont of cicadas, containing only 144 kb of DNA encoding only 188 genes. Like other symbionts, Hodgkinia receives many of its nutrients from its host, so its gene set does not need to include everything a free-living organism would require.

Metabolic modelling. Essential genes may be also predicted in completely sequenced genomes by metabolic reconstruction, that is, by reconstructing the complete metabolism from the gene content and then identifying those genes and pathways that have been found to be essential in other species. However, this method can be compromised by proteins of unknown function. In addition, many organisms have backup or alternative pathways which have to be taken into account (see figure 1). Metabolic modeling was also used by Basler (2015) to develop a method to predict essential metabolic genes. Flux balance analysis, a method of metabolic modeling, has recently been used to predict essential genes in clear cell renal cell carcinoma metabolism.
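The role of backup pathways can be illustrated with a toy in-silico knockout: a gene is called essential here if removing its reaction makes a target metabolite unreachable from the supplied nutrients. This is a simple reachability check, not flux balance analysis and not the method of Basler (2015); the network, gene names and `biomass` target are hypothetical.

```python
def producible(reactions, nutrients, knocked_out=frozenset()):
    """Return every metabolite reachable from the nutrients."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for gene, substrates, product in reactions:
            if gene in knocked_out or product in available:
                continue
            if set(substrates) <= available:  # all inputs are present
                available.add(product)
                changed = True
    return available

def essential_genes(reactions, nutrients, target):
    """Genes whose single knockout prevents production of the target."""
    genes = {gene for gene, _, _ in reactions}
    return sorted(
        gene for gene in genes
        if target not in producible(reactions, nutrients, {gene})
    )

# (gene, substrates, product); geneB and geneC are alternative routes to Y.
reactions = [
    ("geneA", ["glucose"], "X"),
    ("geneB", ["X"], "Y"),
    ("geneC", ["X"], "Y"),
    ("geneD", ["Y"], "biomass"),
]
result = essential_genes(reactions, ["glucose"], "biomass")
print(result)
```

Neither geneB nor geneC is flagged, because each backs up the other; only a double knockout of both would block the pathway.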

Genes of unknown function. Surprisingly, a significant number of essential genes have no known function. For instance, among the 385 essential candidates in M. genitalium, no function could be ascribed to 95 genes, although this number had been reduced to 75 by 2011. Most essential genes of unknown function have potential biological functions related to one of the three fundamental functions.

ZUPLS. Song et al. presented a novel method to predict essential genes that uses only the Z-curve and other sequence-based features. Such features can be calculated readily from the DNA/amino acid sequences. However, the reliability of this method remains unclear.
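To show why such features are easy to compute, here is a minimal sketch of the standard cumulative Z-curve transform, which maps a DNA sequence to three coordinates per position (purine vs. pyrimidine, amino vs. keto, weak vs. strong hydrogen bonding). It illustrates the Z-curve only; it is not the ZUPLS feature set or classifier, and the function name is chosen for this example.

```python
def z_curve(sequence):
    """Return the cumulative Z-curve points (x, y, z) for a DNA sequence."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in sequence.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((
            (a + g) - (c + t),  # x: purines minus pyrimidines
            (a + c) - (g + t),  # y: amino minus keto bases
            (a + t) - (c + g),  # z: weak minus strong H-bond bases
        ))
    return points

points = z_curve("ACGT")
print(points)
```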

Essential gene prediction servers. Guo et al. (2015) have developed three online services to predict essential genes in bacterial genomes. These freely available tools are applicable for single gene sequences without annotated functions, single genes with definite names, and complete genomes of bacterial strains. Kong et al. (2019) have developed the ePath database, which can be used to search more than 4,000 bacterial species for predicting essential genes.

Essential protein domains

Although most essential genes encode proteins, many essential proteins consist of a single domain. This fact has been used to identify essential protein domains. Goodacre et al. have identified hundreds of essential domains of unknown function (eDUFs). Lu et al. presented a similar approach and identified 3,450 domains that are essential in at least one microbial species.

Germline

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Germline
Cormlets of Watsonia meriana, an example of apomixis
Clathria tuberosa, an example of a sponge that can grow indefinitely from somatic tissue and reconstitute itself from totipotent separated somatic cells

In biology and genetics, the germline is the population of a multicellular organism's cells that develop into germ cells. In other words, they are the cells that form gametes (eggs and sperm), which can come together to form a zygote. They differentiate in the gonads from primordial germ cells into gametogonia, which develop into gametocytes, which develop into the final gametes. This process is known as gametogenesis.

Germ cells pass on genetic material through the process of sexual reproduction. This includes fertilization, recombination and meiosis. These processes help to increase genetic diversity in offspring.

Certain organisms reproduce asexually via processes such as apomixis, parthenogenesis, autogamy, and cloning. Apomixis and parthenogenesis both refer to the development of an embryo without fertilization. The former typically occurs in plant seeds, while the latter tends to be seen in nematodes, as well as certain species of reptiles, birds, and fish. Autogamy is a term used to describe self-pollination in plants. Cloning is a technique used to create genetically identical cells or organisms.

In sexually reproducing organisms, cells that are not in the germline are called somatic cells. According to this definition, mutations, recombinations and other genetic changes in the germline may be passed to offspring, but changes in a somatic cell will not be. This need not apply to somatically reproducing organisms, such as some Porifera and many plants. For example, many varieties of citrus, plants in the Rosaceae and some in the Asteraceae, such as Taraxacum, produce seeds apomictically when somatic diploid cells displace the ovule or early embryo.

In an earlier stage of genetic thinking, there was a clear distinction between germline and somatic cells. For example, as August Weismann proposed, a germline cell is immortal in the sense that it is part of a lineage that has reproduced indefinitely since the beginning of life and, barring accident, could continue doing so indefinitely. However, it is now known in some detail that this distinction between somatic and germ cells is partly artificial and depends on particular circumstances and internal cellular mechanisms such as telomeres and controls such as the selective application of telomerase in germ cells, stem cells and the like.

Not all multicellular organisms differentiate into somatic and germ lines, but in the absence of specialised technical human intervention practically all but the simplest multicellular structures do so. In organisms that lack this differentiation, somatic cells tend to be practically totipotent, and for over a century sponge cells have been known to reassemble into new sponges after having been separated by forcing them through a sieve.

Germline can refer to a lineage of cells spanning many generations of individuals—for example, the germline that links any living individual to the hypothetical last universal common ancestor, from which all plants and animals descend.

Evolution

Plants and basal metazoans such as sponges (Porifera) and corals (Anthozoa) do not sequester a distinct germline, generating gametes from multipotent stem cell lineages that also give rise to ordinary somatic tissues. It is therefore likely that germline sequestration first evolved in complex animals with sophisticated body plans, i.e. bilaterians. There are several theories on the origin of the strict germline-soma distinction. Setting aside an isolated germ cell population early in embryogenesis might promote cooperation between the somatic cells of a complex multicellular organism. Another recent theory suggests that early germline sequestration evolved to limit the accumulation of deleterious mutations in mitochondrial genes in complex organisms with high energy requirements and fast mitochondrial mutation rates.

DNA damage, mutation and repair

Reactive oxygen species (ROS) are produced as byproducts of metabolism. In germline cells, ROS are likely a significant cause of DNA damages that, upon DNA replication, lead to mutations. 8-Oxoguanine, an oxidized derivative of guanine, is produced by spontaneous oxidation in the germline cells of mice, and during the cell's DNA replication causes GC to TA transversion mutations. Such mutations occur throughout the mouse chromosomes as well as during different stages of gametogenesis.

The mutation frequencies for cells in different stages of gametogenesis are about 5- to 10-fold lower than in somatic cells, both for spermatogenesis and oogenesis. The lower frequencies of mutation in germline cells compared to somatic cells appear to be due to more efficient repair of DNA damages, particularly homologous recombinational repair, during germline meiosis. Among humans, about 5% of live-born offspring have a genetic disorder, and of these, about 20% are due to newly arisen germline mutations.

Epigenetic alterations

5-Methylcytosine with the methyl group highlighted. The image shows a cytosine single-ring base and a methyl group added on to the 5 carbon. In mammals, DNA methylation occurs almost exclusively at a cytosine that is followed by a guanine.

Epigenetic alterations of DNA include modifications that affect gene expression, but are not caused by changes in the sequence of bases in DNA. A well-studied example of such an alteration is the methylation of DNA cytosine to form 5-methylcytosine. This usually occurs in the DNA sequence CpG, changing the DNA at the CpG site from CpG to 5-mCpG. Methylation of cytosines in CpG sites in promoter regions of genes can reduce or silence gene expression. About 28 million CpG dinucleotides occur in the human genome, and about 24 million CpG sites in the mouse genome (which is 86% as large as the human genome). In most tissues of mammals, on average, 70% to 80% of CpG cytosines are methylated (forming 5-mCpG).
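Locating CpG sites and summarising their methylation is straightforward to sketch. The example below scans one strand for the dinucleotide CG and computes the methylated fraction from a set of positions; the sequence and the methylated positions are invented, and the function names are chosen for this illustration.

```python
def cpg_sites(sequence):
    """Return 0-based positions of the C in every CpG dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def methylated_fraction(sequence, methylated_positions):
    """Fraction of CpG cytosines that appear in methylated_positions."""
    sites = cpg_sites(sequence)
    if not sites:
        return 0.0
    hits = sum(1 for i in sites if i in methylated_positions)
    return hits / len(sites)

toy_sequence = "ACGTCGCG"          # hypothetical sequence
sites = cpg_sites(toy_sequence)    # CpG cytosines at positions 1, 4 and 6
fraction = methylated_fraction(toy_sequence, {1, 4})
print(sites, round(fraction, 2))
```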

In the mouse, by days 6.25 to 7.25 after fertilization of an egg by a sperm, cells in the embryo are set aside as primordial germ cells (PGCs). These PGCs will later give rise to germline sperm cells or egg cells. At this point the PGCs have typical, high levels of methylation. The primordial germ cells of the mouse then undergo genome-wide DNA demethylation, followed by new methylation that resets the epigenome in order to form an egg or sperm.

In the mouse, PGCs undergo DNA demethylation in two phases. The first phase, starting at about embryonic day 8.5, occurs during PGC proliferation and migration, and it results in genome-wide loss of methylation, involving almost all genomic sequences. This loss of methylation occurs through passive demethylation due to repression of the major components of the methylation machinery. The second phase occurs during embryonic days 9.5 to 13.5 and causes demethylation of most remaining specific loci, including germline-specific and meiosis-specific genes. This second phase of demethylation is mediated by the TET enzymes TET1 and TET2, which carry out the first step in demethylation by converting 5-mC to 5-hydroxymethylcytosine (5-hmC) during embryonic days 9.5 to 10.5. This is likely followed by replication-dependent dilution during embryonic days 11.5 to 13.5. At embryonic day 13.5, PGC genomes display the lowest level of global DNA methylation of all cells in the life cycle.
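Passive, replication-dependent dilution can be pictured with a simple idealised model: if no methylation is copied onto newly synthesised strands, the methylated fraction halves with every round of replication. The sketch below assumes a complete absence of maintenance methylation and a starting level of 75%, both of which are simplifying assumptions for illustration rather than measured values.

```python
def diluted_methylation(initial_fraction, divisions):
    """Methylated fraction after n replications with no maintenance."""
    return initial_fraction * 0.5 ** divisions

# Assumed starting level of 75% methylated CpGs (illustrative only).
levels = [diluted_methylation(0.75, n) for n in range(4)]
for n, level in enumerate(levels):
    print(f"after {n} divisions: {level:.3f}")
```

Under these assumptions three rounds of replication already bring the level below 10%.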

In the mouse, the great majority of differentially expressed genes in PGCs from embryonic day 9.5 to 13.5, when most genes are demethylated, are upregulated in both male and female PGCs.

Following erasure of DNA methylation marks in mouse PGCs, male and female germ cells undergo new methylation at different time points during gametogenesis. While undergoing mitotic expansion in the developing gonad, the male germline starts the re-methylation process by embryonic day 14.5. The sperm-specific methylation pattern is maintained during mitotic expansion. DNA methylation levels in primary oocytes before birth remain low, and re-methylation occurs after birth in the oocyte growth phase.

Gene knockout

From Wikipedia, the free encyclopedia

Gene knockouts (also known as gene deletion or gene inactivation) are a widely used genetic engineering technique that involves the targeted removal or inactivation of a specific gene within an organism's genome. This can be done through a variety of methods, including homologous recombination, CRISPR-Cas9, and TALENs.

One of the main advantages of gene knockouts is that they allow researchers to study the function of a specific gene in vivo, and to understand the role of the gene in normal development and physiology as well as in the pathology of diseases. By studying the phenotype of the organism with the knocked out gene, researchers can gain insights into the biological processes that the gene is involved in.

There are two main types of gene knockouts: complete and conditional. A complete gene knockout permanently inactivates the gene, while a conditional gene knockout allows for the gene to be turned off and on at specific times or in specific tissues. Conditional knockouts are particularly useful for studying developmental processes and for understanding the role of a gene in specific cell types or tissues.

Gene knockouts have been widely used in many different organisms, including bacteria, yeast, fruit flies, zebrafish, and mice. In mice, gene knockouts are commonly used to study the function of specific genes in development, physiology, and cancer research.

The use of gene knockouts in mouse models has been particularly valuable in the study of human diseases. For example, gene knockouts in mice have been used to study the role of specific genes in cancer, neurological disorders, immune disorders, and metabolic disorders.

However, gene knockouts also have some limitations. For example, the loss of a single gene may not fully mimic the effects of a genetic disorder, and the knockouts may have unintended effects on other genes or pathways. Additionally, gene knockouts are not always a good model for human disease as the mouse genome is not identical to the human genome, and mouse physiology is different from human physiology.

The KO technique is essentially the opposite of a gene knock-in. Knocking out two genes simultaneously in an organism is known as a double knockout (DKO). Similarly, the terms triple knockout (TKO) and quadruple knockout (QKO) are used to describe three or four knocked-out genes, respectively. However, one needs to distinguish between heterozygous and homozygous KOs. In the former, only one of two gene copies (alleles) is knocked out; in the latter, both are knocked out.

Methods

Knockouts are accomplished through a variety of techniques. Originally, naturally occurring mutations were identified and then gene loss or inactivation had to be established by DNA sequencing or other methods.

A laboratory mouse in which a gene affecting hair growth has been knocked out (left), is shown next to a normal lab mouse.

Gene knockout by mutation

Gene knockout by mutation is commonly carried out in bacteria. An early instance of the use of this technique in Escherichia coli was published in 1989 by Hamilton et al. In this experiment, two sequential recombinations were used to delete the gene. This work established the feasibility of removing or replacing a functional gene in bacteria. That method has since been developed for other organisms, particularly research animals such as mice. Knockout mice are commonly used to study genes with human equivalents that may have significance for disease. An example of a study using knockout mice is an investigation of the roles of Xirp proteins in Sudden Unexplained Nocturnal Death Syndrome (SUNDS) and Brugada Syndrome in the Chinese Han population.

Gene silencing

RNA interference (RNAi), a more recent method also known as gene silencing, has gained popularity for gene knockout investigations. In RNAi, messenger RNA for a particular gene is inactivated using small interfering RNA (siRNA) or short hairpin RNA (shRNA). This effectively stops the gene from being expressed. Cancer-related genes such as Bcl-2 and p53, as well as genes linked to neurological disease, genetic disorders, and viral infections, have all been targeted for gene silencing using RNAi.

Homologous recombination

Homologous recombination is the exchange of genetic material between two DNA strands that share extensive regions of identical base sequence. It occurs spontaneously in eukaryotes, bacteria, and some viruses, and is a useful tool in genetic engineering. In eukaryotes, homologous recombination takes place during meiosis, is essential for the repair of double-stranded DNA breaks, and promotes genetic variation by allowing the movement of genetic information during chromosomal crossover. In bacteria, it is a key DNA repair mechanism and enables the insertion into the genome of genetic material acquired through horizontal gene transfer and transformation. In viruses, it influences the course of viral evolution.

In genetic engineering, homologous recombination is used as a type of gene targeting, in which an engineered mutation is introduced into a particular gene in order to learn more about that gene's function. Foreign DNA is inserted into a cell; this DNA has a sequence similar to the target gene and is flanked by sequences identical to those upstream and downstream of the target gene. When the cell recognizes the flanking regions as homologous, the target gene's DNA is replaced by the foreign DNA sequence during replication, and the target gene is "knocked out" by the exchange. Applying this technique to particular alleles in mouse embryonic stem cells makes it possible to create knockout mice. With the aid of gene targeting, numerous mouse genes have been shut down, leading to the creation of hundreds of distinct mouse models of various human diseases, such as cancer, diabetes, cardiovascular diseases, and neurological disorders. Mario Capecchi, Sir Martin J. Evans, and Oliver Smithies performed groundbreaking research on homologous recombination in mouse stem cells, and they shared the 2007 Nobel Prize in Physiology or Medicine for their findings.

Traditionally, homologous recombination was the main method for causing a gene knockout. This method involves creating a DNA construct containing the desired mutation. For knockout purposes, this typically involves a drug resistance marker in place of the desired knockout gene. The construct will also contain a minimum of 2 kb of homology to the target sequence. The construct can be delivered to stem cells either through microinjection or electroporation. This method then relies on the cell's own repair mechanisms to recombine the DNA construct into the existing DNA. This results in the sequence of the gene being altered, and in most cases the gene will be translated into a nonfunctional protein, if it is translated at all. However, this is an inefficient process, as homologous recombination accounts for only 10⁻² to 10⁻³ of DNA integrations. Often, the drug selection marker on the construct is used to select for cells in which the recombination event has occurred.
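The logic of a targeting construct (homology arms flanking a selection marker that replaces the target gene) can be mimicked with string operations. This is a toy model only: the sequences are invented, the arms are a few bases long rather than the kilobases of homology described above, and the function `knock_out` is hypothetical.

```python
def knock_out(genome, left_arm, target, right_arm, marker):
    """Swap the target for the marker where both homology arms match."""
    site = left_arm + target + right_arm
    start = genome.find(site)
    if start == -1:
        return genome  # no homologous site found: genome left unchanged
    end = start + len(site)
    return genome[:start] + left_arm + marker + right_arm + genome[end:]

# Toy sequences; lowercase marks the inserted selection marker.
genome = "TTTT" + "ACGTAC" + "GGGCCC" + "TGCATG" + "AAAA"
edited = knock_out(genome, "ACGTAC", "GGGCCC", "TGCATG", "neo")
print(edited)
```

Because the marker sits between the arms, selecting for it enriches for cells in which the exchange has happened, which is the role of the drug selection step.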

Wild-type Physcomitrella and knockout mosses: Deviating phenotypes induced in gene-disruption library transformants. Physcomitrella wild-type and transformed plants were grown on minimal Knop medium to induce differentiation and development of gametophores. For each plant, an overview (upper row; scale bar corresponds to 1 mm) and a close-up (bottom row; scale bar equals 0.5 mm) are shown. A: Haploid wild-type moss plant completely covered with leafy gametophores and close-up of wild-type leaf. B–E: Different mutants.

These stem cells, now lacking the gene, could be used in vivo, for instance in mice, by inserting them into early embryos. If the resulting chimeric mouse contained the genetic change in its germline, the change could then be passed on to offspring.

In diploid organisms, which contain two alleles for most genes and may also contain several related genes that collaborate in the same role, additional rounds of transformation and selection are performed until every targeted gene is knocked out. Selective breeding may be required to produce homozygous knockout animals.

Site-specific nucleases

Frameshift mutation resulting from a single base pair deletion, causing altered amino acid sequence and premature stop codon

There are currently three methods in use that involve precisely targeting a DNA sequence in order to introduce a double-stranded break. Once this occurs, the cell's repair mechanisms will attempt to repair the double-stranded break, often through non-homologous end joining (NHEJ), which involves directly ligating the two cut ends together. Because this repair can be imperfect, it sometimes introduces insertions or deletions of base pairs, which can cause frameshift mutations. These mutations can render the gene in which they occur nonfunctional, thus creating a knockout of that gene. This process is more efficient than homologous recombination, and therefore can be more easily used to create biallelic knockouts.
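The effect of a single-base deletion on the reading frame can be shown by translating a short coding sequence before and after the deletion. The sketch uses the standard genetic code; the 21-base sequence is invented so that the shifted frame happens to run into a stop codon early.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

wild_type = "ATGGCTATGACCGGAAAATAA"       # ATG GCT ATG ACC GGA AAA TAA
mutant = wild_type[:3] + wild_type[4:]    # delete one base after the start codon

print(translate(wild_type))  # full-length toy peptide
print(translate(mutant))     # altered and truncated by the frameshift
```

Every codon downstream of the deletion is read differently, and here the third codon of the mutant becomes a premature stop.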

Zinc-fingers

Zinc-finger nucleases consist of DNA binding domains that can precisely target a DNA sequence. Each zinc finger recognizes a short stretch of about three base pairs of the desired DNA sequence, so fingers can be modularly assembled to bind to a particular sequence. These binding domains are coupled with a restriction endonuclease that can cause a double-stranded break (DSB) in the DNA. Repair processes may introduce mutations that destroy functionality of the gene.

TALENs

Transcription activator-like effector nucleases (TALENs) also contain a DNA binding domain and a nuclease that can cleave DNA. The DNA binding region consists of amino acid repeats that each recognize a single base pair of the desired targeted DNA sequence. If this cleavage is targeted to a gene coding region, and NHEJ-mediated repair introduces insertions and deletions, a frameshift mutation often results, thus disrupting function of the gene.

CRISPR/Cas9

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a genetic engineering technique that allows for precise editing of the genome. One application of CRISPR is gene knockout, which involves disabling or "knocking out" a specific gene in an organism.

The process of gene knockout with CRISPR involves three main steps: designing a guide RNA (gRNA) that targets a specific location in the genome, delivering the gRNA and a Cas9 enzyme (which acts as a molecular scissors) to the target cell, and then allowing the cell to repair the cut in the DNA. When the cell repairs the cut, it can either join the cut ends back together, resulting in a non-functional gene, or introduce a mutation that disrupts the gene's function.
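The first step, choosing a target for the guide RNA, can be sketched as a sequence scan. The example assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG motif (the PAM) immediately after a 20-nucleotide target; that requirement is not stated in the text above and is included here as an assumption. Only the forward strand is scanned, no off-target scoring is attempted, and the sequence is invented.

```python
def find_cas9_targets(sequence, protospacer_length=20):
    """List (start, protospacer, PAM) for each site followed by NGG."""
    seq = sequence.upper()
    targets = []
    for start in range(len(seq) - protospacer_length - 2):
        pam = seq[start + protospacer_length:start + protospacer_length + 3]
        if pam[1:] == "GG":  # N can be any base, then two Gs
            protospacer = seq[start:start + protospacer_length]
            targets.append((start, protospacer, pam))
    return targets

# Toy sequence containing one 20-nt site followed by the PAM "TGG".
toy_gene = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "TTT"
targets = find_cas9_targets(toy_gene)
print(targets)
```

A real design workflow would also scan the reverse strand and screen candidates against the rest of the genome.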

This technique can be used in a variety of organisms, including bacteria, yeast, plants, and animals, and it allows scientists to study the function of specific genes by observing the effects of their absence. CRISPR-based gene knockout is a powerful tool for understanding the genetic basis of disease and for developing new therapies.

It is important to note that CRISPR-based gene knockout, like any genetic engineering technique, has the potential to produce unintended or harmful effects on the organism, so it should be used with caution. The Cas9 enzyme coupled to the gRNA causes a double-stranded break in the DNA. Following the same principle as zinc fingers and TALENs, the attempts to repair these double-stranded breaks often result in frameshift mutations that produce a nonfunctional gene. Non-invasive CRISPR-Cas9 technology has successfully knocked out a gene associated with depression and anxiety in mice, being the first successful delivery passing through the blood–brain barrier to enable gene modification.

Knock-in

Gene knock-in is similar to gene knockout, but it replaces a gene with another instead of deleting it.

Types

Conditional knockouts

A conditional gene knockout allows gene deletion in a tissue-specific manner. This is required in place of a complete gene knockout if the null mutation would lead to embryonic death, or if a specific tissue or cell type is of particular interest. It is done by introducing short sequences called loxP sites around the gene. These sequences are introduced into the germline via the same mechanism as a knockout. This germline can then be crossed to another germline containing Cre recombinase, a viral enzyme that recognizes these sequences, recombines them, and deletes the gene flanked by them. Other recombinases have since been created and employed in conditional knockout experiments.
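The tissue-specific logic can be sketched as follows: a gene flanked by two loxP sites is removed only in cells where Cre is present. The loxP string is the commonly cited 34-bp sequence; the flanking DNA, the gene sequence and the tissue names are invented, and the model ignores site orientation (it assumes two sites in the same direction, where excision leaves one loxP behind).

```python
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # commonly cited 34-bp loxP site

def cre_excise(dna, cre_expressed):
    """Remove the segment between the first two loxP sites if Cre is present."""
    if not cre_expressed:
        return dna
    first = dna.find(LOXP)
    if first == -1:
        return dna
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna
    # Keep everything up to the first site, then resume at the second site,
    # so exactly one loxP copy remains.
    return dna[:first] + dna[second:]

floxed = "AAAA" + LOXP + "GGGCCCGGG" + LOXP + "TTTT"
# Hypothetical tissues: Cre is expressed in liver only.
by_tissue = {
    "liver": cre_excise(floxed, cre_expressed=True),
    "brain": cre_excise(floxed, cre_expressed=False),
}
print({tissue: "GGGCCCGGG" in dna for tissue, dna in by_tissue.items()})
```

The same floxed allele therefore behaves as a knockout in one tissue and as an intact gene in another, depending only on where Cre is expressed.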

Use

A knockout mouse (left) that is a model of obesity, compared with a normal mouse

Knockouts are primarily used to understand the role of a specific gene or DNA region by comparing the knockout organism to a wildtype with a similar genetic background.

Knockout organisms are also used as screening tools in the development of drugs, to target specific biological processes or deficiencies by using a specific knockout, or to understand the mechanism of action of a drug by using a library of knockout organisms spanning the entire genome, such as in Saccharomyces cerevisiae.

Clinical trial

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Clinical_...