A Medley of Potpourri

Wednesday, December 6, 2023

Evolvability

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Evolvability

Evolvability is defined as the capacity of a system for adaptive evolution. Evolvability is the ability of a population of organisms to not merely generate genetic diversity, but to generate adaptive genetic diversity, and thereby evolve through natural selection.

In order for a biological organism to evolve by natural selection, there must be a certain minimum probability that new, heritable variants are beneficial. Random mutations, unless they occur in DNA sequences with no function, are expected to be mostly detrimental. Beneficial mutations are always rare, but if they are too rare, then adaptation cannot occur. Early failed efforts to evolve computer programs by random mutation and selection showed that evolvability is not a given, but depends on the representation of the program as a data structure, because this determines how changes in the program map to changes in its behavior. Analogously, the evolvability of organisms depends on their genotype–phenotype map. This means that genomes are structured in ways that make beneficial changes more likely. This has been taken as evidence that evolution has created fitter populations of organisms that are better able to evolve.

Alternative definitions

Andreas Wagner describes two definitions of evolvability. According to the first definition, a biological system is evolvable:

if its properties show heritable genetic variation, and
if natural selection can thus change these properties.

According to the second definition, a biological system is evolvable:

if it can acquire novel functions through genetic change, functions that help the organism survive and reproduce.

For example, consider an enzyme with multiple alleles in the population. Each allele catalyzes the same reaction, but with a different level of activity. However, even after millions of years of evolution, exploring many sequences with similar function, no mutation might exist that gives this enzyme the ability to catalyze a different reaction. Thus, although the enzyme's activity is evolvable in the first sense, that does not mean that the enzyme's function is evolvable in the second sense. However, every system evolvable in the second sense must also be evolvable in the first.

Pigliucci recognizes three classes of definition, depending on timescale. The first corresponds to Wagner's first, and represents the very short timescales that are described by quantitative genetics. He divides Wagner's second definition into two categories, one representing the intermediate timescales that can be studied using population genetics, and one representing exceedingly rare long-term innovations of form.

Pigliucci's second definition of evolvability includes Altenberg's quantitative concept of evolvability, being not a single number, but the entire upper tail of the fitness distribution of the offspring produced by the population. This quantity was considered a "local" property of the instantaneous state of a population, and its integration over the population's evolutionary trajectory, and over many possible populations, would be necessary to give a more global measure of evolvability.
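
As a rough illustration of the "upper tail" idea, the sketch below collects the offspring fitness values that exceed the parent's fitness. The Gaussian mutation model, its parameters, and the function names are assumptions made for the example; they do not come from Altenberg's work.

```python
import random


def offspring_fitness_tail(parent_fitness, offspring_fitnesses):
    """Return the offspring fitness values above the parent's, highest first."""
    return sorted((w for w in offspring_fitnesses if w > parent_fitness), reverse=True)


def simulate_offspring(parent_fitness, n, rng, mean_effect=-0.05, sd=0.1):
    # Hypothetical mutation model: each offspring's fitness is the parent's
    # fitness plus a Gaussian effect whose mean is slightly negative,
    # so most changes are detrimental and only a minority are beneficial.
    return [parent_fitness + rng.gauss(mean_effect, sd) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
parent = 1.0
offspring = simulate_offspring(parent, 10_000, rng)
tail = offspring_fitness_tail(parent, offspring)
fraction_better = len(tail) / len(offspring)
print(f"{fraction_better:.1%} of offspring are fitter than the parent")
```

The tail is kept as a whole list rather than reduced to one number, in line with the description of evolvability as a distribution rather than a single value.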

Generating more variation

More heritable phenotypic variation means more evolvability. While mutation is the ultimate source of heritable variation, its permutations and combinations also make a big difference. Sexual reproduction generates more variation (and thereby evolvability) relative to asexual reproduction (see evolution of sexual reproduction). Evolvability is further increased by generating more variation when an organism is stressed, and thus likely to be less well adapted, but less variation when an organism is doing well. The amount of variation generated can be adjusted in many different ways, for example via the mutation rate, via the probability of sexual vs. asexual reproduction, via the probability of outcrossing vs. inbreeding, via dispersal, and via access to previously cryptic variants through the switching of an evolutionary capacitor. A large population size increases the influx of novel mutations in each generation.

Enhancement of selection

Rather than creating more phenotypic variation, some mechanisms increase the intensity and effectiveness with which selection acts on existing phenotypic variation. For example:

Mating rituals that allow sexual selection on "good genes", and so intensify natural selection.
Large effective population size lowering the threshold value of the selection coefficient above which selection becomes an important player. This could happen through an increase in the census population size, decreasing genetic drift; through an increase in the recombination rate, decreasing genetic draft; or through changes in the probability distribution of the numbers of offspring.
Recombination decreasing the importance of the Hill-Robertson effect, where different genotypes contain different adaptive mutations. Recombination brings the two alleles together, creating a super-genotype in place of two competing lineages.
Shorter generation time.
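
The link between effective population size and the efficacy of selection in the list above can be made concrete with Kimura's diffusion approximation for the fixation probability of a new mutation. This formula is not part of the text; the sketch assumes a diploid population, additive selection, and an effective size equal to the census size, and the function name is illustrative.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's diffusion approximation for a new mutation.

    Assumes a diploid population whose census size equals n_e, so the
    mutation starts at frequency 1 / (2 * n_e), and additive selection
    with coefficient s.
    """
    if s == 0:
        return 1.0 / (2 * n_e)  # neutral limit: fixation by drift alone
    return math.expm1(-2 * s) / math.expm1(-4 * n_e * s)


s = 0.001  # a weakly beneficial mutation
for n_e in (100, 10_000, 1_000_000):
    ratio = fixation_probability(s, n_e) / fixation_probability(0, n_e)
    print(f"n_e={n_e:>9,}: {ratio:8.1f} times the neutral fixation probability")
```

In small populations the weakly beneficial mutation fixes at close to the neutral rate, while in large populations selection dominates drift.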

Robustness and evolvability

The relationship between robustness and evolvability depends on whether recombination can be ignored. Recombination can generally be ignored in asexual populations and for traits affected by single genes.

Without recombination

Robustness in the face of mutation does not increase evolvability in the first sense. In organisms with a high level of robustness, mutations have smaller phenotypic effects than in organisms with a low level of robustness. Thus, robustness reduces the amount of heritable genetic variation on which selection can act. However, robustness may allow exploration of large regions of genotype space, increasing evolvability according to the second sense. Even without genetic diversity, some genotypes have higher evolvability than others, and selection for robustness can increase the "neighborhood richness" of phenotypes that can be accessed from the same starting genotype by mutation. For example, one reason many proteins are less robust to mutation is that they have marginal thermodynamic stability, and most mutations reduce this stability further. Proteins that are more thermostable can tolerate a wider range of mutations and are more evolvable. For polygenic traits, neighborhood richness contributes more to evolvability than does genetic diversity or "spread" across genotype space.

With recombination

Temporary robustness, or canalisation, may lead to the accumulation of significant quantities of cryptic genetic variation. In a new environment or genetic background, this variation may be revealed and sometimes be adaptive.

Factors affecting evolvability via robustness

Different genetic codes have the potential to change robustness and evolvability by changing the effect of single-base mutational changes.

Exploration ahead of time

When mutational robustness exists, many mutants will persist in a cryptic state. Mutations tend to fall into two categories, having either a very bad effect or very little effect: few mutations fall somewhere in between. Sometimes, these mutations will not be completely invisible, but still have rare effects, with very low penetrance. When this happens, natural selection weeds out the very bad mutations, while leaving the others relatively unaffected. While evolution has no "foresight" to know which environment will be encountered in the future, some mutations cause major disruption to a basic biological process, and will never be adaptive in any environment. Screening these out in advance leads to preadapted stocks of cryptic genetic variation.

Another way that phenotypes can be explored, prior to strong genetic commitment, is through learning. An organism that learns gets to "sample" several different phenotypes during its early development, and later sticks to whatever worked best. Later in evolution, the optimal phenotype can be genetically assimilated so it becomes the default behavior rather than a rare behavior. This is known as the Baldwin effect, and it can increase evolvability.

Learning biases phenotypes in a beneficial direction. But an exploratory flattening of the fitness landscape can also increase evolvability even when it has no direction, for example when the flattening is a result of random errors in molecular and/or developmental processes. This increase in evolvability can happen when evolution is faced with crossing a "valley" in an adaptive landscape. This means that two mutations exist that are deleterious by themselves, but beneficial in combination. These combinations can evolve more easily when the landscape is first flattened, and the discovered phenotype is then fixed by genetic assimilation.

Modularity

If every mutation affected every trait, then a mutation that was an improvement for one trait would be a disadvantage for other traits. This means that almost no mutations would be beneficial overall. But if pleiotropy is restricted to within functional modules, then mutations affect only one trait at a time, and adaptation is much less constrained. In a modular gene network, for example, a gene that induces a limited set of other genes that control a specific trait under selection may evolve more readily than one that also induces other gene pathways controlling traits not under selection. Individual genes also exhibit modularity. A mutation in one cis-regulatory element of a gene's promoter region may allow the expression of the gene to be altered only in specific tissues, developmental stages, or environmental conditions rather than changing gene activity in the entire organism simultaneously.
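
The cost of pleiotropy described above can be sketched with Fisher's geometric model, a standard idealization that the text does not name. In this sketch each mutation is a random step of fixed size in a space with one axis per affected trait, and it counts as beneficial if it moves the phenotype closer to the optimum. The step size, starting distance, and function name are assumptions chosen for illustration.

```python
import math
import random


def fraction_beneficial(n_traits, step=0.5, distance=1.0, trials=20_000, seed=0):
    """Estimate the share of random mutations that move closer to the optimum."""
    rng = random.Random(seed)
    beneficial = 0
    for _ in range(trials):
        # Draw a uniformly random direction by normalising a Gaussian vector.
        direction = [rng.gauss(0, 1) for _ in range(n_traits)]
        norm = math.sqrt(sum(x * x for x in direction))
        # The optimum is at the origin; the phenotype starts at (distance, 0, ..., 0).
        mutated = [step * x / norm for x in direction]
        mutated[0] += distance
        if math.sqrt(sum(x * x for x in mutated)) < distance:
            beneficial += 1
    return beneficial / trials


for n in (1, 20, 100):
    print(f"{n:>3} traits affected per mutation: {fraction_beneficial(n):.3f} beneficial")
```

When a mutation affects a single trait, about half of the steps are improvements; as the number of traits affected by each mutation grows, the beneficial fraction shrinks, which is the constraint that modularity relaxes.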

Evolution of evolvability

While variation yielding high evolvability could be useful in the long term, in the short term most of that variation is likely to be a disadvantage. For example, naively it would seem that increasing the mutation rate via a mutator allele would increase evolvability. But as an extreme example, if the mutation rate is too high then all individuals will be dead or at least carry a heavy mutation load. Short-term selection for low variation most of the time is usually thought likely to be more powerful than long-term selection for evolvability, making it difficult for natural selection to cause the evolution of evolvability. Other forces of selection also affect the generation of variation; for example, mutation and recombination may in part be byproducts of mechanisms to cope with DNA damage.

When recombination is low, mutator alleles may still sometimes hitchhike on the success of adaptive mutations that they cause. In this case, selection can take place at the level of the lineage. This may explain why mutators are often seen during experimental evolution of microbes. Mutator alleles can also evolve more easily when they only increase mutation rates in nearby DNA sequences, not across the whole genome: this is known as a contingency locus.

The evolution of evolvability is less controversial if it occurs via the evolution of sexual reproduction, or via the tendency of variation-generating mechanisms to become more active when an organism is stressed. The yeast prion [PSI+] may also be an example of the evolution of evolvability through evolutionary capacitance. An evolutionary capacitor is a switch that turns genetic variation on and off. This is very much like bet-hedging the risk that a future environment will be similar or different. Theoretical models also predict the evolution of evolvability via modularity. When the costs of evolvability are sufficiently short-lived, more evolvable lineages may be the most successful in the long-term. However, the hypothesis that evolvability is an adaptation is often rejected in favor of alternative hypotheses, e.g. minimization of costs.

Applications

Evolvability phenomena have practical applications. For protein engineering we wish to increase evolvability, and in medicine and agriculture we wish to decrease it. Protein evolvability is defined as the ability of the protein to acquire sequence diversity and conformational flexibility which can enable it to evolve toward a new function.

In protein engineering, both rational design and directed evolution approaches aim to create changes rapidly through mutations with large effects. Such mutations, however, commonly destroy enzyme function or at least reduce tolerance to further mutations. Identifying evolvable proteins and manipulating their evolvability is becoming increasingly necessary in order to achieve ever larger functional modification of enzymes. Proteins are also often studied as part of the basic science of evolvability, because their biophysical properties and chemical functions can be easily changed by a few mutations. More evolvable proteins can tolerate a broader range of amino acid changes, which allows them to evolve toward new functions. The study of evolvability has fundamental importance for understanding the very long-term evolution of protein superfamilies.

Many human diseases are capable of evolution. Viruses, bacteria, fungi and cancers evolve to be resistant to host immune defences, as well as to pharmaceutical drugs. These same problems occur in agriculture with pesticide and herbicide resistance. It is possible that we are facing the end of the effective life of most of the available antibiotics. Predicting the evolution and evolvability of our pathogens, and devising strategies to slow or circumvent the development of resistance, demands deeper knowledge of the complex forces driving evolution at the molecular level.

A better understanding of evolvability is proposed to be part of an Extended Evolutionary Synthesis.

Fusion gene

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Fusion_gene

A fusion gene is a hybrid gene formed from two previously independent genes. It can occur as a result of translocation, interstitial deletion, or chromosomal inversion. Fusion genes have been found to be prevalent in all main types of human neoplasia. Once identified, these fusion genes play a prominent role as diagnostic and prognostic markers.

History

The first fusion gene was described in cancer cells in the early 1980s. The finding was based on the discovery in 1960 by Peter Nowell and David Hungerford in Philadelphia of a small abnormal marker chromosome in patients with chronic myeloid leukemia—the first consistent chromosome abnormality detected in a human malignancy, later designated the Philadelphia chromosome. In 1973, Janet Rowley in Chicago showed that the Philadelphia chromosome had originated through a translocation between chromosomes 9 and 22, and not through a simple deletion of chromosome 22 as was previously thought. Several investigators in the early 1980s showed that the Philadelphia chromosome translocation led to the formation of a new BCR::ABL1 fusion gene, composed of the 3' part of the ABL1 gene in the breakpoint on chromosome 9 and the 5' part of a gene called BCR in the breakpoint in chromosome 22. In 1985 it was clearly established that the fusion gene on chromosome 22 produced an abnormal chimeric BCR::ABL1 protein with the capacity to induce chronic myeloid leukemia.

Oncogenes

It has been known for 30 years that the corresponding gene fusion plays an important role in tumorigenesis. Fusion genes can contribute to tumor formation because they can produce a much more active abnormal protein than non-fusion genes. Often, fusion genes are oncogenes that cause cancer; these include BCR-ABL, TEL-AML1 (ALL with t(12;21)), AML1-ETO (M2 AML with t(8;21)), and TMPRSS2-ERG with an interstitial deletion on chromosome 21, often occurring in prostate cancer. In the case of TMPRSS2-ERG, the fusion product regulates prostate cancer by disrupting androgen receptor (AR) signaling and by inhibiting AR expression through the oncogenic ETS transcription factor. Most fusion genes are found in hematological cancers, sarcomas, and prostate cancer. BCAM-AKT2 is a fusion gene that is specific and unique to high-grade serous ovarian cancer.

Oncogenic fusion genes may lead to a gene product with a new or different function from the two fusion partners. Alternatively, a proto-oncogene is fused to a strong promoter, so that its oncogenic function is activated by the upregulation that the strong promoter of the upstream fusion partner drives. The latter is common in lymphomas, where oncogenes are juxtaposed to the promoters of the immunoglobulin genes. Oncogenic fusion transcripts may also be caused by trans-splicing or read-through events.

Since chromosomal translocations play such a significant role in neoplasia, a specialized database of chromosomal aberrations and gene fusions in cancer has been created. This database is called the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer.

Diagnostics

Presence of certain chromosomal aberrations and their resulting fusion genes is commonly used within cancer diagnostics in order to set a precise diagnosis. Chromosome banding analysis, fluorescence in situ hybridization (FISH), and reverse transcription polymerase chain reaction (RT-PCR) are common methods employed at diagnostic laboratories. These methods all have their distinct shortcomings due to the very complex nature of cancer genomes. Recent developments such as high-throughput sequencing and custom DNA microarrays bear promise of introduction of more efficient methods.

Evolution

Gene fusion plays a key role in the evolution of gene architecture. Its effect can be observed when gene fusion occurs in coding sequences. Duplication, sequence divergence, and recombination are the major contributors at work in gene evolution. These events can probably produce new genes from already existing parts. When gene fusion happens in a non-coding sequence region, it can lead to the misregulation of the expression of a gene that is now under the control of the cis-regulatory sequence of another gene. When it happens in coding sequences, gene fusion causes the assembly of a new gene, which allows new functions to appear by adding peptide modules to a multi-domain protein. Detection methods that inventory gene fusion events on a large biological scale can provide insights into the multi-modular architecture of proteins.

Purine biosynthesis

The purines adenine and guanine are two of the four information encoding bases of the universal genetic code. Biosynthesis of these purines occurs by similar, but not identical, pathways in different species of the three domains of life, the Archaea, Bacteria and Eukaryotes. A major distinctive feature of the purine biosynthetic pathways in Bacteria is the prevalence of gene fusions where two or more purine biosynthetic enzymes are encoded by a single gene. Such gene fusions are almost exclusively between genes that encode enzymes that perform sequential steps in the biosynthetic pathway. Eukaryotic species generally exhibit the most common gene fusions seen in the Bacteria, but in addition have new fusions that potentially increase metabolic flux.

Detection

In recent years, next-generation sequencing technology has become available to screen for known and novel gene fusion events on a genome-wide scale. However, the precondition for large-scale detection is paired-end sequencing of the cell's transcriptome. Current work on fusion gene detection is directed mainly towards data analysis and visualization. Some researchers have developed a tool called Transcriptome Viewer (TViewer) to directly visualize detected gene fusions at the transcript level.
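
One reason paired-end data helps is that the two mates of a read pair normally map to the same gene, so pairs whose mates land in two different genes hint at a fusion. The sketch below shows that idea only; the gene coordinates, read positions, and function names are made up for illustration, and real fusion callers apply many more filters.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (chromosome, start, end).
GENES = {
    "GENE_A": ("chr1", 100, 500),
    "GENE_B": ("chr2", 1000, 1500),
    "GENE_C": ("chr3", 200, 900),
}


def gene_at(chrom, pos, genes):
    """Return the name of the gene covering a mapped position, or None."""
    for name, (g_chrom, start, end) in genes.items():
        if chrom == g_chrom and start <= pos <= end:
            return name
    return None


def fusion_candidates(read_pairs, genes, min_support=2):
    """Count read pairs whose mates map to two different genes."""
    support = Counter()
    for (chrom1, pos1), (chrom2, pos2) in read_pairs:
        g1 = gene_at(chrom1, pos1, genes)
        g2 = gene_at(chrom2, pos2, genes)
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1
    # Keep only gene pairs supported by enough independent read pairs.
    return {pair: n for pair, n in support.items() if n >= min_support}


pairs = [
    (("chr1", 450), ("chr2", 1010)),  # mates in GENE_A and GENE_B
    (("chr1", 470), ("chr2", 1030)),
    (("chr2", 1005), ("chr1", 480)),
    (("chr1", 120), ("chr1", 300)),   # both mates in GENE_A: concordant
    (("chr1", 490), ("chr3", 250)),   # single GENE_A/GENE_C pair: too little support
]
candidates = fusion_candidates(pairs, GENES)
print(candidates)
```

Requiring a minimum number of supporting pairs is a simple guard against mapping artifacts; the threshold of two is an arbitrary choice for the example.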

Research applications

Biologists may also deliberately create fusion genes for research purposes. The fusion of reporter genes to the regulatory elements of genes of interest allows researchers to study gene expression. Reporter gene fusions can be used to measure activity levels of gene regulators, identify the regulatory sites of genes (including the signals required), identify various genes that are regulated in response to the same stimulus, and artificially control the expression of desired genes in particular cells. For example, by creating a fusion gene of a protein of interest and green fluorescent protein, the protein of interest may be observed in cells or tissue using fluorescence microscopy. The protein synthesized when a fusion gene is expressed is called a fusion protein.

Mobile genetic elements

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Mobile_genetic_elements

Mobile genetic elements (MGEs), sometimes called selfish genetic elements, are a type of genetic material that can move around within a genome, or that can be transferred from one species or replicon to another. MGEs are found in all organisms. In humans, approximately 50% of the genome is thought to be MGEs. MGEs play a distinct role in evolution. Gene duplication events can also happen through the mechanism of MGEs. MGEs can also cause mutations in protein-coding regions, which alter protein function. These mechanisms can also rearrange genes in the host genome, generating variation, and can increase fitness through the gain of new or additional functions. An example of MGEs in an evolutionary context is that virulence factors and antibiotic resistance genes carried on MGEs can be transferred to neighboring bacteria. However, MGEs can also decrease fitness by introducing disease-causing alleles or mutations. The set of MGEs in an organism is called a mobilome, which is composed of a large number of plasmids, transposons and viruses.

Types

Plasmids: These are generally circular extrachromosomal DNA molecules that replicate and are transmitted independently of chromosomal DNA. These molecules are present in prokaryotes (bacteria and archaea) and sometimes in eukaryotic organisms such as yeast. The fitness of a plasmid is determined by its mobility. The first factor in plasmid fitness is its ability to replicate its DNA. The second is its ability to transfer horizontally. During their cycle, plasmids carry genes from one organism to another through a process called conjugation. Plasmids usually contain a set of mobility genes that are necessary for conjugation. Some plasmids employ membrane-associated mating pair formation (MPF). A plasmid containing its own MPF genes is considered to be self-transmissible or conjugative. Plasmids can be further divided into mobilizable and non-mobilizable classes. Plasmids that use the MPF of another genetic element in the cell are mobilizable. Plasmids that are not mobilizable but spread by transduction or transformation are termed non-mobilizable. Plasmids often carry genes that make bacteria resistant to antibiotics.

Cloning vectors: These are types of hybrid plasmids with bacteriophages, used to transfer and replicate DNA. Fragments of DNA can be inserted by recombinant DNA techniques. A viable vector must be able to replicate together with the DNA fragments it carries. These vectors can contain desired genes for insertion into an organism's genome. Examples are cosmids and phagemids.

Transposons: These are DNA sequences that can move and replicate in different parts of a cell's genome. Also called "jumping genes", they can be transferred horizontally between organisms that live in symbiosis. Transposons are present in all living things and in giant viruses.

DNA transposons: These are transposons that move directly from one position to another in the genome, using a transposase to cut the element and insert it at another locus. These genetic elements are cleaved at four single-stranded sites in DNA by transposase. To achieve maximum stability of the intermediate transposon, one single-strand cleavage occurs at the target DNA. Simultaneously, the donor strand is ligated to the target strand after cleavage, leaving a single-strand overhang on either end of the target sequence. These sites usually contain a 5 to 9 base pair overhang that can create a cohesive end. Transposase then holds the sequence in a crossed formation and ligates the donor strand to the target strand. The structure formed by the duplex of DNA and transposase in replicative transposons is known as the Shapiro intermediate. The 5 to 9 base pair overhang is left on either side of the target sequence, allowing it to join to its target sequence in either orientation. The sequence of these overhangs can determine the joining orientation. Before site-specific recombination can occur, the oligonucleotide ends must be filled. The ligation of these ends generates a replication fork at each end of the transposable element. The single-strand displacement causes synthesis from the un-ligated 3' hydroxyl group to form long single-stranded sections adjacent to the 5' end. Therefore, the opposite strand is synthesized discontinuously as both replication forks approach the center of the transposable element. This results in two recombinant duplexes containing the semi-conserved transposable element flanked by the previous 5 to 9 base pair overhang. Site-specific reciprocal recombination, facilitated by proteins, takes place between the two transposable elements. This reciprocal replication overlaps in time and occurs between duplicated segments of the replication element before replication is completed. As a result, the target molecule contains the inserted element flanked by the 5 to 9 base pair sequences. Transposition of these elements duplicates the transposable element, leaving one copy in its original location and a new copy at the reciprocal replication site. In doing so, it increases the total number of base pairs in the organism's genome. Transposition events become more frequent over time and as organisms age.
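
The end result described above, an inserted element flanked on both sides by the same 5 to 9 base pair target sequence, can be sketched at the sequence level. This models only the outcome and not the strand cleavage and repair steps; the function name and the example sequences are invented for illustration.

```python
def insert_with_target_site_duplication(target, element, position, site_length=5):
    """Insert `element` so that the target site ends up duplicated on both flanks.

    `position` is the index where the target site starts. The site_length
    bases from that point stand in for the staggered-cut overhang, which is
    filled in on each side of the inserted element.
    """
    if not 5 <= site_length <= 9:
        raise ValueError("site_length should be between 5 and 9 base pairs")
    if position < 0 or position + site_length > len(target):
        raise ValueError("target site does not fit inside the target sequence")
    site = target[position:position + site_length]
    upstream = target[:position]
    downstream = target[position + site_length:]
    return upstream + site + element + site + downstream


target = "GGGGATCGAACCCC"
element = "TTTT"  # stand-in for a transposable element
result = insert_with_target_site_duplication(target, element, position=4)
print(result)  # GGGGATCGATTTTATCGAACCCC
```

The result is longer than the target by the length of the element plus one extra copy of the target site, which is one way insertions add base pairs to a genome.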

Retrotransposon mechanism that uses reverse transcriptase to change mRNA transposon back into DNA for integration.

Retrotransposons: These are transposons that move in the genome by being transcribed into RNA and then back into DNA by reverse transcriptase. Many retrotransposons also exhibit replicative transposition. Retrotransposons are present exclusively in eukaryotes. Retrotransposons consist of two major types, long terminal repeat (LTR) transposons and non-LTR transposons. Non-LTR transposons can be further classified into long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). These retrotransposons are regulated by a family of short non-coding RNAs termed PIWI [P-element induced wimpy testis]-interacting RNAs (piRNAs). piRNAs are a recently discovered class of ncRNAs, ~24-32 nucleotides in length. Initially, piRNAs were described as repeat-associated siRNAs (rasiRNAs) because they originate from repetitive elements such as transposable sequences of the genome. However, it was later found that they act via PIWI proteins. In addition to suppressing genomic transposons, piRNAs have recently been reported to have various other roles, such as regulation of the 3’ UTR of protein-coding genes via RNAi, transgenerational epigenetic inheritance to convey a memory of past transposon activity, and RNA-induced epigenetic silencing.
Integrons: These are gene cassettes that usually carry antibiotic resistance genes to bacterial plasmids and transposons.
Introns: Group I and II introns are nucleotide sequences with catalytic activity that are part of host transcripts and act as ribozymes that can invade genes that encode tRNA, rRNA, and proteins. They are present in all cellular organisms and viruses.
Introners: These are sequences similar to transposons that can jump in the genome, leaving new introns where they were. They have been proposed as a possible mechanism of intron gain in the evolution of eukaryotes, where they are present in at least 5% of all species, especially in aquatic taxa, possibly because horizontal gene transfer occurs more frequently in these organisms. They were first described in 2009 in the unicellular green alga Micromonas.
Viral agents: These are mostly infective acellular agents that replicate in cellular hosts. During their infective cycle they can carry genes from one host to another. They can also carry genes from one organism to another when the viral agent infects more than two different species. Traditionally they are considered separate entities, but many researchers who study their characteristics and evolution refer to them as mobile genetic elements. This is based on the fact that viral agents are simple particles or molecules that replicate and are transferred between various hosts, like the remaining non-viral mobile genetic elements. According to this point of view, viruses and other viral agents should not be considered living beings and are better conceived of as mobile genetic elements. Viral agents are evolutionarily connected with various mobile genetic elements. These viral agents are thought to have arisen from secreted or ejected plasmids of other organisms. Transposons also provide insight into how these elements may have originally started. This theory is known as the vagrancy hypothesis, proposed by Barbara McClintock in 1950.
- Viruses: These are viral agents composed of a molecule of genetic material (DNA or RNA) and with the ability to form complex particles called virions to be able to move easily between their hosts. Viruses are present in all living things. Viral particles are manufactured by the host's replicative machinery for horizontal transfer.
- Satellite nucleic acids: These are DNA or RNA molecules, which are encapsulated as a stowaway in the virions of certain helper viruses and which depend on these to be able to replicate. Although they are sometimes considered genetic elements of their helper viruses, they are not always found within their helper viruses.
- Viroids: These are viral agents that consist of small circular RNA molecules that infect and replicate in plants. These mobile genetic elements do not have a protective protein coating. Specifically, these mobile genetic elements are found in angiosperms.
- Endogenous viral element: These are viral nucleic acids integrated into the genome of a cell. They can move and replicate multiple times in the host cell without causing disease or mutation. They are considered autonomous forms of transposons. Examples are proviruses and endogenous retroviruses.

Research examples

CRISPR-Cas systems in bacteria and archaea are adaptive immune systems to protect against deadly consequences from MGEs. Using comparative genomic and phylogenetic analysis, researchers found that CRISPR-Cas variants are associated with distinct types of MGEs such as transposable elements. In CRISPR-associated transposons, CRISPR-Cas controls transposable elements for their propagation.

MGEs that spread by horizontal transmission, such as plasmids, are generally beneficial to an organism. The ability to transfer (share) plasmids is important from an evolutionary perspective. Tazzyman and Bonhoeffer found that fixation (receipt) of the transferred plasmids in a new organism is just as important as the ability to transfer them. Beneficial, rare, and transferable plasmids have a higher fixation probability, whereas deleterious transferable genetic elements have a lower fixation probability because they are lethal to the host organisms.

One type of MGE, the integrative conjugative elements (ICEs), is central to horizontal gene transfer, shaping the genomes of prokaryotes and enabling rapid acquisition of novel adaptive traits.

As a representative example of ICEs, the ICEBs1 is well-characterized for its role in the global DNA damage SOS response of Bacillus subtilis and also its potential link to the radiation and desiccation resistance of Bacillus pumilus SAFR-032 spores, isolated from spacecraft cleanroom facilities.

Transposition by transposable elements is mutagenic. Thus, organisms have evolved to repress transposition events, and failure to repress them causes cancers in somatic cells. Cecco et al. found that transcription of retrotransposable elements is minimal in young mice but increases at advanced age. This age-dependent expression of transposable elements is reduced by a calorie-restricted diet. Replication of transposable elements often results in repeated sequences being added to the genome. These sequences are often non-coding but can interfere with coding sequences of DNA. Though mutagenic by nature, transposons increase the size of the genome that they transpose into. More research should be conducted into how these elements may serve as a rapid adaptation tool employed by organisms to generate variability. Many transposable elements are dormant or require activation. It should also be noted that current values for the coding fraction of DNA would be higher if transposable elements that code for their own transposition machinery were counted as coding sequences.

Other researched examples include Mavericks, Starships and Space invaders (or SPINs).

Diseases

Mobile genetic elements can alter transcriptional patterns, which frequently leads to genetic disorders such as immune disorders, breast cancer, multiple sclerosis, and amyotrophic lateral sclerosis. In humans, stress can lead to transcriptional activation of MGEs such as endogenous retroviruses, and this activation has been linked to neurodegeneration.

Other notes

The total of all mobile genetic elements in a genome may be referred to as the mobilome.

Barbara McClintock was awarded the 1983 Nobel Prize in Physiology or Medicine "for her discovery of mobile genetic elements" (transposable elements).

Mobile genetic elements play a critical role in the spread of virulence factors, such as exotoxins and exoenzymes, among bacteria. Strategies to combat certain bacterial infections by targeting these specific virulence factors and mobile genetic elements have been proposed.

Batch file

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Batch_file


Filename extensions	`.bat`, `.cmd`, `.btm`
Internet media type	`application/bat` `application/x-bat` `application/x-msdos-program` `text/plain`
Type of format	Scripting
Container for	Scripts

A batch file is a script file in DOS, OS/2 and Microsoft Windows. It consists of a series of commands to be executed by the command-line interpreter, stored in a plain text file. A batch file may contain any command the interpreter accepts interactively and use constructs that enable conditional branching and looping within the batch file, such as IF, FOR, and GOTO labels. The term "batch" is from batch processing, meaning "non-interactive execution", though a batch file might not process a batch of multiple data.

Similar to Job Control Language (JCL), DCL and other systems on mainframe and minicomputer systems, batch files were added to ease the work required for certain regular tasks by allowing the user to set up a script to automate them. When a batch file is run, the shell program (usually COMMAND.COM or cmd.exe) reads the file and executes its commands, normally line-by-line. Unix-like operating systems, such as Linux, have a similar, but more flexible, type of file called a shell script.

The filename extension .bat is used in DOS and Windows. Windows NT and OS/2 also added .cmd. Batch files for other environments may have different extensions, e.g., .btm in 4DOS, 4OS2 and 4NT related shells.

The detailed handling of batch files has changed significantly between versions. Some of the detail in this article applies to all batch files, while other details apply only to certain versions.

Variants

DOS

In MS-DOS, a batch file can be started from the command-line interface by typing its name, followed by any required parameters and pressing the ↵ Enter key. When DOS loads, the file AUTOEXEC.BAT, when present, is automatically executed, so any commands that need to be run to set up the DOS environment may be placed in this file. Computer users would have the AUTOEXEC.BAT file set up the system date and time, initialize the DOS environment, load any resident programs or device drivers, or initialize network connections and assignments.

A .bat file name extension identifies a file containing commands that are executed by the command interpreter COMMAND.COM line by line, as if it were a list of commands entered manually, with some extra batch-file-specific commands for basic programming functionality, including a GOTO command for changing flow of line execution.

Early Windows

Microsoft Windows was introduced in 1985 as a graphical user interface-based (GUI) overlay on text-based operating systems and was designed to run on DOS. In order to start it, the WIN command was used, which could be added to the end of the AUTOEXEC.BAT file to allow automatic loading of Windows. In the earlier versions, one could run a .bat type file from Windows in the MS-DOS Prompt. Windows 3.1x and earlier, as well as Windows 9x invoked COMMAND.COM to run batch files.

OS/2

The IBM OS/2 operating system supported DOS-style batch files. It also included a version of REXX, a more advanced batch-file scripting language. IBM and Microsoft began developing OS/2 jointly but parted ways after a dispute during its development; as a result, IBM referred to its DOS-like console shell simply as DOS, without mention of Microsoft, although this seemingly made no difference to the way batch files worked from COMMAND.COM.

OS/2's batch file interpreter also supports an EXTPROC command. This passes the batch file as a data file to the program named on the EXTPROC line. The named program can be a script file; this is similar to the #! mechanism used by Unix-like operating systems.

Windows NT

Unlike Windows 98 and earlier, the Windows NT family of operating systems does not depend on MS-DOS. Windows NT introduced an enhanced 32-bit command interpreter (cmd.exe) that could execute scripts with either the .CMD or .BAT extension. Cmd.exe added additional commands, and implemented existing ones in a slightly different way, so that the same batch file (with different extension) might work differently with cmd.exe and COMMAND.COM. In most cases, operation is identical if the few unsupported commands are not used. Cmd.exe's extensions to COMMAND.COM can be disabled for compatibility.

Microsoft released a version of cmd.exe for Windows 9x and ME called WIN95CMD to allow users of older versions of Windows to use certain cmd.exe-style batch files.

As of Windows 8, cmd.exe is the normal command interpreter for batch files; the older COMMAND.COM can be run as well in 32-bit versions of Windows able to run 16-bit programs.

Filename extensions

.bat: The first filename extension used by Microsoft for batch files. This extension runs with DOS and all versions of Windows, under COMMAND.COM or cmd.exe, despite the different ways the two command interpreters execute batch files.
.cmd: Used for batch files in Windows NT family and sent to cmd.exe for interpretation. COMMAND.COM does not recognize this file name extension, so cmd.exe scripts are not executed in the wrong Windows environment by mistake. In addition, append, dpath, ftype, set, path, assoc and prompt commands, when executed from a .bat file, alter the value of the errorlevel variable only upon an error, whereas from within a .cmd file, they would affect errorlevel even when returning without an error. It is also used by IBM's OS/2 for batch files.
.btm: The extension used by 4DOS, 4OS2, 4NT and Take Command. These scripts run faster, especially longer ones, because the whole script is loaded before execution rather than being read line by line.

Batch file parameters

COMMAND.COM and cmd.exe support special variables (%0, %1 through %9) that refer, from within the batch job, to the path and name of the batch job and to its first nine calling parameters (see also SHIFT). Non-existent parameters are replaced by a zero-length string. They can be used similarly to environment variables, but are not stored in the environment. Microsoft and IBM refer to these variables as replacement parameters or replaceable parameters, whereas Digital Research, Novell and Caldera established the term replacement variables for them. JP Software calls them batch file parameters.
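
The replacement behaviour described above can be modelled with a short Python sketch. It is an illustration only, not how either interpreter is implemented: the function names expand_parameters and shift are hypothetical, and details such as %% escaping and the cmd.exe %* and %~ modifiers are deliberately left out.

```python
import re


def expand_parameters(line: str, argv: list[str]) -> str:
    """Replace %0..%9 in `line` with the matching entry of `argv`.

    argv[0] stands for the batch file name, argv[1:] for its parameters.
    Parameters that do not exist become a zero-length string.
    """
    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        return argv[index] if index < len(argv) else ""

    return re.sub(r"%(\d)", substitute, line)


def shift(argv: list[str]) -> list[str]:
    """Model a plain SHIFT: every parameter moves down one position."""
    return argv[1:]


args = ["backup.bat", "C:\\data", "D:\\archive"]
print(expand_parameters("COPY %1 %2 %3", args))
print(expand_parameters("ECHO %1", shift(args)))
```

The first call leaves a trailing space where the missing %3 was, which is the textual effect of a zero-length replacement.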

Examples

This example batch file displays Hello World!, prompts and waits for the user to press a key, and then terminates. (Note: It does not matter if commands are lowercase or uppercase unless working with variables)

@ECHO OFF
ECHO Hello World!
PAUSE

To execute the file, it must be saved with the filename extension suffix .bat (or .cmd for Windows NT-type operating systems) in plain text format, typically created by using a text editor such as Microsoft Notepad or a word processor working in plain text mode.

When executed, the following is displayed:

Hello World!
Press any key to continue . . .

Explanation

The interpreter executes each line in turn, starting with the first. The @ symbol at the start of any line prevents that command from being displayed (echoed) as it is executed. The command ECHO OFF turns off command echoing for the rest of the script, or until it is turned on again. The combined @ECHO OFF is often, as here, the first line of a batch file, preventing any commands from being displayed, itself included. Then the next line is executed and the ECHO Hello World! command outputs Hello World!. The next line is executed and the PAUSE command displays Press any key to continue . . . and pauses the script's execution. After a key is pressed, the script terminates, as there are no more commands. In Windows, if the script is executed from an already running command prompt window, the window remains open at the prompt as in MS-DOS; otherwise, the window closes on termination.

Limitations and exceptions

Null values in variables

Variable expansions are substituted textually into the command, and thus variables which contain nothing simply disappear from the syntax, and variables which contain spaces turn into multiple tokens. This can lead to syntax errors or bugs.

For example, if %foo% is empty, this statement:

IF %foo%==bar ECHO Equal

parses as the erroneous construct:

IF ==bar ECHO Equal

Similarly, if %foo% contains abc def, then a different syntax error results:

IF abc def==bar ECHO Equal

The usual way to prevent this problem is to surround variable expansions in quotes so that an empty variable expands into the valid expression IF ""=="bar" instead of the invalid IF ==bar. The text that is being compared to the variable must also be enclosed in quotes, because the quotes are not special delimiting syntax; these characters represent themselves.

IF "%foo%"=="bar" ECHO Equal
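
The purely textual nature of the substitution can be reproduced in a few lines of Python. This is a simplified model under stated assumptions: the expand helper is hypothetical, it treats variable names as case-sensitive word characters, and it ignores every other parsing phase of the real interpreter.

```python
import re


def expand(line: str, env: dict[str, str]) -> str:
    """Textually replace each %name% with its value; unset names vanish."""
    return re.sub(r"%(\w+)%", lambda m: env.get(m.group(1), ""), line)


unquoted = "IF %foo%==bar ECHO Equal"
quoted = 'IF "%foo%"=="bar" ECHO Equal'

# An empty value leaves nothing on the left of == in the unquoted form,
# while the quoted form still has a (quoted, empty) operand to compare.
for value in ("", "abc def", "bar"):
    print(repr(value), "->", expand(unquoted, {"foo": value}))
    print(repr(value), "->", expand(quoted, {"foo": value}))
```

Because the quotes are ordinary characters, the quoted form only compares equal when both sides carry them, which is why the literal on the right is quoted as well.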

The delayed !VARIABLE! expansion available in Windows 2000 and later may be used to avoid these syntactical errors. In this case, null or multi-word variables do not fail syntactically because the value is expanded after the IF command is parsed:

IF !foo!==bar ECHO Equal

Another difference in Windows 2000 or higher is that, at the interactive command prompt, an empty (undefined) variable is not substituted; the reference is left as literal text. As described in the previous examples, earlier batch interpreter behaviour would have resulted in an empty string. Example:

C:\>set MyVar=
C:\>echo %MyVar%
%MyVar%

C:\>if "%MyVar%"=="" (echo MyVar is not defined) else (echo MyVar is %MyVar%)
MyVar is %MyVar%

Batch interpreters prior to Windows 2000 would have displayed result MyVar is not defined.

Quotation marks and spaces in passed strings

Unlike Unix/POSIX processes, which receive their command-line arguments already split up by the shell into an array of strings, a Windows process receives the entire command-line as a single string, via the GetCommandLine API function. As a result, each Windows application can implement its own parser to split the entire command line into arguments. Many applications and command-line tools have evolved their own syntax for doing that, and so there is no single convention for quoting or escaping metacharacters on Windows command lines.

For some commands, spaces are treated as delimiters that separate arguments, unless those spaces are enclosed by quotation marks. Various conventions exist of how quotation marks can be passed on to the application:
- A widely used convention is implemented by the command-line parser built into the Microsoft Visual C++ runtime library and by the CommandLineToArgvW function. It uses the convention that 2n backslashes followed by a quotation mark (") produce n backslashes followed by a begin/end quote, whereas (2n)+1 backslashes followed by a quotation mark again produce n backslashes followed by a quotation mark literal. The same convention is part of the .NET Framework specification.
  - An undocumented aspect is that "" occurring in the middle of a quoted string produces a single quotation mark. (A CRT change in 2008 [msvcr90] modified this undocumented handling of quotes.) This is helpful for inserting a quotation mark in an argument without re-enabling interpretation of cmd metacharacters like |, & and >. (cmd does not recognize the usual \" as escaping the quote. It re-enables these special meanings on seeing the quote, thinking the quotation has ended.)
- Another convention is that a single quotation mark (") is not included as part of the string. However, an escaped quotation mark (""") can be part of the string.
- Yet another common convention comes from the use of Cygwin-derived ported programs. It does not differentiate between backslashes occurring before or not before quotes. See glob (programming) § Windows and DOS for information on these alternative command-line parsers.
- Some important Windows commands, like cmd.exe and wscript.exe, use their own rules.
For other commands, spaces are not treated as delimiters and therefore do not need quotation marks. If quotes are included they become part of the string. This applies to some built-in commands like echo.
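
The backslash rule of the first convention can be illustrated with a small Python parser. It is a sketch of the 2n / (2n)+1 rule only, under several assumptions: the name split_args is hypothetical, the special treatment that real parsers give to the program name is ignored, and the undocumented "" handling inside quoted strings is not modelled.

```python
def split_args(cmdline: str) -> list[str]:
    """Split a command line using the 2n / (2n)+1 backslash-quote rule."""
    args: list[str] = []
    current: list[str] = []
    in_quotes = False
    have_arg = False  # True once the current argument has started
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            # Measure the whole run of backslashes.
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                current.append("\\" * (count // 2))
                if count % 2:
                    current.append('"')        # odd run: literal quote
                else:
                    in_quotes = not in_quotes  # even run: begin/end quote
                j += 1                         # consume the quotation mark
            else:
                current.append("\\" * count)   # not before a quote: literal
            have_arg = True
            i = j
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


print(split_args(r'copy "C:\My Files\\" "say \"hi\""'))
```

In the usage line, the doubled backslash before the closing quote keeps one backslash and still ends the quoted argument, while each \" inside the second argument yields a literal quotation mark.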

Where a string contains quotation marks, and is to be inserted into another line of text that must also be enclosed in quotation marks, particular attention to the quoting mechanism is required:

C:\>set foo="this string is enclosed in quotation marks"

C:\>echo "test 1 %foo%"
"test 1 "this string is enclosed in quotation marks""

C:\>eventcreate /T Warning /ID 1 /L System /SO "Source" /D "Example: %foo%"
ERROR: Invalid Argument/Option - 'string'.
Type "EVENTCREATE /?" for usage.

On Windows 2000 and later, the solution is to replace each occurrence of a quote character within a value by a series of three quote characters:

C:\>set foo="this string is enclosed in quotes"

C:\>set foo=%foo:"="""%

C:\>echo "test 1 %foo%"
"test 1 """this string is enclosed in quotes""""

C:\>eventcreate /T Warning /ID 1 /L System /SO "Source" /D "Example: %foo%"
SUCCESS: A 'Warning' type event is created in the 'Source' log/source.

Escaped characters in strings

Some characters, such as pipe (|) characters, have special meaning to the command line. They cannot be printed as text using the ECHO command unless escaped using the caret ^ symbol:

C:\>echo foo | bar
'bar' is not recognized as an internal or external command,
operable program or batch file.

C:\>echo foo ^| bar
foo | bar

However, escaping does not work as expected when inserting the escaped character into an environment variable. The variable ends up containing a live pipe command when merely echoed. It is necessary to escape both the caret itself and the escaped character for the character to display as text in the variable:

C:\>set foo=bar | baz
'baz' is not recognized as an internal or external command,
operable program or batch file.

C:\>set foo=bar ^| baz
C:\>echo %foo%
'baz' is not recognized as an internal or external command,
operable program or batch file.

C:\>set foo=bar ^^^| baz
C:\>echo %foo%
bar | baz
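
Why three carets are needed can be seen by treating each parse of a line as one pass that consumes a level of escaping. The Python sketch below is a deliberately simplified model, not the real cmd.exe parser: parse_pass is a hypothetical name, quoting is ignored, and only the characters |, &, < and > are treated as special.

```python
SPECIAL = "|&<>"


def parse_pass(text: str) -> tuple[str, list[str]]:
    """One parse of a line: drop each escaping caret, keep what it escaped.

    Returns the resulting text and the list of special characters that
    were NOT escaped, which the interpreter would act on.
    """
    out: list[str] = []
    live: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "^" and i + 1 < len(text):
            out.append(text[i + 1])  # escaped character is kept literally
            i += 2
        else:
            if ch in SPECIAL:
                live.append(ch)      # unescaped: still has its meaning
            out.append(ch)
            i += 1
    return "".join(out), live


# set foo=bar ^| baz   stores the pipe unescaped ...
stored, _ = parse_pass("bar ^| baz")
# ... so echo %foo% is parsed with a live pipe.
print(parse_pass(stored))

# set foo=bar ^^^| baz stores "bar ^| baz", which survives the second parse.
stored, _ = parse_pass("bar ^^^| baz")
print(parse_pass(stored))
```

The SET command parses the line once and ECHO %foo% parses the expanded value again, so the pipe has to remain escaped after the first pass.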

The delayed expansion available with cmd /V:ON or with SETLOCAL ENABLEDELAYEDEXPANSION in Windows 2000 and later may be used to show special characters stored in environment variables because the variable value is expanded after the command was parsed:

C:\>cmd /V:ON
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\>set foo=bar ^| baz
C:\>echo !foo!
bar | baz

Sleep or scripted delay

Until the TIMEOUT command was introduced with Windows Vista, there was no easy way to implement a timed pause, as the PAUSE command halts script activity indefinitely until any key is pressed.

Many workarounds were possible, but generally only worked in some environments: The CHOICE command was not available in older DOS versions, PING was only available if TCP/IP was installed, and so on. No solution was available from Microsoft, but a number of small utility programs could be installed from other sources. A commercial example would be the 1988 Norton Utilities Batch Enhancer (BE) command, where BE DELAY 18 would wait for 1 second, or the free 94-byte WAIT.COM where WAIT 5 would wait for 5 seconds, then return control to the script. Most such programs are 16-bit .COM files, so are incompatible with 64-bit Windows.

Text output with stripped CR/LF

Normally, all printed text automatically has the control characters for carriage return (CR) and line feed (LF) appended to the end of each line.

batchtest.bat

@echo foo
@echo bar

C:\>batchtest.bat
foo
bar

It does not matter if the two echo commands share the same command line; the CR/LF codes are inserted to break the output onto separate lines:

C:\>@echo Message 1&@echo Message 2
Message 1
Message 2

A trick discovered with Windows 2000 and later is to use the special prompt for input (SET /P) to output text without CR/LF trailing the text. In this example, the CR/LF does not follow Message 1, but does follow Message 2 and Message 3:

batchtest2.bat

@echo off
set /p ="Message 1"<nul
echo Message 2
echo Message 3

C:\>batchtest2.bat
Message 1Message 2
Message 3

This can be used to output data to a text file without CR/LF appended to the end:

C:\>set /p ="Message 1"<nul >data.txt
C:\>set /p ="Message 2"<nul >>data.txt
C:\>set /p ="Message 3"<nul >>data.txt
C:\>type data.txt
Message 1Message 2Message 3

However, there is no way to inject this stripped CR/LF prompt output directly into an environment variable.

Setting a Uniform Naming Convention (UNC) working directory from a shortcut

It is not possible to have a command prompt that uses a UNC path as the current working directory; e.g. \\server\share\directory\

The command prompt requires the use of drive letters to assign a working directory, which makes running complex batch files stored on a server UNC share more difficult. While a batch file can be run from a UNC file path, the working directory default is C:\Windows\System32\.

In Windows 2000 and later, a workaround is to use the PUSHD and POPD commands with command extensions.

If not enabled by default, command extensions can be temporarily enabled using the /E:ON switch for the command interpreter.

So to run a batch file on a UNC share, assign a temporary drive letter to the UNC share, and use the UNC share as the working directory of the batch file, a Windows shortcut can be constructed that looks like this:

Target:

The working directory attribute of this shortcut is ignored.

This also solves a problem related to User Account Control (UAC) on Windows Vista and newer. When an administrator is logged on and UAC is enabled, and they try to run a batch file as administrator from a network drive letter, using the right-click file context menu, the operation will unexpectedly fail. This is because the elevated UAC privileged account context does not have network drive letter assignments, and it is not possible to assign drive letters for the elevated context via the Explorer shell or logon scripts. However, by creating a shortcut to the batch file using the above PUSHD / POPD construct, and using the shortcut to run the batch file as administrator, the temporary drive letter will be created and removed in the elevated account context, and the batch file will function correctly.

The following syntax does correctly expand to the path of the current batch script.

%~dp0

UNC default paths are turned off by default as they used to crash older programs.

The DWORD registry value DisableUNCCheck at HKEY_CURRENT_USER\Software\Microsoft\Command Processor allows the default directory to be a UNC path. The CD command will still refuse to change to a UNC path, but one can be set as the working directory by placing it in the Default Directory field of a shortcut to Cmd or by using the Start command. (The C$ share is for administrators.)

Character set

Batch files use an OEM character set, as defined by the computer, e.g. Code page 437. The non-ASCII parts of these are incompatible with the Unicode or Windows character sets otherwise used in Windows so care needs to be taken. Non-English file names work only if entered through a DOS character set compatible editor. File names with characters outside this set do not work in batch files.

To get a command prompt with Unicode instead of Code page 437 or similar, one can use the cmd /U command. In such a command prompt, a batch file with Unicode filenames will work. Also one can use cmd /U to directly execute commands with Unicode as character set. For example, cmd /U /C dir > files.txt creates a file containing a directory listing with correct Windows characters, in the UTF-16LE encoding.
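
A file produced this way can be read back by decoding it as UTF-16LE. The following Python sketch shows the decoding step only; the function name is hypothetical, and it assumes the data may or may not begin with a byte order mark, stripping one if present.

```python
def decode_cmd_u_output(data: bytes) -> str:
    """Decode bytes captured from `cmd /U` output as UTF-16LE text."""
    text = data.decode("utf-16-le")
    # Assumption: tolerate a leading byte order mark if one is present.
    return text.lstrip("\ufeff")


# Stand-in for the contents of files.txt (not real command output).
sample = "Ünïcödé.txt\r\n".encode("utf-16-le")
print(decode_cmd_u_output(sample))
```

In practice the bytes would come from reading files.txt in binary mode, or the file could be opened directly with encoding="utf-16-le".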

Batch viruses and malware

As with any other programming language, batch files can be used maliciously. Simple trojans and fork bombs are easily created, and batch files can do a form of DNS poisoning by modifying the hosts file. Batch viruses are possible, and can also spread themselves via USB flash drives by using Windows' Autorun capability.

The following command in a batch file will delete all the data in the current directory (folder) - without first asking for confirmation:

del /Q *.*

These three commands are a simple fork bomb that will continually replicate itself to deplete available system resources, slowing down or crashing the system:

:TOP
 start "" %0
 goto TOP

Other Windows scripting languages

The cmd.exe command processor that interprets .cmd files is supported in all 32- and 64-bit versions of Windows up to at least Windows 10. COMMAND.COM, which interprets .BAT files, was supported in all 16- and 32-bit versions up to at least Windows 10.

There are other, later and more powerful, scripting languages available for Windows. However, these require the scripting language interpreter to be installed before they can be used:

Extended Batch Language (EBL) (.bat) — developed by Frank Canova as an 'own-time' project while working at IBM in 1982. It was subsequently sold by Seaware Corp as an interpreter and compiler primarily for DOS, but later for Windows.
KiXtart (.kix) — developed by a Microsoft employee in 1991, specifically to meet the need for commands useful in a network logon script while retaining the simple 'feel' of a .cmd file.
Windows Script Host (.vbs , .js and .wsf) — released by Microsoft in 1998, and consisting of cscript.exe and wscript.exe, runs scripts written in VBScript or JScript. It can run them in windowed mode (with the wscript.exe host) or in console-based mode (with the cscript.exe host). They have been a part of Windows since Windows 98.
PowerShell (.ps1) — released in 2006 by Microsoft and can operate with Windows XP (SP2/SP3) and later versions. PowerShell can operate both interactively (from a command-line interface) and also via saved scripts, and has a strong resemblance to Unix shells.
Unix-style shell scripting languages can be used if a Unix compatibility tool, such as Cygwin, is installed.
Cross-platform scripting tools including Perl, Python, Ruby, Rexx, Node.js and PHP are available for Windows.

Script files run if the filename without extension is entered. There are rules of precedence governing interpretation of, say, DoThis if DoThis.com, DoThis.exe, DoThis.bat, DoThis.cmd, etc. exist; by default DoThis.com has highest priority. This default order may be modified in newer operating systems by the user-settable PATHEXT environment variable.
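
The precedence rule can be sketched in Python as a lookup that tries each extension in PATHEXT order. This is an illustration under stated assumptions: resolve_command is a hypothetical name, the default PATHEXT value shown is shortened to the four extensions mentioned above, and the search across the current directory and PATH directories is left out.

```python
def resolve_command(name, existing_files, pathext=".COM;.EXE;.BAT;.CMD"):
    """Return the first name+extension, in PATHEXT order, that exists.

    `existing_files` stands in for one directory listing; matching is
    case-insensitive, as file names are on Windows. Returns None if no
    candidate is present.
    """
    present = {f.lower() for f in existing_files}
    for ext in pathext.split(";"):
        if not ext:
            continue  # skip empty entries such as a trailing semicolon
        candidate = name + ext
        if candidate.lower() in present:
            return candidate
    return None


files = ["DoThis.bat", "DoThis.exe", "readme.txt"]
print(resolve_command("DoThis", files))
print(resolve_command("DoThis", files, pathext=".BAT;.EXE"))
```

Reordering PATHEXT, as in the second call, changes which file wins when several candidates share the same base name.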
