Wednesday, October 9, 2019

Scientific notation

From Wikipedia, the free encyclopedia
 
Scientific notation (also referred to as scientific form or standard index form, or standard form in the UK) is a way of expressing numbers that are too big or too small to be conveniently written in decimal form. It is commonly used by scientists, mathematicians and engineers, in part because it can simplify certain arithmetic operations. On scientific calculators it is usually known as "SCI" display mode.

Decimal notation      Scientific notation
2                     2×10^0
300                   3×10^2
4,321.768             4.321768×10^3
−53,000               −5.3×10^4
6,720,000,000         6.72×10^9
0.2                   2×10^−1
987                   9.87×10^2
0.000 000 007 51      7.51×10^−9
In scientific notation, all numbers are written in the form
m × 10^n
(m times ten raised to the power of n), where the exponent n is an integer, and the coefficient m is any real number. The integer n is called the order of magnitude and the real number m is called the significand or mantissa. However, the term "mantissa" may cause confusion because it is the name of the fractional part of the common logarithm. If the number is negative then a minus sign precedes m (as in ordinary decimal notation). In normalized notation, the exponent is chosen so that the absolute value of the coefficient is at least one but less than ten.

Decimal floating point is a computer arithmetic system closely related to scientific notation.

Normalized notation

Any given real number can be written in the form m×10^n in many ways: for example, 350 can be written as 3.5×10^2 or 35×10^1 or 350×10^0.

In normalized scientific notation (called "standard form" in the UK), the exponent n is chosen so that the absolute value of m remains at least one but less than ten (1 ≤ |m| < 10). Thus 350 is written as 3.5×10^2. This form allows easy comparison of numbers, as the exponent n gives the number's order of magnitude. In normalized notation, the exponent n is negative for a number with absolute value between 0 and 1 (e.g. 0.5 is written as 5×10^−1). The 10 and exponent are often omitted when the exponent is 0.
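The normalization rule can be illustrated with a short Python sketch. The function name `normalize` is only illustrative, and the sketch assumes the input is a nonzero decimal string; it uses the standard `decimal` module so that the digits are handled exactly.

```python
from decimal import Decimal

def normalize(text: str) -> tuple[Decimal, int]:
    """Split a decimal string into (m, n) with value == m * 10**n and 1 <= |m| < 10."""
    value = Decimal(text)
    if value == 0:
        raise ValueError("zero has no normalized form")
    n = value.adjusted()   # exponent of the leading digit
    m = value.scaleb(-n)   # shift the decimal point so one digit precedes it
    return m, n

print(normalize("350"))   # significand 3.50, exponent 2
print(normalize("0.5"))   # significand 5, exponent -1
```

Because only the decimal point is shifted, any trailing zeros in the input are carried into the significand unchanged.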

Normalized scientific form is the typical form of expression of large numbers in many fields, unless an unnormalized form, such as engineering notation, is desired. Normalized scientific notation is often called exponential notation, although the latter term is more general and also applies when m is not restricted to the range 1 to 10 (as in engineering notation for instance) and to bases other than 10 (for example, 3.15×2^20).

Engineering notation

Engineering notation (often named "ENG" display mode on scientific calculators) differs from normalized scientific notation in that the exponent n is restricted to multiples of 3. Consequently, the absolute value of m is in the range 1 ≤ |m| < 1000, rather than 1 ≤ |m| < 10. Though similar in concept, engineering notation is rarely called scientific notation. Engineering notation allows the numbers to explicitly match their corresponding SI prefixes, which facilitates reading and oral communication. For example, 12.5×10^−9 m can be read as "twelve-point-five nanometers" and written as 12.5 nm, while its scientific notation equivalent 1.25×10^−8 m would likely be read out as "one-point-two-five times ten-to-the-negative-eight meters".
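As a rough sketch of the multiple-of-3 rule, the following Python function (the name `engineering` is hypothetical) rounds the exponent down to the nearest multiple of 3 and shifts the significand to compensate.

```python
from decimal import Decimal

def engineering(text: str) -> tuple[Decimal, int]:
    """Return (m, e) with value == m * 10**e, e a multiple of 3 and 1 <= |m| < 1000."""
    value = Decimal(text)
    if value == 0:
        return value, 0
    n = value.adjusted()   # exponent in normalized scientific notation
    e = n - n % 3          # largest multiple of 3 not exceeding n
    return value.scaleb(-e), e

print(engineering("1.25E-8"))    # significand 12.5, exponent -9 (12.5 nm)
print(engineering("40000000"))   # significand 40.000000, exponent 6
```

Python's `%` operator returns a non-negative remainder for a positive modulus, which is why the same expression works for negative exponents.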

Significant figures

A significant figure is a digit in a number that adds to its precision. This includes all nonzero digits, zeroes between significant digits, and zeroes indicated to be significant. Leading and trailing zeroes are not significant because they exist only to show the scale of the number. Therefore, 1,230,400 usually has five significant figures: 1, 2, 3, 0, and 4; the final two zeroes serve only as placeholders and add no precision to the original number.

When a number is converted into normalized scientific notation, it is scaled down to a number between 1 and 10. All of the significant digits remain, but the placeholder zeroes are no longer required. Thus 1,230,400 would become 1.2304 × 10^6. However, there is also the possibility that the number may be known to six or more significant figures, in which case the number would be shown as (for instance) 1.23040 × 10^6. Thus, an additional advantage of scientific notation is that the number of significant figures is clearer.
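A minimal Python sketch of this point: the helper `sci` (a hypothetical name) formats a value with a chosen number of significant figures, using Python's built-in `e` format type, which prints the exponent in E-notation.

```python
def sci(value: float, sig_figs: int) -> str:
    """Format value in scientific notation with the given number of significant figures."""
    # One digit precedes the decimal point, so sig_figs - 1 digits follow it.
    return f"{value:.{sig_figs - 1}e}"

print(sci(1230400.0, 5))   # 1.2304e+06
print(sci(1230400.0, 6))   # 1.23040e+06
```

The two outputs denote the same value but claim different precision, which plain decimal notation cannot express.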

Estimated final digit(s)

It is customary in scientific measurements to record all the definitely known digits from the measurements, and to estimate at least one additional digit if there is any information at all available to enable the observer to make an estimate. The resulting number contains more information than it would without that extra digit(s), and it (or they) may be considered a significant digit because it conveys some information leading to greater precision in measurements and in aggregations of measurements (adding them or multiplying them together).

Additional information about precision can be conveyed through additional notations. It is often useful to know how exact the final digit(s) are. For instance, the accepted value of the unit of elementary charge can properly be expressed as 1.6021766208(98)×10^−19 C, which is shorthand for (1.6021766208±0.0000000098)×10^−19 C.
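The parenthesized shorthand can be expanded mechanically: the digits in parentheses apply to the last decimal places of the significand. The sketch below assumes the significand is given without its power-of-ten factor, and the function name `expand_uncertainty` is illustrative only.

```python
import re
from decimal import Decimal

def expand_uncertainty(text: str) -> tuple[Decimal, Decimal]:
    """Expand 'value(digits)' into (value, uncertainty) for the significand only."""
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)\((\d+)\)", text)
    if match is None:
        raise ValueError(f"not in value(uncertainty) form: {text!r}")
    value = Decimal(match.group(1))
    last_place = value.as_tuple().exponent   # e.g. -10 for ten decimal places
    uncertainty = Decimal(match.group(2)).scaleb(last_place)
    return value, uncertainty

value, uncertainty = expand_uncertainty("1.6021766208(98)")
print(value, uncertainty)   # 1.6021766208 and an uncertainty equal to 0.0000000098
```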

E-notation

A calculator display showing the Avogadro constant in E-notation
 
Most calculators and many computer programs present very large and very small results in scientific notation, typically invoked by a key labelled EXP (for exponent), EEX (for enter exponent), EE, EX, E, or ×10^x depending on vendor and model. Because superscripted exponents like 10^7 cannot always be conveniently displayed, the letter E (or e) is often used to represent "times ten raised to the power of" (which would be written as "× 10^n") and is followed by the value of the exponent; in other words, for any two real numbers m and n, the usage of "mEn" would indicate a value of m × 10^n. In this usage the character e is not related to the mathematical constant e or the exponential function e^x (a confusion that is unlikely if scientific notation is represented by a capital E). Although the E stands for exponent, the notation is usually referred to as (scientific) E-notation rather than (scientific) exponential notation. The use of E-notation facilitates data entry and readability in textual communication since it minimizes keystrokes, avoids reduced font sizes and provides a simpler and more concise display, but it is not encouraged in some publications.

Examples and other notations

  • In most popular programming languages, 6.022E23 (or 6.022e23) is equivalent to 6.022×10^23, and 1.6×10^−35 would be written 1.6E-35 (e.g. Ada, Analytica, C/C++, FORTRAN (since FORTRAN II as of 1958), MATLAB, Scilab, Perl, Java, Python, Lua, JavaScript, and others).
  • After the introduction of the first pocket calculators supporting scientific notation in 1972 (HP-35, SR-10) the term decapower was sometimes used in the emerging user communities for the power-of-ten multiplier in order to better distinguish it from "normal" exponents. Likewise, the letter "D" was used in typewritten numbers. This notation was proposed by Jim Davidson and published in the January 1976 issue of Richard J. Nelson's Hewlett-Packard newsletter 65 Notes for HP-65 users, and it was adopted and carried over into the Texas Instruments community by Richard C. Vanderburgh, the editor of the 52-Notes newsletter for SR-52 users in November 1976.
  • FORTRAN (at least since FORTRAN IV as of 1961) also uses "D" to signify double precision numbers in scientific notation.
  • Similarly, a "D" was used by Sharp pocket computers PC-1280, PC-1470U, PC-1475, PC-1480U, PC-1490U, PC-1490UII, PC-E500, PC-E500S, PC-E550, PC-E650 and PC-U6000 to indicate 20-digit double-precision numbers in scientific notation in BASIC between 1987 and 1995.
  • The ALGOL 60 (1960) programming language uses a subscript ten "₁₀" character instead of the letter E, for example: 6.022₁₀23.
  • The use of the "₁₀" in the various Algol standards provided a challenge on some computer systems that did not provide such a "₁₀" character. As a consequence Stanford University Algol-W required the use of a single quote, e.g. 6.02486'+23, and some Soviet Algol variants allowed the use of the Cyrillic character "ю", e.g. 6.022ю+23.
  • Subsequently, the ALGOL 68 programming language provided the choice of 4 characters: E, e, \, or ₁₀. For example: 6.022E23, 6.022e23, 6.022\23 or 6.022₁₀23.
  • Decimal Exponent Symbol is part of the Unicode Standard, e.g. 6.022⏨23. It is included as U+23E8 DECIMAL EXPONENT SYMBOL to accommodate usage in the programming languages Algol 60 and Algol 68.
  • The TI-83 series and TI-84 Plus series of calculators use a stylized E character to display decimal exponent and the ₁₀ character to denote an equivalent ×10^ operator.
  • The Simula programming language requires the use of & (or && for long), for example: 6.022&23 (or 6.022&&23).
  • The Wolfram Language (utilized in Mathematica) allows a shorthand notation of 6.022*^23. (Instead, E denotes the mathematical constant e).
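
As a brief illustration of E-notation in one of the languages listed above, the following Python lines parse and print values using only built-in features; the variable names are arbitrary.

```python
avogadro = float("6.022E23")   # parsing accepts an upper-case E
tiny = float("1.6e-35")        # and a lower-case e

print(avogadro == 6.022e23)    # True: the literal uses the same notation
print(f"{avogadro:.3E}")       # 6.022E+23
print(f"{tiny:.1e}")           # 1.6e-35
```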

Order of magnitude

Scientific notation also enables simpler order-of-magnitude comparisons. A proton's mass is 0.0000000000000000000000000016726 kg. If written as 1.6726×10^−27 kg, it is easier to compare this mass with that of an electron, given below. The order of magnitude of the ratio of the masses can be obtained by comparing the exponents instead of the more error-prone task of counting the leading zeros. In this case, −27 is larger than −31, so the exponents alone suggest that the proton is about four orders of magnitude (10,000 times) more massive than the electron; once the significands are taken into account, the actual ratio is about 1,836, or roughly three orders of magnitude.

Scientific notation also avoids misunderstandings due to regional differences in certain quantifiers, such as billion, which might indicate either 10^9 or 10^12.

In physics and astrophysics, the number of orders of magnitude between two numbers is sometimes referred to as "dex", a contraction of "decimal exponent". For instance, if two numbers are within 1 dex of each other, then the ratio of the larger to the smaller number is less than 10. Fractional values can be used, so if within 0.5 dex, the ratio is less than 10^0.5, and so on.
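The dex difference between two positive quantities is simply the base-10 logarithm of their ratio. The sketch below, with a hypothetical helper `dex`, applies this to the proton and electron masses quoted in this article.

```python
import math

def dex(a: float, b: float) -> float:
    """Number of orders of magnitude (dex) separating two positive quantities."""
    return abs(math.log10(a / b))

proton_kg = 1.6726e-27
electron_kg = 9.10938356e-31

print(round(dex(proton_kg, electron_kg), 2))   # 3.26
print(round(10 ** 0.5, 3))                     # 3.162, the ratio that corresponds to 0.5 dex
```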

Use of spaces

In normalized scientific notation, in E-notation, and in engineering notation, the space (which in typesetting may be represented by a normal width space or a thin space) that is allowed only before and after "×" or in front of "E" is sometimes omitted, though it is less common to do so before the alphabetical character.

Further examples of scientific notation

  • An electron's mass is about 0.000000000000000000000000000000910938356 kg. In scientific notation, this is written 9.10938356×10^−31 kg (in SI units).
  • The Earth's mass is about 5972400000000000000000000 kg. In scientific notation, this is written 5.9724×10^24 kg.
  • The Earth's circumference is approximately 40000000 m. In scientific notation, this is 4×10^7 m. In engineering notation, this is written 40×10^6 m. In SI writing style, this may be written 40 Mm (40 megameters).
  • An inch is defined as exactly 25.4 mm. Quoting a value of 25.400 mm shows that the value is correct to the nearest micrometer. An approximated value with only two significant digits would be 2.5×10^1 mm instead. As there is no limit to the number of significant digits, the length of an inch could, if required, be written as (say) 2.54000000000×10^1 mm instead.

Converting numbers

Converting a number in these cases means to either convert the number into scientific notation form, convert it back into decimal form or to change the exponent part of the equation. None of these alter the actual number, only how it's expressed.

Decimal to scientific

First, move the decimal separator point sufficient places, n, to put the number's value within a desired range, between 1 and 10 for normalized notation. If the decimal was moved to the left, append "× 10^n"; to the right, "× 10^−n". To represent the number 1,230,400 in normalized scientific notation, the decimal separator would be moved 6 digits to the left and "× 10^6" appended, resulting in 1.2304×10^6. The number −0.0040321 would have its decimal separator shifted 3 digits to the right instead of the left and yield −4.0321×10^−3 as a result.

Scientific to decimal

To convert a number from scientific notation to decimal notation, first remove the × 10^n on the end, then shift the decimal separator n digits to the right (positive n) or left (negative n). The number 1.2304×10^6 would have its decimal separator shifted 6 digits to the right and become 1,230,400, while −4.0321×10^−3 would have its decimal separator moved 3 digits to the left and be −0.0040321.
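Both directions of conversion can be sketched with Python's `decimal` module, which shifts the decimal point exactly rather than going through binary floating point. The function names are illustrative, and the output uses E-notation in place of "× 10^n".

```python
from decimal import Decimal

def to_scientific(text: str) -> str:
    """Decimal string -> normalized E-notation string."""
    # normalize() drops trailing zeros, so placeholder zeros disappear;
    # it would also drop trailing zeros that were meant to be significant.
    value = Decimal(text.replace(",", "")).normalize()
    return format(value, "E")

def to_decimal(text: str) -> str:
    """E-notation string -> plain decimal string."""
    return format(Decimal(text), "f")

print(to_scientific("1,230,400"))    # 1.2304E+6
print(to_scientific("-0.0040321"))   # -4.0321E-3
print(to_decimal("1.2304E6"))        # 1230400
print(to_decimal("-4.0321E-3"))      # -0.0040321
```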

Exponential

Conversion between different scientific notation representations of the same number with different exponential values is achieved by performing opposite operations of multiplication or division by a power of ten on the significand and a subtraction or addition of one on the exponent part. The decimal separator in the significand is shifted x places to the left (or right) and x is added to (or subtracted from) the exponent, as shown below.
1.234×10^3 = 12.34×10^2 = 123.4×10^1 = 1234

Basic operations

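Given two numbers in scientific notation,
x0 = m0 × 10^n0
and
x1 = m1 × 10^n1
Multiplication and division are performed using the rules for operation with exponentiation:
x0 × x1 = (m0 × m1) × 10^(n0 + n1)
and
x0 / x1 = (m0 / m1) × 10^(n0 − n1)
Some examples are:
5.67×10^−5 × 2.34×10^2 ≈ 13.3×10^(−5 + 2) = 13.3×10^−3 = 1.33×10^−2
and
(2.34×10^2) / (5.67×10^−5) ≈ 0.413×10^(2 − (−5)) = 0.413×10^7 = 4.13×10^6
Addition and subtraction require the numbers to be represented using the same exponential part, so that the significand can be simply added or subtracted:
x0 = m0 × 10^n0
and
x1 = m1 × 10^n1
with
n0 = n1
Next, add or subtract the significands:
x0 ± x1 = (m0 ± m1) × 10^n0
An example:
2.34×10^−5 + 5.67×10^−6 = 2.34×10^−5 + 0.567×10^−5 = 2.907×10^−5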
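
The rules in this section can be sketched in Python by representing a number as a pair (significand, exponent). The names `Sci`, `renormalize`, `multiply`, `divide` and `add` are hypothetical, and the sample values are chosen only for illustration.

```python
from decimal import Decimal

Sci = tuple[Decimal, int]   # (m, n), meaning m × 10^n

def renormalize(m: Decimal, n: int) -> Sci:
    """Shift the decimal point so that 1 <= |m| < 10 (zero is returned as 0 × 10^0)."""
    if m == 0:
        return Decimal(0), 0
    shift = m.adjusted()
    return m.scaleb(-shift), n + shift

def multiply(x: Sci, y: Sci) -> Sci:
    return renormalize(x[0] * y[0], x[1] + y[1])   # multiply significands, add exponents

def divide(x: Sci, y: Sci) -> Sci:
    return renormalize(x[0] / y[0], x[1] - y[1])   # divide significands, subtract exponents

def add(x: Sci, y: Sci) -> Sci:
    n = max(x[1], y[1])                            # bring both to a common exponent
    m = x[0].scaleb(x[1] - n) + y[0].scaleb(y[1] - n)
    return renormalize(m, n)

print(multiply((Decimal("5.67"), -5), (Decimal("2.34"), 2)))   # 1.32678 × 10^-2
print(add((Decimal("2.34"), -5), (Decimal("5.67"), -6)))       # 2.907 × 10^-5
```

Subtraction follows from `add` by negating the second significand.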

Other bases

While base ten is normally used for scientific notation, powers of other bases can be used too, base 2 being the next most commonly used one.

For example, in base-2 scientific notation, the number 1001b in binary (=9d) is written as 1.001b × 2d^11b or 1.001b × 10b^11b using binary numbers (or shorter 1.001 × 10^11 if binary context is obvious); here the suffixes b and d mark binary and decimal numerals. In E-notation, this is written as 1.001bE11b (or shorter: 1.001E11) with the letter E now standing for "times two (10b) to the power" here. In order to better distinguish this base-2 exponent from a base-10 exponent, a base-2 exponent is sometimes also indicated by using the letter B instead of E, a shorthand notation originally proposed by Bruce Alan Martin of Brookhaven National Laboratory in 1968, as in 1.001bB11b (or shorter: 1.001B11). For comparison, the same number in decimal representation: 1.125 × 2^3 (using decimal representation), or 1.125B3 (still using decimal representation). Some calculators use a mixed representation for binary floating point numbers, where the exponent is displayed as decimal number even in binary mode, so the above becomes 1.001b × 10b^3d or shorter 1.001B3.

This is closely related to the base-2 floating-point representation commonly used in computer arithmetic, and the usage of IEC binary prefixes (e.g. 1B10 for 1×2^10 (kibi), 1B20 for 1×2^20 (mebi), 1B30 for 1×2^30 (gibi), 1B40 for 1×2^40 (tebi)).

Similar to B (or b), the letters H (or h) and O (or o, or C) are sometimes also used to indicate times 16 or 8 to the power as in 1.25 = 1.40h × 10h^0h = 1.40H0 = 1.40h0, or 98000 = 2.7732o × 10o^5o = 2.7732o5 = 2.7732C5.

Another similar convention to denote base-2 exponents is using a letter P (or p, for "power"). In this notation the significand is always meant to be hexadecimal, whereas the exponent is always meant to be decimal. This notation can be produced by implementations of the printf family of functions following the C99 specification and (Single Unix Specification) IEEE Std 1003.1 POSIX standard, when using the %a or %A conversion specifiers. Starting with C++11, C++ I/O functions could parse and print the P-notation as well. Meanwhile, the notation has been fully adopted by the language standard since C++17. Apple's Swift supports it as well. It is also required by the IEEE 754-2008 binary floating-point standard. Example: 1.3DEp42 represents 1.3DEh × 2^42.
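Python is not mentioned above, but its built-in `float.fromhex` and `float.hex` methods read and write the same P-notation, which makes for a compact illustration.

```python
value = float.fromhex("0x1.3DEp42")   # hexadecimal significand, decimal power of two

# 0x1.3DE is 0x13DE / 16^3, so the value equals 0x13DE × 2^(42 − 12)
print(value == 0x13DE * 2 ** 30)      # True
print((9.0).hex())                    # 0x1.2000000000000p+3, i.e. 1.125 × 2^3
```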

Engineering notation can be viewed as a base-1000 scientific notation.

Hectare

From Wikipedia, the free encyclopedia
 
Hectare
Visualization of one hectare
General information
Unit system: Non-SI unit accepted for use with SI
Unit of: Area
Symbol: ha
In SI base units: 1 ha = 10^4 m²

The hectare (/ˈhɛktɛər, -tɑːr/; SI symbol: ha) is an SI accepted metric system unit of area equal to a square with 100-metre sides, or 10,000 m², and is primarily used in the measurement of land.[1] There are 100 hectares in one square kilometre. An acre is about 0.405 hectare and one hectare contains about 2.47 acres.

In 1795, when the metric system was introduced, the "are" was defined as 100 square metres and the hectare ("hecto-" + "are") was thus 100 "ares" or 1/100 km² (10,000 square metres). When the metric system was further rationalised in 1960, resulting in the International System of Units (SI), the are was not included as a recognised unit. The hectare, however, remains as a non-SI unit accepted for use with the SI units, mentioned in Section 4.1 of the SI Brochure as a unit whose use is "expected to continue indefinitely".

The name was coined in French, from the Latin ārea.

Comparison of area units
Unit             SI
1 ca             1 m²
1 a              100 m²
1 ha             10,000 m²
100 ha           1,000,000 m² (1 km²)
non-SI comparisons
non-SI           metric
0.3861 sq mi     1 km²
2.471 acre       1 ha
107,639 sq ft    1 ha
1 sq mi          259.0 ha
1 acre           0.4047 ha
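
The comparisons above can be reproduced with a few constants. This sketch assumes the international acre of exactly 4046.8564224 m² and a square mile of 640 acres; those two figures and the function names are not taken from the text above.

```python
M2_PER_HECTARE = 10_000.0
M2_PER_ACRE = 4046.8564224            # assumption: international acre
M2_PER_SQ_MILE = 640 * M2_PER_ACRE    # assumption: 640 acres per square mile

def hectares_to_acres(ha: float) -> float:
    return ha * M2_PER_HECTARE / M2_PER_ACRE

def acres_to_hectares(acres: float) -> float:
    return acres * M2_PER_ACRE / M2_PER_HECTARE

def sq_miles_to_hectares(sq_mi: float) -> float:
    return sq_mi * M2_PER_SQ_MILE / M2_PER_HECTARE

print(round(hectares_to_acres(1), 3))      # 2.471
print(round(acres_to_hectares(1), 4))      # 0.4047
print(round(sq_miles_to_hectares(1), 1))   # 259.0
```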

History

The metric system of measurement was first given a legal basis in 1795 by the French Revolutionary government. The law of 18 Germinal, Year III (7 April 1795) defined five units of measure:
  • The metre for length
  • The are (100 m²) for area [of land]
  • The stère (1 m³) for volume of stacked firewood
  • The litre (1 dm³) for volumes of liquid
  • The gram for mass
In 1960, when the metric system was updated as the International System of Units (SI), the are did not receive international recognition. The International Committee for Weights and Measures (CIPM) makes no mention of the are in the current (2006) definition of the SI, but classifies the hectare as a "Non-SI unit accepted for use with the International System of Units".

In 1972, the European Economic Community (EEC) passed directive 71/354/EEC, which catalogued the units of measure that might be used within the Community. The units that were catalogued replicated the recommendations of the CGPM, supplemented by a few other units including the are (and implicitly the hectare) whose use was limited to the measurement of land.

Units

Definition of a hectare and of an are
 
The names centiare, deciare, decare and hectare are derived by adding the standard metric prefixes to the original base unit of area, the are.

Centiare

The centiare is one square metre.

Deciare

The deciare is ten square metres.

Are

The are (/ɑːr/ or /ɛər/) is a unit of area, equal to 100 square metres (10 m × 10 m), used for measuring land area. It was defined by older forms of the metric system, but is now outside the modern International System of Units (SI). It is still commonly used in colloquial speech to measure real estate, in particular in Indonesia, India, and in various European countries.

In Russian and other languages of the former Soviet Union, the are is called sotka (Russian: сотка: 'a hundred', i.e. 100 m² or 1/100 hectare). It is used to describe the size of suburban dacha or allotment garden plots or small city parks where the hectare would be too large.

Decare

The decare (/ˈdɛkɑːr, -ɛər/) is derived from deca and are, and is equal to 10 ares or 1000 square metres. It is used in Norway and in the former Ottoman areas of the Middle East and the Balkans (Bulgaria) as a measure of land area. Instead of the name "decare", the names of traditional land measures are usually used, redefined as one decare:
  • Stremma in Greece
  • Dunam, dunum, donum, or dönüm in Israel, Palestine, Jordan, Lebanon, Syria and Turkey
  • Mål is sometimes used for decare in Norway, from the old measure of about the same area.

Hectare

Trafalgar Square has an area of about one hectare.
 
The hectare (/ˈhɛktɛər, -tɑːr/), although not a unit of SI, is the only named unit of area that is accepted for use within the SI. In practice the hectare is fully derived from the SI, being equivalent to a square hectometre. It is widely used throughout the world for the measurement of large areas of land, and it is the legal unit of measure in domains concerned with land ownership, planning, and management, including law (land deeds), agriculture, forestry, and town planning throughout the European Union. The United Kingdom, United States, Burma, and to some extent Canada use the acre instead.

Some countries that underwent a general conversion from traditional measurements to metric measurements (e.g. Canada) required a resurvey when units of measure in legal descriptions relating to land were converted to metric units. Others, such as South Africa, published conversion factors which were to be used particularly "when preparing consolidation diagrams by compilation".

In many countries, metrication redefined or clarified existing measures in terms of metric units. The following legacy units of area have been redefined as being equal to one hectare:
  • Jerib in Iran
  • Djerib in Turkey
  • Gong Qing (公頃/公顷 – gōngqǐng) in Hong Kong / mainland China
  • Manzana in Argentina
  • Bunder in The Netherlands (until 1937)

Conversions

Metric and imperial/US customary comparisons
Unit              Symbol  Metric equivalents                      Imperial/US customary equivalents
centiare          ca              1 m²           0.01 a           1.19599 sq yd
are               a       100 ca  100 m²         0.01 ha          3.95369 perches
decare            daa     10 a    1,000 m²       0.1 ha           0.98842 roods
hectare           ha      100 a   10,000 m²      0.01 km²         about 2.4710538 acres
square kilometre  km²     100 ha  1,000,000 m²                    0.38610 sq mi
The most commonly used units are in bold.

One hectare is also equivalent to:
  • 1 square hectometre
  • 15 mǔ or 0.15 qǐng
  • 10 dunam or dönüm (Middle East)
  • 10 stremmata (Greece)
  • 6.25 rai (Thailand)
  • ≈ 1.008 chō (Japan)
  • ≈ 2.381 feddan (Egypt)

Visualising a hectare

International rugby pitch

Waikato Stadium – Hamilton, New Zealand
 
The maximum playing area of an international-sized rugby union pitch is about one hectare.
 
On an international rugby union field the goal lines are up to 100 metres apart. Behind the goal line is the in-goal area (which is also a playing area). This area extends between 10 and 22 metres behind the goal line, giving a maximum length of 144 metres for the playing area. The maximum width of the pitch is 70 metres, giving a maximum playing area of 10,080 square metres or 1.008 hectares.
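The arithmetic behind this figure is short enough to check directly; the variable names below are illustrative.

```python
GOAL_LINE_SEPARATION_M = 100   # maximum distance between goal lines
IN_GOAL_DEPTH_M = 22           # maximum depth of each in-goal area
WIDTH_M = 70                   # maximum width of the pitch

length_m = GOAL_LINE_SEPARATION_M + 2 * IN_GOAL_DEPTH_M
area_m2 = length_m * WIDTH_M

print(length_m)           # 144
print(area_m2)            # 10080
print(area_m2 / 10_000)   # 1.008 (hectares)
```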

Statue of Liberty

The Statue of Liberty occupies a square of land with an area of one hectare.
 
The Statue of Liberty is located on Liberty Island at the entrance to New York Harbor. Its base is built on eighteenth-century fortifications.

The distance between the apex of the bastions in the front of the base to those at the back (where the entrance to the statue is located) is approximately 100 m while the distance between the apices of the left-hand and right-hand bastions is a little under 100 m. Thus, if a square were to enclose the bastions, it would have sides of approximately 100 m, giving it an area of one hectare.

Interior of all-weather athletics track

Hansen Field at Western Illinois University in Macomb, Illinois incorporates an all-weather running track.
 
The grass in the centre of a standard athletic track is a little over one hectare in extent.
 
Athletics tracks are found in almost every country of the world. Although many tracks consist of markings on a field of suitable size, where funds permit, specialist all-weather tracks have a rubberized artificial running surface with a grass interior (as shown in the picture and diagram). The perimeter of the inside kerb of the track is a little under 400 metres, as the actual length of the track is measured 300 mm from the inside kerb. The IAAF specifications state that the radius of the kerb is 36.5 m, from which it can be calculated that the area inside the kerb is 1.035 ha.
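The 1.035 ha figure can be reproduced under two assumptions that the text does not spell out: the measured lap is exactly 400 m, and the track consists of two straights joined by two semicircular bends. The names in this sketch are illustrative.

```python
import math

KERB_RADIUS_M = 36.5      # radius of the kerb, as stated above
MEASURE_OFFSET_M = 0.3    # the lap is measured 300 mm outside the kerb
LAP_LENGTH_M = 400.0      # assumption: standard 400 m lap

def infield_area_m2() -> float:
    """Area enclosed by the inside kerb: two semicircles plus a rectangle."""
    measured_radius = KERB_RADIUS_M + MEASURE_OFFSET_M
    # Length of each straight, from the 400 m lap along the measurement line.
    straight = (LAP_LENGTH_M - 2 * math.pi * measured_radius) / 2
    return math.pi * KERB_RADIUS_M ** 2 + 2 * KERB_RADIUS_M * straight

print(round(infield_area_m2() / 10_000, 3))   # 1.035 (hectares)
```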
The socc

Comparative genomics

From Wikipedia, the free encyclopedia
 
Whole genome alignment is a typical method in comparative genomics. This alignment of eight Yersinia bacteria genomes reveals 78 locally collinear blocks conserved among all eight taxa. Each chromosome has been laid out horizontally and homologous blocks in each genome are shown as identically colored regions linked across genomes. Regions that are inverted relative to Y. pestis KIM are shifted below a genome's center axis.
 
Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural landmarks. In this branch of genomics, whole or large parts of genomes resulting from genome projects are compared to study basic biological similarities and differences as well as evolutionary relationships between organisms. The major principle of comparative genomics is that common features of two organisms will often be encoded within the DNA that is evolutionarily conserved between them. Therefore, comparative genomic approaches start with making some form of alignment of genome sequences and looking for orthologous sequences (sequences that share a common ancestry) in the aligned genomes and checking to what extent those sequences are conserved. Based on these, genome and molecular evolution are inferred and this may in turn be put in the context of, for example, phenotypic evolution or population genetics.

Having begun virtually as soon as the whole genomes of two organisms became available (that is, the genomes of the bacteria Haemophilus influenzae and Mycoplasma genitalium) in 1995, comparative genomics is now a standard component of the analysis of every new genome sequence. With the explosion in the number of genome projects due to the advancements in DNA sequencing technologies, particularly the next-generation sequencing methods of the late 2000s, this field has become more sophisticated, making it possible to deal with many genomes in a single study. Comparative genomics has revealed high levels of similarity between closely related organisms, such as humans and chimpanzees, and, more surprisingly, similarity between seemingly distantly related organisms, such as humans and the yeast Saccharomyces cerevisiae. It has also shown the extreme diversity of the gene composition in different evolutionary lineages.

History

Comparative genomics has a root in the comparison of virus genomes in the early 1980s. For example, small RNA viruses infecting animals (picornaviruses) and those infecting plants (cowpea mosaic virus) were compared and turned out to share significant sequence similarity and, in part, the order of their genes. In 1986, the first comparative genomic study at a larger scale was published, comparing the genomes of varicella-zoster virus and Epstein-Barr virus that contained more than 100 genes each.

The first complete genome sequence of a cellular organism, that of Haemophilus influenzae Rd, was published in 1995. The second genome sequencing paper was of the small parasitic bacterium Mycoplasma genitalium published in the same year. Starting from this paper, reports on new genomes inevitably became comparative-genomic studies.

The first high-resolution whole genome comparison system was developed in 1998 by Art Delcher, Simon Kasif and Steven Salzberg and applied to the comparison of entire highly related microbial organisms with their collaborators at the Institute for Genomic Research (TIGR). The system is called MUMMER and was described in a publication in Nucleic Acids Research in 1999. The system helps researchers to identify large rearrangements, single base mutations, reversals, tandem repeat expansions and other polymorphisms. In bacteria, MUMMER enables the identification of polymorphisms that are responsible for virulence, pathogenicity, and antibiotic resistance. The system was also applied to the Minimal Organism Project at TIGR and subsequently to many other comparative genomics projects.

Saccharomyces cerevisiae, the baker's yeast, was the first eukaryote to have its complete genome sequence published in 1996. After the publication of the roundworm Caenorhabditis elegans genome in 1998 and together with the fruit fly Drosophila melanogaster genome in 2000, Gerald M. Rubin and his team published a paper titled "Comparative Genomics of the Eukaryotes", in which they compared the genomes of the eukaryotes D. melanogaster, C. elegans, and S. cerevisiae, as well as the prokaryote H. influenzae. At the same time, Bonnie Berger, Eric Lander, and their team published a paper on whole-genome comparison of human and mouse.

With the publication of the large genomes of vertebrates in the 2000s, including human, the Japanese pufferfish Takifugu rubripes, and mouse, precomputed results of large genome comparisons have been released for downloading or for visualization in a genome browser. Instead of undertaking their own analyses, most biologists can access these large cross-species comparisons and avoid the impracticality caused by the size of the genomes.

Next-generation sequencing methods, which were first introduced in 2007, have produced an enormous amount of genomic data and have allowed researchers to generate multiple (prokaryotic) draft genome sequences at once. These methods can also quickly uncover single-nucleotide polymorphisms, insertions and deletions by mapping unassembled reads against a well annotated reference genome, and thus provide a list of possible gene differences that may be the basis for any functional variation among strains.

Evolutionary principles

Evolution is a defining characteristic of biology, and evolutionary theory is the theoretical foundation of comparative genomics; in turn, the results of comparative genomics have greatly enriched and developed the theory of evolution. When two or more genome sequences are compared, one can deduce the evolutionary relationships of the sequences in a phylogenetic tree. Based on a variety of biological genome data and the study of vertical and horizontal evolution processes, one can understand vital parts of the gene structure and its regulatory function.

Similarity of related genomes is the basis of comparative genomics. If two creatures have a recent common ancestor, the differences between the two species' genomes have evolved from the ancestor's genome. The closer the relationship between two organisms, the higher the similarities between their genomes. If there is a close relationship between them, then their genomes will display a linear behaviour (synteny), namely some or all of the genetic sequences are conserved. Thus, the genome sequences can be used to identify gene function, by analyzing their homology (sequence similarity) to genes of known function.
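As a toy illustration of sequence similarity, the following Python sketch computes percent identity between two sequences that are assumed to be already aligned to equal length, with "-" marking gaps. It is a simplified stand-in for what alignment tools report, not the method of any particular tool, and the function name is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

print(percent_identity("ACGTACGT", "ACGTTCGT"))   # 87.5
print(percent_identity("AC-T", "ACGT"))           # 75.0
```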

Orthologous sequences are related sequences in different species: a gene exists in the original species, the species divides into two species, and the genes in the new species are orthologous to the sequence in the original species. Paralogous sequences are separated by gene duplication: if a particular gene in the genome is copied, then the copy is paralogous to the original gene. A pair of orthologous sequences is called an orthologous pair (orthologs); a pair of paralogous sequences is called a collateral pair (paralogs). Orthologous pairs usually have the same or similar function, which is not necessarily the case for collateral pairs. In collateral pairs, the sequences tend to evolve into having different functions.

The human FOXP2 gene and its evolutionary conservation are shown in a multiple alignment (at bottom of figure) in this image from the UCSC Genome Browser. Note that conservation tends to cluster around coding regions (exons).
 
Comparative genomics exploits both similarities and differences in the proteins, RNA, and regulatory regions of different organisms to infer how selection has acted upon these elements. Those elements that are responsible for similarities between different species should be conserved through time (stabilizing selection), while those elements responsible for differences among species should be divergent (positive selection). Finally, those elements that are unimportant to the evolutionary success of the organism will be unconserved (selection is neutral).

One of the important goals of the field is to identify the mechanisms of eukaryotic genome evolution. This is, however, often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. For this reason, comparative genomics studies of small model organisms (for example Caenorhabditis elegans and the closely related Caenorhabditis briggsae) are of great importance for advancing our understanding of general mechanisms of evolution.

Methods

Computational approaches to genome comparison have recently become a common research topic in computer science. A public collection of case studies and demonstrations is growing, ranging from whole-genome comparisons to gene expression analysis. This has brought in ideas from several other areas, including systems and control, information theory, string analysis, and data mining. It is anticipated that computational approaches will become and remain a standard topic for research and teaching, and that multiple courses will begin training students to be fluent in both computer science and genomics.

Tools

Computational tools for analyzing sequences and complete genomes are developing quickly owing to the availability of large amounts of genomic data. At the same time, comparative analysis tools continue to advance and improve. Among the challenges of these analyses, visualizing the comparative results is especially important.

Visualizing sequence conservation is a difficult task in comparative sequence analysis, because examining the alignment of long genomic regions manually is highly inefficient. Internet-based genome browsers provide many useful tools for investigating genomic sequences because they integrate all sequence-based biological information on genomic regions. For extracting large amounts of relevant biological data, they can be easy to use and save time.
  • UCSC Browser: This site contains the reference sequence and working draft assemblies for a large collection of genomes.
  • Ensembl: The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online.
  • MapView: The Map Viewer provides a wide variety of genome mapping and sequencing data.
  • VISTA: a comprehensive suite of programs and databases for comparative analysis of genomic sequences. It was built to visualize the results of comparative analysis based on DNA alignments. Its presentation of comparative data suits both small-scale and large-scale data.
  • BlueJay Genome Browser: a stand-alone visualization tool for the multi-scale viewing of annotated genomes and other genomic elements.
An advantage of using online tools is that these websites are developed and updated constantly. New settings and content are added regularly and can be used online to improve efficiency.

Applications

Agriculture

Agriculture is a field that reaps the benefits of comparative genomics. Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance. For example, one genome-wide association study conducted on 517 rice landraces revealed 80 loci associated with several categories of agronomic performance, such as grain weight, amylose content, and drought tolerance. Many of the loci were previously uncharacterized. Not only is this methodology powerful, it is also quick. Previous methods of identifying loci associated with agronomic performance required several generations of carefully monitored breeding of parent strains, a time-consuming effort that is unnecessary for comparative genomic studies.

Medicine

The medical field also benefits from the study of comparative genomics. Vaccinology in particular has seen useful technological advances owing to genomic approaches. In an approach known as reverse vaccinology, researchers can discover candidate antigens for vaccine development by analyzing the genome of a pathogen or a family of pathogens. Applying a comparative genomics approach by analyzing the genomes of several related pathogens can lead to the development of vaccines that are multiprotective. A team of researchers employed such an approach to create a universal vaccine for Group B Streptococcus, a group of bacteria responsible for severe neonatal infection. Comparative genomics can also be used to make vaccines specific to pathogens that are closely related to commensal microorganisms. For example, researchers used comparative genomic analysis of commensal and pathogenic strains of E. coli to identify pathogen-specific genes as a basis for finding antigens that elicit an immune response against pathogenic strains but not commensal ones. In May 2019, using the Global Genome Set, a team in the UK and Australia sequenced thousands of globally collected isolates of Group A Streptococcus, providing potential targets for developing a vaccine against the pathogen, also known as S. pyogenes.
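
The comparison of pathogenic and commensal strains can be pictured as set arithmetic on gene content: keep the genes shared by every pathogenic strain, then discard any gene also seen in a commensal strain. The Python sketch below assumes that gene presence per strain is already known; the function name, strain labels and gene labels are placeholders invented for illustration and do not refer to real E. coli data.

```python
def pathogen_specific_genes(pathogenic, commensal):
    """Genes found in every pathogenic strain and in no commensal strain.

    Both arguments map a strain label to a collection of gene labels.
    """
    if not pathogenic:
        raise ValueError("at least one pathogenic strain is required")
    # Genes shared by all pathogenic strains.
    shared = set.intersection(*(set(genes) for genes in pathogenic.values()))
    # Genes seen in any commensal strain.
    commensal_pool = set().union(*(set(genes) for genes in commensal.values()))
    return shared - commensal_pool


# Placeholder data: labels are invented, not real strains or genes.
pathogenic_strains = {
    "pathogen_1": {"g1", "g2", "g3", "g5"},
    "pathogen_2": {"g1", "g2", "g3", "g6"},
}
commensal_strains = {
    "commensal_1": {"g1", "g4"},
    "commensal_2": {"g1", "g2", "g7"},
}

if __name__ == "__main__":
    print(pathogen_specific_genes(pathogenic_strains, commensal_strains))
```

Genes that survive this filter are only candidates; whether their products are suitable antigens has to be established separately.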

Research

Comparative genomics also opens up new avenues in other areas of research. As DNA sequencing technology has become more accessible, the number of sequenced genomes has grown. With the increasing reservoir of available genomic data, the power of comparative genomic inference has grown as well. A notable case of this increased power is found in recent primate research. Comparative genomic methods have allowed researchers to gather information about genetic variation, differential gene expression, and evolutionary dynamics in primates that could not be discerned using previous data and methods. The Great Ape Genome Project used comparative genomic methods to investigate genetic variation with reference to the six great ape species, finding healthy levels of variation in their gene pool despite shrinking population size. Another study showed that patterns of DNA methylation, a known mechanism for regulating gene expression, differ in the prefrontal cortex of humans versus chimpanzees, and implicated this difference in the evolutionary divergence of the two species.

Political psychology

From Wikipedia, the free encyclopedia ...