Monday, September 10, 2018

Genetic algorithm

From Wikipedia, the free encyclopedia
 
The 2006 NASA ST5 spacecraft antenna. This complicated shape was found by an evolutionary computer design program to create the best radiation pattern. It is known as an evolved antenna.

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection.

Methodology

Optimization problems

In a genetic algorithm, a population of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem is evolved toward better solutions. Each candidate solution has a set of properties (its chromosomes or genotype) which can be mutated and altered; traditionally, solutions are represented in binary as strings of 0s and 1s, but other encodings are also possible.

The evolution usually starts from a population of randomly generated individuals, and is an iterative process, with the population in each iteration called a generation. In each generation, the fitness of every individual in the population is evaluated; the fitness is usually the value of the objective function in the optimization problem being solved. The more fit individuals are stochastically selected from the current population, and each individual's genome is modified (recombined and possibly randomly mutated) to form a new generation. The new generation of candidate solutions is then used in the next iteration of the algorithm. Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population.

A typical genetic algorithm requires:
  1. a genetic representation of the solution domain,
  2. a fitness function to evaluate the solution domain.

A standard representation of each candidate solution is an array of bits. Arrays of other types and structures can be used in essentially the same way. The main property that makes these genetic representations convenient is that their parts are easily aligned due to their fixed size, which facilitates simple crossover operations. Variable-length representations may also be used, but crossover implementation is more complex in this case. Tree-like representations are explored in genetic programming and graph-form representations are explored in evolutionary programming; a mix of both linear chromosomes and trees is explored in gene expression programming.

Once the genetic representation and the fitness function are defined, a GA proceeds to initialize a population of solutions and then to improve it through repetitive application of the mutation, crossover, inversion and selection operators.
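The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy "OneMax" fitness (the count of 1 bits), the binary tournament selection, the function names, and all parameter values are chosen here purely for brevity.

```python
import random

def one_max(bits):
    """Toy fitness (an assumption for illustration): number of 1 bits."""
    return sum(bits)

def run_ga(fitness, n_bits=20, pop_size=30, generations=50,
           p_crossover=0.9, p_mutation=0.02, seed=0):
    rng = random.Random(seed)
    # Initialization: a population of random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two randomly drawn individuals.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = p1[:]
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < p_mutation else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # remember the best seen so far
    return best

best = run_ga(one_max)
print(one_max(best), best)
```

The sketch keeps the best individual seen so far only for reporting; it is not reinserted into the population.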

Initialization

The population size depends on the nature of the problem, but typically contains several hundred or several thousand possible solutions. Often, the initial population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
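As a rough sketch of this step, the following Python function fills a population with random bit strings and optionally copies in a few hand-picked "seed" individuals first. The function name, the `seeds` parameter, and the bit-string encoding are assumptions made for this example.

```python
import random

def init_population(pop_size, n_bits, seeds=(), rng=None):
    """Return pop_size bit-string individuals.

    `seeds` (hypothetical parameter) holds individuals placed in the
    population before the remainder is filled with random bit strings.
    """
    rng = rng or random.Random()
    population = [list(s) for s in seeds][:pop_size]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n_bits)])
    return population

pop = init_population(pop_size=6, n_bits=8, seeds=[[1] * 8], rng=random.Random(1))
print(len(pop), pop[0])
```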

Selection

During each successive generation, a portion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as the former process may be very time-consuming.
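Two selection schemes in the spirit of this paragraph can be sketched as follows. Fitness-proportionate ("roulette wheel") selection rates every individual, while tournament selection rates only a random sample. Both are common choices rather than the only ones, and the function names and signatures are assumptions for this example.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection; assumes non-negative fitness values."""
    if sum(fitnesses) == 0:
        return rng.choice(population)  # no signal: fall back to uniform choice
    return rng.choices(population, weights=fitnesses, k=1)[0]

def tournament_select(population, fitness, k, rng):
    """Rate only a random sample of k individuals and return the fittest."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

rng = random.Random(0)
population = [[0, 0], [1, 1], [1, 0]]
print(roulette_select(population, [0, 5, 0], rng))   # only [1, 1] has weight
print(tournament_select(population, sum, 3, rng))    # sample covers everyone
```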

The fitness function is defined over the genetic representation and measures the quality of the represented solution. The fitness function is always problem dependent. For instance, in the knapsack problem one wants to maximize the total value of objects that can be put in a knapsack of some fixed capacity. A representation of a solution might be an array of bits, where each bit represents a different object, and the value of the bit (0 or 1) represents whether or not the object is in the knapsack. Not every such representation is valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the solution is the sum of values of all objects in the knapsack if the representation is valid, or 0 otherwise.
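The knapsack fitness just described translates almost directly into code. The sketch below assumes each object has a value and a weight (standing in for "size"); the item data are made up for illustration.

```python
def knapsack_fitness(bits, values, weights, capacity):
    """Sum of selected values if the selection fits, otherwise 0."""
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    if total_weight > capacity:
        return 0  # invalid representation: objects exceed the capacity
    return sum(v for b, v in zip(bits, values) if b)

# Hypothetical items, for illustration only.
values = [60, 100, 120]
weights = [10, 20, 30]
print(knapsack_fitness([0, 1, 1], values, weights, capacity=50))  # 220
print(knapsack_fitness([1, 1, 1], values, weights, capacity=50))  # 0 (weight 60 > 50)
```

Assigning 0 to every invalid solution is the simplest choice, but it gives the search no hint about how far a solution is from being valid.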

In some problems, it is hard or even impossible to define the fitness expression; in these cases, a simulation may be used to determine the fitness function value of a phenotype (e.g. computational fluid dynamics is used to determine the air resistance of a vehicle whose shape is encoded as the phenotype), or even interactive genetic algorithms are used.

Genetic operators

The next step is to generate a second generation population of solutions from those selected through a combination of genetic operators: crossover (also called recombination), and mutation.

For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents". New parents are selected for each new child, and the process continues until a new population of solutions of appropriate size is generated. Although reproduction methods that are based on the use of two parents are more "biology inspired", some research suggests that more than two "parents" generate higher quality chromosomes.
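For bit-string chromosomes, the two operators can be sketched as single-point crossover and bit-flip mutation. These are only two of many possible operators, and the function names are assumptions for this example.

```python
import random

def single_point_crossover(parent_a, parent_b, rng):
    """Child takes a prefix from parent_a and the remaining suffix from parent_b."""
    cut = rng.randrange(1, len(parent_a))  # cut point strictly inside the string
    return parent_a[:cut] + parent_b[cut:]

def bit_flip_mutation(bits, p_mutation, rng):
    """Flip each bit independently with probability p_mutation."""
    return [1 - b if rng.random() < p_mutation else b for b in bits]

rng = random.Random(42)
child = single_point_crossover([0] * 8, [1] * 8, rng)
mutant = bit_flip_mutation(child, p_mutation=0.1, rng=rng)
print(child, mutant)
```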

These processes ultimately result in the next generation population of chromosomes that is different from the initial generation. Generally the average fitness will have increased by this procedure for the population, since only the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions. These less fit solutions ensure genetic diversity within the genetic pool of the parents and therefore ensure the genetic diversity of the subsequent generation of children.

Opinion is divided over the importance of crossover versus mutation. There are many references in Fogel (2006) that support the importance of mutation-based search.

Although crossover and mutation are known as the main genetic operators, it is possible to use other operators such as regrouping, colonization-extinction, or migration in genetic algorithms.

It is worth tuning parameters such as the mutation probability, crossover probability and population size to find reasonable settings for the problem class being worked on. A very small mutation rate may lead to genetic drift (which is non-ergodic in nature). A recombination rate that is too high may lead to premature convergence of the genetic algorithm. A mutation rate that is too high may lead to loss of good solutions, unless elitist selection is employed.

Heuristics

In addition to the main operators above, other heuristics may be employed to make the calculation faster or more robust. The speciation heuristic penalizes crossover between candidate solutions that are too similar; this encourages population diversity and helps prevent premature convergence to a less optimal solution.

Termination

This generational process is repeated until a termination condition has been reached. Common terminating conditions are:
  • A solution is found that satisfies minimum criteria
  • Fixed number of generations reached
  • Allocated budget (computation time/money) reached
  • The highest ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results
  • Manual inspection
  • Combinations of the above
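Several of the conditions above can be combined in one check, as in this sketch. The function name, its parameters, and the `patience` notion of a plateau (no improvement over the last few generations) are assumptions for this example; manual inspection is left out.

```python
import time

def should_stop(generation, best_history, start_time, *,
                target_fitness=None, max_generations=None,
                time_budget_s=None, patience=None):
    """best_history holds the best fitness recorded after each generation."""
    if target_fitness is not None and best_history and best_history[-1] >= target_fitness:
        return True  # a solution satisfies the minimum criteria
    if max_generations is not None and generation >= max_generations:
        return True  # fixed number of generations reached
    if time_budget_s is not None and time.monotonic() - start_time >= time_budget_s:
        return True  # allocated time budget spent
    if patience is not None and len(best_history) > patience:
        # Plateau: the last `patience` generations did not beat the earlier best.
        if max(best_history[-patience:]) <= best_history[-patience - 1]:
            return True
    return False

t0 = time.monotonic()
print(should_stop(5, [1, 2, 3], t0, max_generations=5))   # True
print(should_stop(1, [1, 3, 3, 3], t0, patience=2))       # True
print(should_stop(1, [1, 2, 3], t0, patience=2))          # False
```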

The building block hypothesis

Genetic algorithms are simple to implement, but their behavior is difficult to understand. In particular it is difficult to understand why these algorithms frequently succeed at generating solutions of high fitness when applied to practical problems. The building block hypothesis (BBH) consists of:
  1. A description of a heuristic that performs adaptation by identifying and recombining "building blocks", i.e. low order, low defining-length schemata with above average fitness.
  2. A hypothesis that a genetic algorithm performs adaptation by implicitly and efficiently implementing this heuristic.
Goldberg describes the heuristic as follows:
"Short, low order, and highly fit schemata are sampled, recombined [crossed over], and resampled to form strings of potentially higher fitness. In a way, by working with these particular schemata [the building blocks], we have reduced the complexity of our problem; instead of building high-performance strings by trying every conceivable combination, we construct better and better strings from the best partial solutions of past samplings.
"Because highly fit schemata of low defining length and low order play such an important role in the action of genetic algorithms, we have already given them a special name: building blocks. Just as a child creates magnificent fortresses through the arrangement of simple blocks of wood, so does a genetic algorithm seek near optimal performance through the juxtaposition of short, low-order, high-performance schemata, or building blocks."
Despite the lack of consensus regarding the validity of the building-block hypothesis, it has been consistently evaluated and used as a reference throughout the years. Many estimation of distribution algorithms, for example, have been proposed in an attempt to provide an environment in which the hypothesis would hold. Although good results have been reported for some classes of problems, skepticism concerning the generality and practicality of the building-block hypothesis as an explanation for GAs' efficiency still remains. Indeed, there is a reasonable amount of work that attempts to understand its limitations from the perspective of estimation of distribution algorithms.
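The terms "order" and "defining length" used in the hypothesis can be made concrete with a small sketch. It assumes the usual string notation for schemata, in which `*` is a wildcard and every other character is a fixed position; the function names are illustrative.

```python
def schema_order(schema):
    """Order: the number of fixed (non-wildcard) positions."""
    return sum(ch != "*" for ch in schema)

def defining_length(schema):
    """Defining length: distance between the first and last fixed positions."""
    fixed = [i for i, ch in enumerate(schema) if ch != "*"]
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, bits):
    """True if the bit string is an instance of the schema."""
    return len(schema) == len(bits) and all(
        s == "*" or s == b for s, b in zip(schema, bits))

print(schema_order("1**0*"), defining_length("1**0*"))  # 2 3
print(matches("1**0*", "10101"))                        # True
```

A schema with a short defining length is less likely to be split by a single crossover cut, which is why such schemata are the candidates for "building blocks".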

Limitations

There are limitations of the use of a genetic algorithm compared to alternative optimization algorithms:
  • Repeated fitness function evaluation for complex problems is often the most prohibitive and limiting segment of artificial evolutionary algorithms. Finding the optimal solution to complex high-dimensional, multimodal problems often requires very expensive fitness function evaluations. In real-world problems such as structural optimization, a single function evaluation may require several hours to several days of complete simulation. Typical optimization methods cannot deal with such problems. In this case, it may be necessary to forgo an exact evaluation and use an approximated fitness that is computationally efficient. Combining GAs with such approximate models may be one of the most promising approaches to solving complex real-life problems.
  • Genetic algorithms do not scale well with complexity. That is, where the number of elements which are exposed to mutation is large there is often an exponential increase in search space size. This makes it extremely difficult to use the technique on problems such as designing an engine, a house or plane. In order to make such problems tractable to evolutionary search, they must be broken down into the simplest representation possible. Hence we typically see evolutionary algorithms encoding designs for fan blades instead of engines, building shapes instead of detailed construction plans, and airfoils instead of whole aircraft designs. The second problem of complexity is the issue of how to protect parts that have evolved to represent good solutions from further destructive mutation, particularly when their fitness assessment requires them to combine well with other parts.
  • The "better" solution is only in comparison to other solutions. As a result, the stop criterion is not clear in every problem.
  • In many problems, GAs have a tendency to converge towards local optima or even arbitrary points rather than the global optimum of the problem. This means that the algorithm does not "know how" to sacrifice short-term fitness to gain longer-term fitness. The likelihood of this occurring depends on the shape of the fitness landscape: certain problems may provide an easy ascent towards a global optimum, while others may make it easier for the search to settle in local optima. This problem may be alleviated by using a different fitness function, increasing the rate of mutation, or using selection techniques that maintain a diverse population of solutions, although the No Free Lunch theorem proves that there is no general solution to this problem. A common technique to maintain diversity is to impose a "niche penalty": any group of individuals of sufficient similarity (within a niche radius) has a penalty added, which reduces the representation of that group in subsequent generations and permits other (less similar) individuals to be maintained in the population. This trick, however, may not be effective, depending on the landscape of the problem. Another possible technique is to simply replace part of the population with randomly generated individuals when most of the population is too similar to each other. Diversity is important in genetic algorithms (and genetic programming) because crossing over a homogeneous population does not yield new solutions. In evolution strategies and evolutionary programming, diversity is not essential because of a greater reliance on mutation.
  • Operating on dynamic data sets is difficult, as genomes begin to converge early on towards solutions which may no longer be valid for later data. Several methods have been proposed to remedy this by increasing genetic diversity somehow and preventing early convergence, either by increasing the probability of mutation when the solution quality drops (called triggered hypermutation), or by occasionally introducing entirely new, randomly generated elements into the gene pool (called random immigrants). Again, evolution strategies and evolutionary programming can be implemented with a so-called "comma strategy" in which parents are not maintained and new parents are selected only from offspring. This can be more effective on dynamic problems.
  • GAs cannot effectively solve problems in which the only fitness measure is a single right/wrong measure (like decision problems), as there is no way to converge on the solution (no hill to climb). In these cases, a random search may find a solution as quickly as a GA. However, if the situation allows the success/failure trial to be repeated giving (possibly) different results, then the ratio of successes to failures provides a suitable fitness measure.
  • For specific optimization problems and problem instances, other optimization algorithms may be more efficient than genetic algorithms in terms of speed of convergence. Alternative and complementary algorithms include evolution strategies, evolutionary programming, simulated annealing, Gaussian adaptation, hill climbing, swarm intelligence (e.g. ant colony optimization, particle swarm optimization), and methods based on integer linear programming. The suitability of genetic algorithms is dependent on the amount of knowledge of the problem; well-known problems often have better, more specialized approaches.
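The "niche penalty" mentioned in the list above can be illustrated with a simple form of fitness sharing. This sketch is one possible variant and makes several assumptions: similarity is measured by Hamming distance on bit strings, the penalty is applied by dividing raw fitness by the number of individuals inside the niche radius, and raw fitness values are non-negative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(population, raw_fitness, niche_radius):
    """Divide each raw fitness by the size of the individual's niche.

    The niche count includes the individual itself, so it is at least 1.
    """
    shared = []
    for ind, f in zip(population, raw_fitness):
        niche_count = sum(1 for other in population
                          if hamming(ind, other) < niche_radius)
        shared.append(f / niche_count)
    return shared

pop = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
print(shared_fitness(pop, [4, 3, 0], niche_radius=2))  # [2.0, 1.5, 0.0]
```

The first two individuals lie within the same niche and so share their fitness, while the isolated third individual is not penalized.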

Variants

Chromosome representation

The simplest algorithm represents each chromosome as a bit string. Typically, numeric parameters can be represented by integers, though it is possible to use floating point representations. The floating point representation is natural to evolution strategies and evolutionary programming. The notion of real-valued genetic algorithms has been offered but is really a misnomer because it does not represent the building block theory that was proposed by John Henry Holland in the 1970s. This theory is not without support though, based on theoretical and experimental results.

The basic algorithm performs crossover and mutation at the bit level. Other variants treat the chromosome as a list of numbers which are indexes into an instruction table, nodes in a linked list, hashes, objects, or any other imaginable data structure. Crossover and mutation are performed so as to respect data element boundaries. For most data types, specific variation operators can be designed. Different chromosomal data types seem to work better or worse for different specific problem domains.

When bit-string representations of integers are used, Gray coding is often employed. In this way, small changes in the integer can be readily effected through mutations or crossovers. This has been found to help prevent premature convergence at so-called Hamming walls, in which too many simultaneous mutations (or crossover events) must occur in order to change the chromosome to a better solution.
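A short sketch shows why Gray coding helps. In plain binary, moving from 7 (0111) to 8 (1000) requires flipping all four bits, whereas the corresponding Gray codes differ in a single bit. The helper names are illustrative; the conversion used is the standard reflected binary Gray code.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

for i in (7, 8):
    print(i, format(i, "04b"), format(to_gray(i), "04b"))
# 7 0111 0100
# 8 1000 1100
```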

Other approaches involve using arrays of real-valued numbers instead of bit strings to represent chromosomes. Results from the theory of schemata suggest that in general the smaller the alphabet, the better the performance, but it was initially surprising to researchers that good results were obtained from using real-valued chromosomes. This was explained by the set of real values in a finite population of chromosomes forming a virtual alphabet (when selection and recombination are dominant) with a much lower cardinality than would be expected from a floating point representation.

An expansion of the problem domain accessible to genetic algorithms can be obtained through more complex encoding of the solution pools, by concatenating several types of heterogeneously encoded genes into one chromosome. This particular approach allows for solving optimization problems that require vastly disparate definition domains for the problem parameters. For instance, in problems of cascaded controller tuning, the internal loop controller structure can belong to a conventional regulator of three parameters, whereas the external loop could implement a linguistic controller (such as a fuzzy system) which has an inherently different description. This particular form of encoding requires a specialized crossover mechanism that recombines the chromosome by section, and it is a useful tool for the modelling and simulation of complex adaptive systems, especially evolution processes.

Elitism

A practical variant of the general process of constructing a new population is to allow the best organism(s) from the current generation to carry over to the next, unaltered. This strategy is known as elitist selection and guarantees that the solution quality obtained by the GA will not decrease from one generation to the next.
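Elitist replacement can be sketched as follows: the best individuals are copied unchanged into the next generation and the remaining slots are filled from the offspring. The function name and the `n_elite` parameter are assumptions for this example.

```python
def next_generation_with_elitism(population, offspring, fitness, n_elite=1):
    """Carry the n_elite fittest individuals over unchanged, then add offspring."""
    elite = sorted(population, key=fitness, reverse=True)[:n_elite]
    survivors = [ind[:] for ind in elite]  # copies, so later mutation cannot alter them
    return survivors + offspring[:len(population) - n_elite]

population = [[1, 1], [0, 0], [1, 0]]
offspring = [[0, 0], [0, 0], [0, 1]]
new_pop = next_generation_with_elitism(population, offspring, fitness=sum)
print(new_pop)  # [[1, 1], [0, 0], [0, 0]]
```

Because the elite survive untouched, the best fitness in the new population can never fall below the best fitness in the old one.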

Parallel implementations

Parallel implementations of genetic algorithms come in two flavors. Coarse-grained parallel genetic algorithms assume a population on each of the computer nodes and migration of individuals among the nodes. Fine-grained parallel genetic algorithms assume an individual on each processor node which acts with neighboring individuals for selection and reproduction. Other variants, like genetic algorithms for online optimization problems, introduce time-dependence or noise in the fitness function.
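The migration step of a coarse-grained (island) model can be sketched in a single process as below. This illustrates only the exchange of individuals and runs nothing in parallel. The ring topology, the function name, and the policy of replacing each island's worst individuals are assumptions for this example.

```python
def migrate_ring(islands, fitness, n_migrants=1):
    """Send copies of each island's best individuals to the next island in a ring.

    Incoming migrants replace the receiving island's least fit individuals.
    """
    # Pick all emigrants first so that migration is simultaneous.
    best = [sorted(isl, key=fitness, reverse=True)[:n_migrants] for isl in islands]
    for i, isl in enumerate(islands):
        incoming = best[i - 1]  # previous island; index -1 wraps to the last one
        isl.sort(key=fitness)   # least fit individuals first
        isl[:n_migrants] = [ind[:] for ind in incoming]
    return islands

islands = [[[1, 1, 1], [0, 0, 0]], [[0, 0, 1], [0, 0, 0]]]
migrate_ring(islands, fitness=sum)
print(islands)  # [[[0, 0, 1], [1, 1, 1]], [[1, 1, 1], [0, 0, 1]]]
```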

Adaptive GAs

Genetic algorithms with adaptive parameters (adaptive genetic algorithms, AGAs) are another significant and promising variant of genetic algorithms. The probabilities of crossover (pc) and mutation (pm) greatly determine the degree of solution accuracy and the convergence speed that genetic algorithms can obtain. Instead of using fixed values of pc and pm, AGAs utilize the population information in each generation and adaptively adjust pc and pm in order to maintain the population diversity as well as to sustain the convergence capacity. In AGA (adaptive genetic algorithm), the adjustment of pc and pm depends on the fitness values of the solutions. In CAGA (clustering-based adaptive genetic algorithm), clustering analysis is used to judge the optimization states of the population, and the adjustment of pc and pm depends on these optimization states.

It can be quite effective to combine GA with other optimization methods. GA tends to be quite good at finding generally good global solutions, but quite inefficient at finding the last few mutations needed to reach the absolute optimum. Other techniques (such as simple hill climbing) are quite efficient at finding the absolute optimum in a limited region. Alternating GA and hill climbing can improve the efficiency of GA while overcoming the lack of robustness of hill climbing.
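One way to make pc or pm depend on fitness is sketched below. It is an illustrative rule only and is not claimed to be the exact formula of any published AGA: above-average individuals get a rate that shrinks as they approach the best fitness, and below-average individuals keep a fixed rate. The function name and the constants are assumptions for this example.

```python
def adaptive_rate(f, f_max, f_avg, k_above, k_below):
    """Return an operator probability for an individual with fitness f.

    f_max and f_avg are the best and mean fitness of the current population.
    """
    if f < f_avg or f_max == f_avg:
        return k_below  # below-average individual, or no spread in the population
    # Scales from k_above (at the average) down to 0 (at the best individual).
    return k_above * (f_max - f) / (f_max - f_avg)

print(adaptive_rate(10.0, f_max=10.0, f_avg=5.0, k_above=1.0, k_below=1.0))  # 0.0
print(adaptive_rate(7.5, f_max=10.0, f_avg=5.0, k_above=1.0, k_below=1.0))   # 0.5
print(adaptive_rate(3.0, f_max=10.0, f_avg=5.0, k_above=0.5, k_below=0.5))   # 0.5
```

Under this rule the best individual is never disrupted, which plays a role similar to elitism.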

This means that the rules of genetic variation may have a different meaning in the natural case. For instance – provided that steps are stored in consecutive order – crossing over may sum a number of steps from maternal DNA adding a number of steps from paternal DNA and so on. This is like adding vectors that more probably may follow a ridge in the phenotypic landscape. Thus, the efficiency of the process may be increased by many orders of magnitude. Moreover, the inversion operator has the opportunity to place steps in consecutive order or any other suitable order in favour of survival or efficiency.

A variation, where the population as a whole is evolved rather than its individual members, is known as gene pool recombination.

A number of variations have been developed to attempt to improve performance of GAs on problems with a high degree of fitness epistasis, i.e. where the fitness of a solution consists of interacting subsets of its variables. Such algorithms aim to learn (before exploiting) these beneficial phenotypic interactions. As such, they are aligned with the Building Block Hypothesis in adaptively reducing disruptive recombination. Prominent examples of this approach include the mGA, GEMGA and LLGA.

Problem domains

Problems which appear to be particularly appropriate for solution by genetic algorithms include timetabling and scheduling problems, and many scheduling software packages are based on GAs. GAs have also been applied to engineering. Genetic algorithms are often applied as an approach to solve global optimization problems.

Genetic algorithms have also been applied to evolving neural networks. In particular, recurrent neural networks, which are difficult to train with backpropagation, can be evolved using GAs.

As a general rule of thumb, genetic algorithms might be useful in problem domains that have a complex fitness landscape, as mixing (i.e., mutation in combination with crossover) is designed to move the population away from local optima that a traditional hill climbing algorithm might get stuck in. Observe that commonly used crossover operators cannot change any uniform population. Mutation alone can provide ergodicity of the overall genetic algorithm process (seen as a Markov chain).

Examples of problems solved by genetic algorithms include: mirrors designed to funnel sunlight to a solar collector, antennae designed to pick up radio signals in space, walking methods for computer figures, optimal design of aerodynamic bodies in complex flowfields.
 
In his Algorithm Design Manual, Skiena advises against genetic algorithms for any task:
[I]t is quite unnatural to model applications in terms of genetic operators like mutation and crossover on bit strings. The pseudobiology adds another level of complexity between you and your problem. Second, genetic algorithms take a very long time on nontrivial problems. [...] [T]he analogy with evolution—where significant progress require [sic] millions of years—can be quite appropriate.

[...]

I have never encountered any problem where genetic algorithms seemed to me the right way to attack it. Further, I have never seen any computational results reported using genetic algorithms that have favorably impressed me. Stick to simulated annealing for your heuristic search voodoo needs.
— Steven Skiena

History

In 1950, Alan Turing proposed a "learning machine" which would parallel the principles of evolution. Computer simulation of evolution started as early as 1954 with the work of Nils Aall Barricelli, who was using the computer at the Institute for Advanced Study in Princeton, New Jersey. His 1954 publication was not widely noticed. Starting in 1957, the Australian quantitative geneticist Alex Fraser published a series of papers on simulation of artificial selection of organisms with multiple loci controlling a measurable trait. From these beginnings, computer simulation of evolution by biologists became more common in the early 1960s, and the methods were described in books by Fraser and Burnell (1970) and Crosby (1973). Fraser's simulations included all of the essential elements of modern genetic algorithms. In addition, Hans-Joachim Bremermann published a series of papers in the 1960s that also adopted a population of solutions to optimization problems, undergoing recombination, mutation, and selection. Bremermann's research also included the elements of modern genetic algorithms. Other noteworthy early pioneers include Richard Friedberg, George Friedman, and Michael Conrad. Many early papers are reprinted by Fogel (1998).

Although Barricelli, in work he reported in 1963, had simulated the evolution of ability to play a simple game, artificial evolution became a widely recognized optimization method as a result of the work of Ingo Rechenberg and Hans-Paul Schwefel in the 1960s and early 1970s – Rechenberg's group was able to solve complex engineering problems through evolution strategies. Another approach was the evolutionary programming technique of Lawrence J. Fogel, which was proposed for generating artificial intelligence. Evolutionary programming originally used finite state machines for predicting environments, and used variation and selection to optimize the predictive logics. Genetic algorithms in particular became popular through the work of John Holland in the early 1970s, and particularly his book Adaptation in Natural and Artificial Systems (1975). His work originated with studies of cellular automata, conducted by Holland and his students at the University of Michigan. Holland introduced a formalized framework for predicting the quality of the next generation, known as Holland's Schema Theorem. Research in GAs remained largely theoretical until the mid-1980s, when The First International Conference on Genetic Algorithms was held in Pittsburgh, Pennsylvania.

Commercial products

In the late 1980s, General Electric started selling the world's first genetic algorithm product, a mainframe-based toolkit designed for industrial processes. In 1989, Axcelis, Inc. released Evolver, the world's first commercial GA product for desktop computers. The New York Times technology writer John Markoff wrote about Evolver in 1990, and it remained the only interactive commercial genetic algorithm until 1995. Evolver was sold to Palisade in 1997, translated into several languages, and is currently in its 6th version.

Related techniques

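Parent fields

Genetic algorithms are a sub-field of evolutionary algorithms, which belong in turn to evolutionary computing, metaheuristics, and stochastic optimization.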

Related fields

Evolutionary algorithms

Evolutionary algorithms form a sub-field of evolutionary computing.
  • Evolution strategies (ES, see Rechenberg, 1994) evolve individuals by means of mutation and intermediate or discrete recombination. ES algorithms are designed particularly to solve problems in the real-value domain. They use self-adaptation to adjust control parameters of the search. De-randomization of self-adaptation has led to the contemporary Covariance Matrix Adaptation Evolution Strategy (CMA-ES).
  • Evolutionary programming (EP) involves populations of solutions with primarily mutation and selection and arbitrary representations. They use self-adaptation to adjust parameters, and can include other variation operations such as combining information from multiple parents.
  • Estimation of Distribution Algorithm (EDA) substitutes traditional reproduction operators by model-guided operators. Such models are learned from the population by employing machine learning techniques and represented as Probabilistic Graphical Models, from which new solutions can be sampled or generated from guided-crossover.
  • Gene expression programming (GEP) also uses populations of computer programs. These complex computer programs are encoded in simpler linear chromosomes of fixed length, which are afterwards expressed as expression trees. Expression trees or computer programs evolve because the chromosomes undergo mutation and recombination in a manner similar to the canonical GA. But thanks to the special organization of GEP chromosomes, these genetic modifications always result in valid computer programs.
  • Genetic programming (GP) is a related technique popularized by John Koza in which computer programs, rather than function parameters, are optimized. Genetic programming often uses tree-based internal data structures to represent the computer programs for adaptation instead of the list structures typical of genetic algorithms.
  • Grouping genetic algorithm (GGA) is an evolution of the GA where the focus is shifted from individual items, as in classical GAs, to groups or subsets of items. The idea behind this GA evolution, proposed by Emanuel Falkenauer, is that some complex problems, namely clustering or partitioning problems where a set of items must be split into disjoint groups of items in an optimal way, are better solved by making characteristics of the groups of items equivalent to genes. These kinds of problems include bin packing, line balancing, clustering with respect to a distance measure, equal piles, etc., on which classic GAs proved to perform poorly. Making genes equivalent to groups implies chromosomes that are in general of variable length, and special genetic operators that manipulate whole groups of items. For bin packing in particular, a GGA hybridized with the Dominance Criterion of Martello and Toth is arguably the best technique to date.
  • Interactive evolutionary algorithms are evolutionary algorithms that use human evaluation. They are usually applied to domains where it is hard to design a computational fitness function, for example, evolving images, music, artistic designs and forms to fit users' aesthetic preference.

Swarm intelligence

Swarm intelligence is a sub-field of evolutionary computing.
  • Ant colony optimization (ACO) uses many ants (or agents) equipped with a pheromone model to traverse the solution space and find locally productive areas. It is also considered an estimation of distribution algorithm.
  • Particle swarm optimization (PSO) is a computational method for multi-parameter optimization which also uses a population-based approach. A population (swarm) of candidate solutions (particles) moves in the search space, and the movement of the particles is influenced both by their own best known position and the swarm's global best known position. Like genetic algorithms, the PSO method depends on information sharing among population members. In some problems PSO is more computationally efficient than GAs, especially in unconstrained problems with continuous variables.
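The particle movement described in the PSO entry above is commonly written as a velocity update followed by a position update. The sketch below uses the widely used inertia-weight form; the function name and the coefficient values are assumptions for this example, and updating the best known positions after the move is left to the caller.

```python
import random

def pso_step(positions, velocities, personal_best, global_best, rng,
             inertia=0.7, c_personal=1.5, c_global=1.5):
    """Advance every particle by one iteration, updating the lists in place."""
    for i, x in enumerate(positions):
        for d in range(len(x)):
            r1, r2 = rng.random(), rng.random()
            velocities[i][d] = (inertia * velocities[i][d]
                                + c_personal * r1 * (personal_best[i][d] - x[d])
                                + c_global * r2 * (global_best[d] - x[d]))
            x[d] += velocities[i][d]  # move along the new velocity

rng = random.Random(0)
positions = [[0.0, 0.0]]
velocities = [[0.0, 0.0]]
pso_step(positions, velocities, personal_best=[[0.0, 0.0]],
         global_best=[1.0, 1.0], rng=rng)
print(positions)  # the particle has moved towards the global best
```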

Other evolutionary computing algorithms

Evolutionary computation is a sub-field of the metaheuristic methods.
  • Memetic algorithm (MA), often called hybrid genetic algorithm among others, is a population-based method in which solutions are also subject to local improvement phases. The idea of memetic algorithms comes from memes, which unlike genes, can adapt themselves. In some problem areas they are shown to be more efficient than traditional evolutionary algorithms.
  • Bacteriologic algorithms (BA) are inspired by evolutionary ecology and, more particularly, bacteriologic adaptation. Evolutionary ecology is the study of living organisms in the context of their environment, with the aim of discovering how they adapt. Its basic concept is that in a heterogeneous environment, there is not one individual that fits the whole environment. So, one needs to reason at the population level. It is also believed BAs could be successfully applied to complex positioning problems (antennas for cell phones, urban planning, and so on) or data mining.
  • Cultural algorithm (CA) consists of the population component almost identical to that of the genetic algorithm and, in addition, a knowledge component called the belief space.
  • Differential search algorithm (DS) inspired by migration of superorganisms.
  • Gaussian adaptation (normal or natural adaptation, abbreviated NA to avoid confusion with GA) is intended for the maximisation of manufacturing yield of signal processing systems. It may also be used for ordinary parametric optimisation. It relies on a certain theorem valid for all regions of acceptability and all Gaussian distributions. The efficiency of NA relies on information theory and a certain theorem of efficiency. Its efficiency is defined as information divided by the work needed to get the information. Because NA maximises mean fitness rather than the fitness of the individual, the landscape is smoothed such that valleys between peaks may disappear. Therefore it has a certain "ambition" to avoid local peaks in the fitness landscape. NA is also good at climbing sharp crests by adaptation of the moment matrix, because NA may maximise the disorder (average information) of the Gaussian while keeping the mean fitness constant.
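
The memetic idea from the list above, a genetic algorithm whose offspring also undergo a local improvement phase, might look like the following sketch. Everything specific here is an assumption made for illustration: the OneMax fitness (count of 1 bits), binary tournament selection, one-point crossover, elitism, and the bit-flip hill climber `local_improve`.

```python
import random

def onemax(bits):
    """Hypothetical fitness: number of 1 bits in the string."""
    return sum(bits)

def local_improve(bits, fitness, rng, tries=5):
    """Local improvement phase: keep random single-bit flips that raise fitness."""
    best = bits[:]
    for _ in range(tries):
        cand = best[:]
        i = rng.randrange(len(cand))
        cand[i] ^= 1
        if fitness(cand) > fitness(best):
            best = cand
    return best

def memetic(fitness, n_bits=40, pop_size=20, generations=60, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        new_pop = [max(pop, key=fitness)[:]]
        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            # The step that makes this memetic: refine each child locally.
            new_pop.append(local_improve(child, fitness, rng))
        pop = new_pop
    return max(pop, key=fitness)

best = memetic(onemax)
```

Removing the call to `local_improve` turns the sketch back into a plain genetic algorithm, which makes the two easy to compare on the same problem.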

Other metaheuristic methods

Metaheuristic methods broadly fall within stochastic optimisation methods.
  • Simulated annealing (SA) is a related global optimization technique that traverses the search space by testing random mutations on an individual solution. A mutation that increases fitness is always accepted. A mutation that lowers fitness is accepted probabilistically based on the difference in fitness and a decreasing temperature parameter. In SA parlance, one speaks of seeking the lowest energy instead of the maximum fitness. SA can also be used within a standard GA by starting with a relatively high rate of mutation and decreasing it over time along a given schedule.
  • Tabu search (TS) is similar to simulated annealing in that both traverse the solution space by testing mutations of an individual solution. While simulated annealing generates only one mutated solution, tabu search generates many mutated solutions and moves to the solution with the lowest energy of those generated. In order to prevent cycling and encourage greater movement through the solution space, a tabu list is maintained of partial or complete solutions. It is forbidden to move to a solution that contains elements of the tabu list, which is updated as the solution traverses the solution space.
  • Extremal optimization (EO) evolves a single solution and makes local modifications to its worst components, unlike GAs, which work with a population of candidate solutions. This requires that a suitable representation be selected which permits individual solution components to be assigned a quality measure ("fitness"). The governing principle behind this algorithm is that of emergent improvement through selectively removing low-quality components and replacing them with a randomly selected component. This is decidedly at odds with a GA that selects good solutions in an attempt to make better solutions.
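
The acceptance rule of simulated annealing described in the list above can be sketched as follows, phrased in terms of energy to be minimised. The text does not give a specific acceptance formula; the Metropolis rule exp(-delta / t) and the geometric cooling schedule used here are common choices and are assumptions of this sketch, as are the function names and the `bumpy` test energy.

```python
import math
import random

def simulated_annealing(energy, x0, step=0.5, t_start=1.0, t_end=1e-3,
                        cooling=0.995, seed=0):
    """Minimise a one-dimensional `energy` function by simulated annealing."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        # "Mutation": a small random move from the current solution.
        cand = x + rng.uniform(-step, step)
        e_cand = energy(cand)
        delta = e_cand - e
        # Improvements are always accepted; worse moves are accepted with a
        # probability that shrinks as delta grows and as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # assumed geometric cooling schedule
    return best_x, best_e

def bumpy(x):
    """Hypothetical energy with several local minima."""
    return x * x + 2.0 * math.sin(5.0 * x)

x_best, e_best = simulated_annealing(bumpy, x0=4.0)
```

Tabu search differs from this loop mainly in generating several candidates per step, moving to the lowest-energy one, and rejecting candidates that appear on the tabu list.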

Other stochastic optimisation methods

  • The cross-entropy (CE) method generates candidate solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization, so as to generate better samples in the next iteration.
  • Reactive search optimization (RSO) advocates the integration of sub-symbolic machine learning techniques into search heuristics for solving complex optimization problems. The word reactive hints at a ready response to events during the search through an internal online feedback loop for the self-tuning of critical parameters. Methodologies of interest for Reactive Search include machine learning and statistics, in particular reinforcement learning, active or query learning, neural networks, and metaheuristics.
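
A minimal sketch of the cross-entropy method for a single real variable follows. It assumes a Gaussian sampling distribution, for which the cross-entropy update reduces to refitting the mean and standard deviation to the best ("elite") samples; the function name, sample counts, and the quadratic test objective are illustrative assumptions.

```python
import random
import statistics

def cross_entropy_minimise(objective, mu=0.0, sigma=5.0, n_samples=50,
                           n_elite=10, iters=30, seed=0):
    """Minimise `objective` by iteratively refitting a Gaussian to elite samples."""
    rng = random.Random(seed)
    for _ in range(iters):
        # Generate candidates from the current parameterized distribution.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Keep the candidates with the lowest objective values.
        elite = sorted(samples, key=objective)[:n_elite]
        # Parameter update: maximum-likelihood fit of the Gaussian to the elite.
        mu = statistics.fmean(elite)
        sigma = statistics.pstdev(elite) + 1e-12  # small floor avoids sigma == 0
    return mu, sigma

# Hypothetical objective with its minimum at x = 3.
mu, sigma = cross_entropy_minimise(lambda x: (x - 3.0) ** 2)
```

As the iterations proceed, the distribution narrows around the region that produced the best samples, so `mu` acts as the current estimate of the optimum.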

Speciation

Speciation is the evolutionary process by which populations evolve to become distinct species. The biologist Orator F. Cook coined the term in 1906 for cladogenesis, the splitting of lineages, as opposed to anagenesis, phyletic evolution within lineages. Charles Darwin was the first to describe the role of natural selection in speciation in his 1859 book The Origin of Species. He also identified sexual selection as a likely mechanism, but found it problematic.

There are four geographic modes of speciation in nature, based on the extent to which speciating populations are isolated from one another: allopatric, peripatric, parapatric, and sympatric. Speciation may also be induced artificially, through animal husbandry, agriculture, or laboratory experiments. Whether genetic drift is a minor or major contributor to speciation is the subject matter of much ongoing discussion.

Rapid sympatric speciation can take place through polyploidy, such as by doubling of chromosome number; the result is progeny which are immediately reproductively isolated from the parent population. New species can also be created through hybridisation followed, if the hybrid is favoured by natural selection, by reproductive isolation.

Historical background

In addressing the question of the origin of species, there are two key issues: (1) what are the evolutionary mechanisms of speciation, and (2) what accounts for the separateness and individuality of species in the biota? Since Charles Darwin's time, efforts to understand the nature of species have primarily focused on the first aspect, and it is now widely agreed that the critical factor behind the origin of new species is reproductive isolation. Next we focus on the second aspect of the origin of species.

Darwin's dilemma: Why do species exist?

In On the Origin of Species (1859), Darwin interpreted biological evolution in terms of natural selection, but was perplexed by the clustering of organisms into species. Chapter 6 of Darwin's book is entitled "Difficulties of the Theory." In discussing these "difficulties" he noted "Firstly, why, if species have descended from other species by insensibly fine gradations, do we not everywhere see innumerable transitional forms? Why is not all nature in confusion instead of the species being, as we see them, well defined?" This dilemma can be referred to as the absence or rarity of transitional varieties in habitat space.

Another dilemma, related to the first one, is the absence or rarity of transitional varieties in time. Darwin pointed out that by the theory of natural selection "innumerable transitional forms must have existed," and wondered "why do we not find them embedded in countless numbers in the crust of the earth." That clearly defined species actually do exist in nature in both space and time implies that some fundamental feature of natural selection operates to generate and maintain species.

The effect of sexual reproduction on species formation

It has been argued that the resolution of Darwin's first dilemma lies in the fact that out-crossing sexual reproduction has an intrinsic cost of rarity. The cost of rarity arises as follows. If, on a resource gradient, a large number of separate species evolve, each exquisitely adapted to a very narrow band on that gradient, each species will, of necessity, consist of very few members. Finding a mate under these circumstances may present difficulties when many of the individuals in the neighborhood belong to other species. Under these circumstances, if any species’ population size happens, by chance, to increase (at the expense of one or other of its neighboring species, if the environment is saturated), this will immediately make it easier for its members to find sexual partners. The members of the neighboring species, whose population sizes have decreased, experience greater difficulty in finding mates, and therefore form pairs less frequently than the larger species. This has a snowball effect, with large species growing at the expense of the smaller, rarer species, eventually driving them to extinction. Eventually, only a few species remain, each distinctly different from the other. The cost of rarity not only involves the costs of failure to find a mate, but also indirect costs such as the cost of communication in seeking out a partner at low population densities.
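
The snowball effect in this argument can be illustrated with a toy simulation. It is only a sketch under strong assumptions that are not in the text: a saturated environment with a fixed total number of individuals, random encounters, and reproductive output of each species proportional to the square of its population size (each individual meets a member of its own species with probability equal to that species' share of the total). The function name and all parameters are hypothetical.

```python
import random

def simulate_cost_of_rarity(n_species=20, capacity=1000, generations=50, seed=0):
    """Track how many species survive when rarer species find mates less often."""
    rng = random.Random(seed)
    counts = [capacity // n_species] * n_species  # start with equal sizes
    history = [sum(1 for c in counts if c > 0)]
    for _ in range(generations):
        total = sum(counts)
        # Expected successful pairings for species i scale with
        # counts[i] * (counts[i] / total): abundance times mate-finding chance.
        weights = [c * c / total for c in counts]
        # The saturated environment holds exactly `capacity` offspring.
        offspring = rng.choices(range(n_species), weights=weights, k=capacity)
        counts = [0] * n_species
        for s in offspring:
            counts[s] += 1
        history.append(sum(1 for c in counts if c > 0))
    return counts, history

final_counts, surviving = simulate_cost_of_rarity()
```

In this toy model, chance differences in population size are amplified each generation, so the number of surviving species recorded in `surviving` can only stay the same or fall.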

African pygmy kingfisher, showing coloration shared by all adults of that species to a high degree of fidelity.
 
Rarity brings with it other costs. Rare and unusual features are very seldom advantageous. In most instances, they indicate a (non-silent) mutation, which is almost certain to be deleterious. It therefore behooves sexual creatures to avoid mates sporting rare or unusual features (koinophilia). Sexual populations therefore rapidly shed rare or peripheral phenotypic features, thus canalizing the entire external appearance, as illustrated in the accompanying illustration of the African pygmy kingfisher, Ispidina picta. This uniformity of all the adult members of a sexual species has stimulated the proliferation of field guides on birds, mammals, reptiles, insects, and many other taxa, in which a species can be described with a single illustration (or two, in the case of sexual dimorphism). Once a population has become as homogeneous in appearance as is typical of most species (and is illustrated in the photograph of the African pygmy kingfisher), its members will avoid mating with members of other populations that look different from themselves. Thus, the avoidance of mates displaying rare and unusual phenotypic features inevitably leads to reproductive isolation, one of the hallmarks of speciation.

In the contrasting case of organisms that reproduce asexually, there is no cost of rarity; consequently, there are only benefits to fine-scale adaptation. Thus, asexual organisms very frequently show the continuous variation in form (often in many different directions) that Darwin expected evolution to produce, making their classification into "species" (more correctly, morphospecies) very difficult.

Modes


All forms of natural speciation have taken place over the course of evolution; however, debate persists as to the relative importance of each mechanism in driving biodiversity.

One example of natural speciation is the diversity of the three-spined stickleback, a marine fish that, after the last glacial period, has undergone speciation into new freshwater colonies in isolated lakes and streams. Over an estimated 10,000 generations, the sticklebacks show structural differences that are greater than those seen between different genera of fish including variations in fins, changes in the number or size of their bony plates, variable jaw structure, and color differences.

Allopatric

During allopatric (from the ancient Greek allos, "other" + patrā, "fatherland") speciation, a population splits into two geographically isolated populations (for example, by habitat fragmentation due to geographical change such as mountain formation). The isolated populations then undergo genotypic or phenotypic divergence as: (a) they become subjected to dissimilar selective pressures; (b) they independently undergo genetic drift; (c) different mutations arise in the two populations. When the populations come back into contact, they have evolved such that they are reproductively isolated and are no longer capable of exchanging genes. Island genetics is the term associated with the tendency of small, isolated genetic pools to produce unusual traits. Examples include insular dwarfism and the radical changes among certain famous island chains, for example on Komodo. The Galápagos Islands are particularly famous for their influence on Charles Darwin. During his five weeks there he heard that Galápagos tortoises could be identified by island, and noticed that finches differed from one island to another, but it was only nine months later that he reflected that such facts could show that species were changeable. When he returned to England, his speculation on evolution deepened after experts informed him that these were separate species, not just varieties, and famously that other differing Galápagos birds were all species of finches. Though the finches were less important for Darwin, more recent research has shown the birds now known as Darwin's finches to be a classic case of adaptive evolutionary radiation.

Peripatric

In peripatric speciation, a subform of allopatric speciation, new species are formed in isolated, smaller peripheral populations that are prevented from exchanging genes with the main population. It is related to the concept of a founder effect, since small populations often undergo bottlenecks. Genetic drift is often proposed to play a significant role in peripatric speciation.

Case studies include Mayr's investigation of bird fauna; the Australian bird Petroica multicolor; and reproductive isolation in populations of Drosophila subject to population bottlenecking.

Parapatric

In parapatric speciation, there is only partial separation of the zones of two diverging populations afforded by geography; individuals of each species may come in contact or cross habitats from time to time, but reduced fitness of the heterozygote leads to selection for behaviours or mechanisms that prevent their interbreeding. Parapatric speciation is modelled on continuous variation within a "single," connected habitat acting as a source of natural selection rather than the effects of isolation of habitats produced in peripatric and allopatric speciation.

Parapatric speciation may be associated with differential landscape-dependent selection. Even if there is a gene flow between two populations, strong differential selection may impede assimilation and different species may eventually develop. Habitat differences may be more important in the development of reproductive isolation than the isolation time. Caucasian rock lizards Darevskia rudis, D. valentini and D. portschinskii all hybridize with each other in their hybrid zone; however, hybridization is stronger between D. portschinskii and D. rudis, which separated earlier but live in similar habitats than between D. valentini and two other species, which separated later but live in climatically different habitats.

Ecologists refer to parapatric and peripatric speciation in terms of ecological niches. A niche must be available in order for a new species to be successful. Ring species such as Larus gulls have been claimed to illustrate speciation in progress, though the situation may be more complex. The grass Anthoxanthum odoratum may be starting parapatric speciation in areas of mine contamination.

Sympatric


Sympatric speciation is the formation of two or more descendant species from a single ancestral species all occupying the same geographic location.

Often-cited examples of sympatric speciation are found in insects that become dependent on different host plants in the same area.

The best illustrated example of sympatric speciation is that of the cichlids of East Africa inhabiting the Rift Valley lakes, particularly Lake Victoria, Lake Malawi and Lake Tanganyika. There are over 800 described species, and according to estimates, there could be well over 1,600 species in the region. Their evolution is cited as an example of both natural and sexual selection. A 2008 study suggests that sympatric speciation has occurred in Tennessee cave salamanders. Sympatric speciation driven by ecological factors may also account for the extraordinary diversity of crustaceans living in the depths of Siberia's Lake Baikal.

Budding speciation has been proposed as a particular form of sympatric speciation, whereby small groups of individuals become progressively more isolated from the ancestral stock by breeding preferentially with one another. This type of speciation would be driven by the conjunction of various advantages of inbreeding, such as the expression of advantageous recessive phenotypes, reducing the recombination load, and reducing the cost of sex.

Rhagoletis pomonella, the hawthorn fly, appears to be in the process of sympatric speciation.

The hawthorn fly (Rhagoletis pomonella), also known as the apple maggot fly, appears to be undergoing sympatric speciation. Different populations of hawthorn fly feed on different fruits. A distinct population emerged in North America in the 19th century some time after apples, a non-native species, were introduced. This apple-feeding population normally feeds only on apples and not on the historically preferred fruit of hawthorns. The current hawthorn-feeding population does not normally feed on apples. Several lines of evidence suggest that sympatric speciation is occurring: six out of thirteen allozyme loci differ between the two populations, hawthorn flies mature later in the season and take longer to mature than apple flies, and there is little evidence of interbreeding (researchers have documented a 4-6% hybridization rate).

Methods of selection

Reinforcement

Reinforcement assists speciation by selecting against hybrids.

Reinforcement, sometimes referred to as the Wallace effect, is the process by which natural selection increases reproductive isolation. It may occur after two populations of the same species are separated and then come back into contact. If their reproductive isolation was complete, then they will have already developed into two separate incompatible species. If their reproductive isolation is incomplete, then further mating between the populations will produce hybrids, which may or may not be fertile. If the hybrids are infertile, or fertile but less fit than their ancestors, then there will be further reproductive isolation and speciation has essentially occurred (e.g., as in horses and donkeys).

The reasoning behind this is that if the parents of the hybrid offspring each have naturally selected traits for their own certain environments, the hybrid offspring will bear traits from both, therefore would not fit either ecological niche as well as either parent. The low fitness of the hybrids would cause selection to favor assortative mating, which would control hybridization. This is sometimes called the Wallace effect after the evolutionary biologist Alfred Russel Wallace who suggested in the late 19th century that it might be an important factor in speciation.

Conversely, if the hybrid offspring are more fit than their ancestors, then the populations will merge back into the same species within the area they are in contact.

Reinforcement favoring reproductive isolation is required for both parapatric and sympatric speciation. Without reinforcement, the geographic area of contact between different forms of the same species, called their "hybrid zone," will not develop into a boundary between the different species. Hybrid zones are regions where diverged populations meet and interbreed. Hybrid offspring are very common in these regions, which are usually created by diverged species coming into secondary contact. Without reinforcement, the two species would interbreed uncontrollably. Reinforcement may be induced in artificial selection experiments as described below.

Ecological

Ecological selection is "the interaction of individuals with their environment during resource acquisition". Natural selection is inherently involved in the process of speciation, whereby, "under ecological speciation, populations in different environments, or populations exploiting different resources, experience contrasting natural selection pressures on the traits that directly or indirectly bring about the evolution of reproductive isolation". There is evidence for the role ecology plays in the process of speciation. Studies of stickleback populations support ecologically-linked speciation arising as a by-product, alongside numerous studies of parallel speciation, where greater isolation evolves between independent populations of species adapting to contrasting environments than between independent populations adapting to similar environments. Much of the evidence for ecological speciation has "...accumulated from top-down studies of adaptation and reproductive isolation".

Sexual selection

It is widely appreciated that sexual selection could drive speciation in many clades, independently of natural selection. However, the term "speciation", in this context, tends to be used in two different, but not mutually exclusive, senses. The first and most commonly used sense refers to the "birth" of new species: the splitting of an existing species into two separate species, or the budding off of a new species from a parent species, both driven by a biological "fashion fad" (a preference for a feature, or features, in one or both sexes, that do not necessarily have any adaptive qualities). In the second sense, "speciation" refers to the widespread tendency of sexual creatures to be grouped into clearly defined species, rather than forming a continuum of phenotypes both in time and space, which would be the more obvious or logical consequence of natural selection. This was indeed recognized by Darwin as problematic, and included in his On the Origin of Species (1859), under the heading "Difficulties of the Theory". There are several suggestions as to how mate choice might play a significant role in resolving Darwin's dilemma.

Artificial speciation

Gaur (Indian bison) can interbreed with domestic cattle.
 

New species have been created by animal husbandry, but the dates and methods of the initiation of such species are not clear. Often, the domestic counterpart of the wild ancestor can still interbreed and produce fertile offspring, as in the case of domestic cattle, which can be considered the same species as several varieties of wild ox, gaur, yak, etc., or domestic sheep, which can interbreed with the mouflon.

The best-documented creations of new species in the laboratory were performed in the late 1980s. William R. Rice and George W. Salt bred Drosophila melanogaster fruit flies using a maze with three different choices of habitat such as light/dark and wet/dry. Each generation was placed into the maze, and the groups of flies that came out of two of the eight exits were set apart to breed with each other in their respective groups. After thirty-five generations, the two groups and their offspring were isolated reproductively because of their strong habitat preferences: they mated only within the areas they preferred, and so did not mate with flies that preferred the other areas. The history of such attempts is described by Rice and Elen E. Hostert (1993). Diane Dodd used a laboratory experiment to show how reproductive isolation can evolve in Drosophila pseudoobscura fruit flies after several generations by placing them in different media, starch- and maltose-based media.

Diagram of the Drosophila speciation experiment.

Dodd's experiment has been easy for many others to replicate, including with other kinds of fruit flies and foods. Research in 2005 suggested that this rapid evolution of reproductive isolation may in fact be a relic of infection by Wolbachia bacteria.

Alternatively, these observations are consistent with the notion that sexual creatures are inherently reluctant to mate with individuals whose appearance or behavior is different from the norm. The risk that such deviations are due to heritable maladaptations is very high. Thus, if a sexual creature, unable to predict natural selection's future direction, is conditioned to produce the fittest offspring possible, it will avoid mates with unusual habits or features. Sexual creatures will then inevitably tend to group themselves into reproductively isolated species.

Genetics

Few speciation genes have been found. They usually involve the reinforcement process of late stages of speciation. In 2008, a speciation gene causing reproductive isolation was reported. It causes hybrid sterility between related subspecies. The order of speciation of three groups from a common ancestor may be unclear or unknown; a collection of three such species is referred to as a "trichotomy."

Speciation via polyploidy

Speciation via polyploidy: A diploid cell undergoes failed meiosis, producing diploid gametes, which self-fertilize to produce a tetraploid zygote. In plants, this can effectively be a new species, reproductively isolated from its parents, and able to reproduce.

Polyploidy is a mechanism that has caused many rapid speciation events in sympatry because offspring of, for example, tetraploid × diploid matings often result in triploid sterile progeny. However, not all polyploids are reproductively isolated from their parental plants, and gene flow may still occur, for example, through triploid hybrid × diploid matings that produce tetraploids, or matings between meiotically unreduced gametes from diploids and gametes from tetraploids (see also hybrid speciation).
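
The chromosome-set arithmetic behind these crosses can be written out as a small sketch. It is a toy model under stated assumptions: a normally reduced gamete carries half the parent's chromosome sets, an unreduced gamete carries all of them, and the triploid × diploid route to a tetraploid is assumed to go through an unreduced triploid gamete (the text does not specify the mechanism). The function names are hypothetical.

```python
def gamete_ploidy(parent_ploidy, reduced=True):
    """Chromosome sets in a gamete: half the parent's sets after normal
    meiosis, or all of them if meiosis fails (an unreduced gamete)."""
    if not reduced:
        return parent_ploidy
    if parent_ploidy % 2:
        raise ValueError("odd ploidy cannot be halved evenly in this toy model")
    return parent_ploidy // 2

def offspring_ploidy(gamete_a, gamete_b):
    """A zygote carries the chromosome sets of both gametes."""
    return gamete_a + gamete_b

# Tetraploid x diploid: gametes with 2 and 1 sets give a triploid.
tetra_x_di = offspring_ploidy(gamete_ploidy(4), gamete_ploidy(2))

# Triploid x diploid, assuming an unreduced triploid gamete: 3 + 1 sets.
tri_x_di = offspring_ploidy(gamete_ploidy(3, reduced=False), gamete_ploidy(2))

# Unreduced diploid gamete with a normal tetraploid gamete: 2 + 2 sets.
unreduced_di_x_tetra = offspring_ploidy(gamete_ploidy(2, reduced=False),
                                        gamete_ploidy(4))

# Failed meiosis in a self-fertilizing diploid: 2 + 2 sets.
selfed_diploid = offspring_ploidy(gamete_ploidy(2, reduced=False),
                                  gamete_ploidy(2, reduced=False))
```

The first cross yields three chromosome sets, which cannot be divided evenly at meiosis, while the other three crosses yield four sets and so can restore a tetraploid line.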

It has been suggested that many of the existing plant and most animal species have undergone an event of polyploidization in their evolutionary history. Reproduction of successful polyploid species is sometimes asexual, by parthenogenesis or apomixis, as for unknown reasons many asexual organisms are polyploid. Rare instances of polyploid mammals are known, but most often result in prenatal death.

Hybrid speciation

Hybridization between two different species sometimes leads to a distinct phenotype. This phenotype can also be fitter than the parental lineage and as such natural selection may then favor these individuals. Eventually, if reproductive isolation is achieved, it may lead to a separate species. However, reproductive isolation between hybrids and their parents is particularly difficult to achieve and thus hybrid speciation is considered an extremely rare event. The Mariana mallard is thought to have arisen from hybrid speciation.

Hybridization is an important means of speciation in plants, since polyploidy (having more than two copies of each chromosome) is tolerated in plants more readily than in animals. Polyploidy is important in hybrids as it allows reproduction, with the two different sets of chromosomes each being able to pair with an identical partner during meiosis. Polyploids also have more genetic diversity, which allows them to avoid inbreeding depression in small populations.

Hybridization without change in chromosome number is called homoploid hybrid speciation. It is considered very rare but has been shown in Heliconius butterflies and sunflowers. Polyploid speciation, which involves changes in chromosome number, is a more common phenomenon, especially in plant species.

Gene transposition

Theodosius Dobzhansky, who studied fruit flies in the early days of genetic research in the 1930s, speculated that parts of chromosomes that switch from one location to another might cause a species to split into two different species. He mapped out how it might be possible for sections of chromosomes to relocate themselves in a genome. Those mobile sections can cause sterility in inter-species hybrids, which can act as a speciation pressure. In theory, his idea was sound, but scientists long debated whether it actually happened in nature. Eventually a competing theory involving the gradual accumulation of mutations was shown to occur in nature so often that geneticists largely dismissed the moving gene hypothesis. However, 2006 research shows that jumping of a gene from one chromosome to another can contribute to the birth of new species. This validates the reproductive isolation mechanism, a key component of speciation.

Rates

Phyletic gradualism, top, consists of relatively slow change over geological time. Punctuated equilibrium, bottom, consists of morphological stability and rare, relatively rapid bursts of evolutionary change.

There is debate as to the rate at which speciation events occur over geologic time. While some evolutionary biologists claim that speciation events have remained relatively constant and gradual over time (known as "Phyletic gradualism" - see diagram), some palaeontologists such as Niles Eldredge and Stephen Jay Gould have argued that species usually remain unchanged over long stretches of time, and that speciation occurs only over relatively brief intervals, a view known as punctuated equilibrium.

Punctuated evolution

Evolution can be extremely rapid, as shown in the creation of domesticated animals and plants in a very short geological space of time, spanning only a few tens of thousands of years. Maize (Zea mays), for instance, was created in Mexico in only a few thousand years, starting about 7,000 to 12,000 years ago. This raises the question of why the long term rate of evolution is far slower than is theoretically possible.

Plants and domestic animals can differ markedly from their wild ancestors
 
Top: wild teosinte; middle: maize-teosinte hybrid; bottom: maize

Evolution is imposed on species or groups. It is not planned or striven for in some Lamarckist way. The mutations on which the process depends are random events, and, except for the "silent mutations" which do not affect the functionality or appearance of the carrier, are thus usually disadvantageous, and their chance of proving to be useful in the future is vanishingly small. Therefore, while a species or group might benefit from being able to adapt to a new environment by accumulating a wide range of genetic variation, this is to the detriment of the individuals who have to carry these mutations until a small, unpredictable minority of them ultimately contributes to such an adaptation. Thus, the capability to evolve would require group selection, a concept discredited by (for example) George C. Williams, John Maynard Smith and Richard Dawkins as selectively disadvantageous to the individual.

The resolution to Darwin's second dilemma might thus come about as follows:

If sexual individuals are disadvantaged by passing mutations on to their offspring, they will avoid mutant mates with strange or unusual characteristics. Mutations that affect the external appearance of their carriers will then rarely be passed on to the next and subsequent generations. They would therefore seldom be tested by natural selection. Evolution is, therefore, effectively halted or slowed down considerably. The only mutations that can accumulate in a population, on this punctuated equilibrium view, are ones that have no noticeable effect on the outward appearance and functionality of their bearers (i.e., they are "silent" or "neutral mutations," which can be, and are, used to trace the relatedness and age of populations and species.) This argument implies that evolution can only occur if mutant mates cannot be avoided, as a result of a severe scarcity of potential mates. This is most likely to occur in small, isolated communities. These occur most commonly on small islands, in remote valleys, lakes, river systems, or caves, or during the aftermath of a mass extinction. Under these circumstances, not only is the choice of mates severely restricted but population bottlenecks, founder effects, genetic drift and inbreeding cause rapid, random changes in the isolated population's genetic composition. Furthermore, hybridization with a related species trapped in the same isolate might introduce additional genetic changes. If an isolated population such as this survives its genetic upheavals, and subsequently expands into an unoccupied niche, or into a niche in which it has an advantage over its competitors, a new species, or subspecies, will have come into being. In geological terms this will be an abrupt event. A resumption of avoiding mutant mates will thereafter result, once again, in evolutionary stagnation.

In apparent confirmation of this punctuated equilibrium view of evolution, the fossil record of an evolutionary progression typically consists of species that suddenly appear, and ultimately disappear, hundreds of thousands or millions of years later, without any change in external appearance. Graphically, these fossil species are represented by lines parallel with the time axis, whose lengths depict how long each of them existed. The fact that the lines remain parallel with the time axis illustrates the unchanging appearance of each of the fossil species depicted on the graph. During each species' existence new species appear at random intervals, each also lasting many hundreds of thousands of years before disappearing without a change in appearance. The exact relatedness of these concurrent species is generally impossible to determine. This is illustrated in the diagram depicting the distribution of hominin species through time since the hominins separated from the line that led to the evolution of our closest living primate relatives, the chimpanzees.

Green development

From Wikipedia, the free encyclopedia https://en.wikipedia.org/w...