From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Pan-genome

In the fields of molecular biology and genetics, a pan-genome (or supragenome) is the entire set of genes for all strains within a clade. The pan-genome comprises the core genome, containing genes present in all strains within the clade; the accessory genome, containing 'dispensable' genes present in a subset of the strains; and strain-specific genes. The study of the pan-genome is called pangenomics.
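The three-way division described above can be illustrated with a minimal Python sketch. It assumes each strain is already reduced to a set of gene identifiers, and it treats the accessory genome as genes found in more than one strain but not in all of them, so that strain-specific genes form a separate category. The function name `partition_pan_genome` and the toy gene names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def partition_pan_genome(strains):
    """Split a pan-genome into core, accessory and strain-specific genes.

    strains: dict mapping a strain name to the set of gene identifiers
    found in that strain (an assumed, simplified input format).
    """
    n_strains = len(strains)
    # Count in how many strains each gene occurs (sets prevent double counting).
    counts = Counter(gene for genes in strains.values() for gene in genes)

    pan = set(counts)
    core = {g for g, c in counts.items() if c == n_strains}
    # With a single strain, every gene is core, so nothing is strain-specific.
    strain_specific = {g for g, c in counts.items() if c == 1 and n_strains > 1}
    # Assumption: accessory = present in a subset larger than one strain.
    accessory = pan - core - strain_specific
    return {
        "pan": pan,
        "core": core,
        "accessory": accessory,
        "strain_specific": strain_specific,
    }


# Toy example with three hypothetical strains.
example_strains = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}
example_parts = partition_pan_genome(example_strains)
```

In this toy example, `g1` is the only core gene, `g2` is accessory, and `g3`, `g4` and `g5` are strain-specific. Real analyses first have to cluster genes into orthologous groups before sets like these can be compared, a step this sketch does not cover.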

Some species have open (or extensive) pan-genomes, while others have closed pan-genomes. For species with a closed pan-genome, each additional sequenced genome adds very few new genes once many strains have been sequenced, and the size of the full pan-genome can be theoretically predicted. For species with an open pan-genome, each additional sequenced genome adds enough new genes that the size of the full pan-genome cannot be predicted. Population size and niche versatility have been suggested as the most influential factors in determining pan-genome size.
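The notion of genes added per sequenced genome can be made concrete with a short Python sketch. It assumes the genomes are given as sets of gene identifiers in the order they were sequenced; the function name `new_genes_per_genome` and the toy data are hypothetical. It only counts newly seen genes at each step and does not attempt to predict the final pan-genome size.

```python
def new_genes_per_genome(genomes):
    """Count how many previously unseen genes each genome contributes.

    genomes: list of sets of gene identifiers, in sequencing order
    (an assumed, simplified input format).
    """
    seen = set()
    added = []
    for genes in genomes:
        new = genes - seen      # genes not observed in any earlier genome
        added.append(len(new))
        seen |= new             # grow the pan-genome observed so far
    return added


# Toy "closed-like" case: new genes quickly stop appearing.
closed_like = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
]
# Toy "open-like" case: every genome keeps contributing a new gene.
open_like = [
    {"a", "x1"},
    {"a", "x2"},
    {"a", "x3"},
]

closed_counts = new_genes_per_genome(closed_like)
open_counts = new_genes_per_genome(open_like)
```

Here `closed_counts` falls to zero while `open_counts` stays above zero for every added genome. The counts depend on the order in which genomes are added, so a single ordering like this one is only an illustration of the idea.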

Pan-genomes were originally constructed for species of bacteria and archaea, but more recently eukaryotic pan-genomes have been developed, particularly for plant species. Plant studies have shown that pan-genome dynamics are linked to transposable elements. The pan-genome concept is most significant in an evolutionary context, especially with relevance to metagenomics, but it is also used in a broader genomics context.

History