When a search of a national DNA database links a crime scene to a person whose DNA profile is stored in it, that link is often referred to as a cold hit. A cold hit is of particular value in linking a specific person to a crime scene, but is of less evidential value than a DNA match made without the use of a DNA database. Research shows that DNA databases of criminal offenders reduce crime rates.
Types
Forensic
A forensic DNA database is a centralised DNA database for storing DNA profiles of individuals that enables DNA samples collected from a crime scene to be searched and compared against stored profiles. The most important function of a forensic database is to produce matches between a suspected individual and crime scene bio-markers. Such matches provide evidence to support criminal investigations and also produce leads that help identify potential suspects. The majority of national DNA databases are used for forensic purposes.
The Interpol DNA database is used in criminal investigations. Interpol maintains an automated DNA database called DNA Gateway that contains DNA profiles submitted by member countries, collected from crime scenes, missing persons, and unidentified bodies. The DNA Gateway was established in 2002, and at the end of 2013 it held more than 140,000 DNA profiles from 69 member countries. Unlike other DNA databases, DNA Gateway is used only for information sharing and comparison: it does not link a DNA profile to any individual, and the physical or psychological conditions of an individual are not included in the database.
Genealogical
A national or forensic DNA database is not available for non-police purposes. Because DNA profiles can also be used for genealogical purposes, separate genetic genealogy databases have been created to store the results of genealogical DNA tests. GenBank is a public genetic genealogy database that stores genome sequences submitted by many genetic genealogists. GenBank contains a large number of DNA sequences obtained from more than 140,000 registered organizations, and is updated every day to ensure a uniform and comprehensive collection of sequence information. These sequences are mainly obtained from individual laboratories or large-scale sequencing projects. The files stored in GenBank are divided into different groups, such as BCT (bacterial), VRL (viruses) and PRI (primates). GenBank can be accessed through NCBI's retrieval system, and the BLAST function can then be used to identify a certain sequence within GenBank or to find similarities between two sequences.
Medical
A medical DNA database is a DNA database of medically relevant genetic variations. It collects individuals' DNA, which can be linked to their medical records and lifestyle details. By recording DNA profiles, scientists may discover interactions between the genetic environment and the occurrence of certain diseases (such as cardiovascular disease or cancer), and thus find new drugs or effective treatments for controlling these diseases. Such databases often operate in collaboration with the National Health Service.
National
A national DNA database is a DNA database maintained by the government for storing DNA profiles of its population. Each DNA profile is based on PCR and uses STR (short tandem repeat) analysis. National databases are generally used for forensic purposes, which include searching and matching the DNA profiles of potential criminal suspects.
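As a rough illustration of what STR analysis measures, the sketch below counts how many times a short motif repeats back to back in a sequence. The function name `longest_tandem_run`, the example sequence and the motif are assumptions made for illustration; laboratory profiling typically infers repeat counts from the length of PCR products rather than by scanning sequence text.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive copies of `motif` in `sequence`."""
    # Find every maximal stretch made only of back-to-back copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    # Convert each stretch from a character length to a repeat count.
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical fragment: three consecutive copies of AATG flanked by other bases.
example = "GGAATGAATGAATGCC"
repeat_count = longest_tandem_run(example, "AATG")
```

The repeat count at each sampled region, for both inherited copies, is what a stored profile records.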
In 2009, Interpol reported that there were 54 police national DNA databases in the world and that 26 more countries planned to start one. In Europe, Interpol reported 31 national DNA databases and six more planned.
 The European Network of Forensic Science Institutes (ENFSI) DNA working
 group made 33 recommendations in 2014 for DNA database management and 
guidelines for auditing DNA databases. Other countries have adopted privately developed DNA databases, such as Qatar, which has adopted Bode dbSEARCH.
Typically, a tiny subset of the individual's genome is sampled from 13 or 16 regions that have high individuation.
United Kingdom
The first national DNA database in the United Kingdom, the National DNA Database (NDNAD), was established in April 1995. By 2006, it contained 2.7 million DNA profiles (about 5.2% of the UK population), as well as other information from individuals and crime scenes. This had increased to 5.7 million profiles by 2015. The information is stored in the form of a digital code, which is based on the nomenclature of each STR.
 In the UK, police have wide-ranging powers to take DNA samples and 
retain them if the subject is convicted of a recordable offence.
 Because of the large number of DNA profiles stored in NDNAD, "cold hits" may occur during DNA matching: an unexpected match is found between an individual's DNA profile and the DNA profile from an unsolved crime scene. This can introduce a new suspect into the investigation, thus helping to solve old cases.
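The idea of a cold hit can be sketched as a search of stored profiles for any that agree with a crime-scene profile at every locus typed in the sample. This is a minimal sketch under stated assumptions: the profile layout (a dictionary from locus name to a pair of repeat counts), the function names and the sample data are all hypothetical, and real database software applies more elaborate matching rules.

```python
def normalise(alleles):
    """Order the two alleles at a locus so (9.3, 7) and (7, 9.3) compare equal."""
    return tuple(sorted(alleles))


def is_match(scene_profile, stored_profile):
    """True if the stored profile agrees at every locus typed in the scene sample."""
    return all(
        locus in stored_profile
        and normalise(stored_profile[locus]) == normalise(alleles)
        for locus, alleles in scene_profile.items()
    )


def find_cold_hits(scene_profile, database):
    """Return the identifiers of all stored profiles that match the scene profile."""
    if not scene_profile:
        return []  # an empty sample cannot identify anyone
    return sorted(pid for pid, prof in database.items() if is_match(scene_profile, prof))


# Hypothetical data: two stored profiles and a partial crime-scene profile.
database = {
    "person-1": {"TH01": (7, 9.3), "FGA": (21, 24), "VWA": (16, 17)},
    "person-2": {"TH01": (6, 8), "FGA": (21, 24), "VWA": (16, 17)},
}
scene = {"TH01": (9.3, 7), "FGA": (24, 21)}
hits = find_cold_hits(scene, database)
```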
In England and Wales, anyone arrested on suspicion of a recordable offence
 must submit a DNA sample, the profile of which is then stored on the 
DNA database. Those not charged or not found guilty have their DNA data 
deleted within a specified period of time. In Scotland, the law similarly requires the DNA profiles of most people who are acquitted be removed from the database.
New Zealand
New Zealand was the second country to set up a DNA database.
United States
The United States national DNA database is called the Combined DNA Index System (CODIS). It is maintained at three levels: national, state and local. Each level implements its own DNA index system. The National DNA Index System (NDIS) allows DNA profiles to be exchanged and compared between participating laboratories nationally. Each State DNA Index System (SDIS) allows DNA profiles to be exchanged and compared between laboratories within a state, and each Local DNA Index System (LDIS) allows DNA profiles to be collected at local sites and uploaded to SDIS and NDIS.
CODIS software integrates and connects all the DNA index systems 
at the three levels. CODIS is installed on each participating laboratory
 site and uses a standalone network known as Criminal Justice 
Information Systems Wide Area Network (CJIS WAN)
 to connect to other laboratories. In order to decrease the number of 
irrelevant matches at NDIS, the Convicted Offender Index requires all 13
 CODIS STRs to be present for a profile upload.  Forensic profiles only require 10 of the STRs to be present for an upload.
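The upload rule described above can be expressed as a simple completeness check. The sketch below is illustrative only: the locus names are the commonly cited 13 original CODIS core STR loci (they are not listed in this article), and the function name, index labels and profile layout are assumptions rather than part of the CODIS software.

```python
# Commonly cited 13 original CODIS core STR loci (an assumption; not named in the text).
CODIS_CORE_LOCI = frozenset({
    "CSF1PO", "FGA", "TH01", "TPOX", "VWA",
    "D3S1358", "D5S818", "D7S820", "D8S1179",
    "D13S317", "D16S539", "D18S51", "D21S11",
})

# Minimum number of core loci that must be typed, per index, as stated in the text.
MIN_CORE_LOCI = {"offender": 13, "forensic": 10}


def eligible_for_upload(profile, index):
    """Check whether a profile has enough typed core loci for the given index."""
    typed = {locus for locus, alleles in profile.items() if alleles}
    return len(typed & CODIS_CORE_LOCI) >= MIN_CORE_LOCI[index]


# Hypothetical profiles: one complete, one with only ten core loci typed.
complete = {locus: (10, 11) for locus in CODIS_CORE_LOCI}
partial = {locus: (10, 11) for locus in sorted(CODIS_CORE_LOCI)[:10]}
```

Under this rule a degraded crime-scene sample with ten typed loci could still be searched, while an offender sample with the same gaps would be rejected.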
As of March 2011, CODIS held 361,176 forensic profiles and 9,404,747 offender profiles, making it the largest DNA database in the world. As of the same date, CODIS had produced over 138,700 matches to requests, assisting in more than 133,400 investigations.
The growing public approval of DNA databases has seen the creation and expansion of many states' own DNA databases.  California currently maintains the third largest DNA database in the world.  Political measures such as California Proposition 69
 (2004), which increased the scope of the DNA database, have already met
 with a significant increase in numbers of investigations aided. 
Forty-nine states in the USA, all apart from Idaho, store DNA profiles of violent offenders, and many also store profiles of suspects.
 A 2017 study showed that DNA databases in U.S. states "deter crime by 
profiled offenders, reduce crime rates, and are more cost-effective than
 traditional law enforcement tools".
CODIS is also used to help find missing persons and identify human remains.  It is connected to the National Missing Persons DNA Database; samples provided by family members are sequenced by the University of North Texas Center for Human Identification, which also runs the National Missing and Unidentified Persons System.  UNTCHI can sequence both nuclear and mitochondrial DNA.
The Department of Defense maintains a DNA database to identify the remains of servicemembers.  The Department of Defense Serum Repository
 maintains more than 50,000,000 records, primarily to assist in the 
identification of human remains.  Submission of DNA samples is mandatory
 for US servicemen, but the database also includes information on 
military dependents.  The National Defense Authorization Act of 2003 
provided a means for federal courts or military judges to order the use 
of the DNA information collected to be made available for the purpose of
 investigation or prosecution of a felony, or any sexual offense, for 
which no other source of DNA information is reasonably available. 
Australia
The Australian national DNA database is called the National Criminal Investigation DNA Database (NCIDD). By July 2018, it contained more than 837,000 DNA profiles. The database originally used 9 STR loci and a sex gene for analysis; this was increased to 18 core markers in 2013. NCIDD combines all forensic data, including DNA profiles, advanced biometrics and cold cases.
Canada
The Canadian national DNA database is called the National DNA Data Bank (NDDB). It was established in 1998 but first used in 2000.
 The legislation that Parliament enacted to govern the use of this 
technology within the criminal justice system has been found by Canadian
 courts to be respectful of the constitutional and privacy rights of 
suspects, and of persons found guilty of designated offences.
On December 11, 1999, the Canadian Government agreed upon the DNA Identification Act. The Act allowed a Canadian DNA data bank to be created and amended the Criminal Code to provide a mechanism for judges to order offenders to provide blood, buccal swabs, or hair samples for DNA profiling. This legislation became official on June 29, 2000. Canadian police have been using forensic DNA evidence for over a decade. It has become one of the most powerful tools available to law enforcement agencies for the administration of justice.
NDDB consists of two indexes: the Convicted Offender Index (COI) and the National Crime Scene Index (CSI-nat). There is also the Local Crime Scene Index (CSI-loc), which is maintained by local laboratories rather than by NDDB because local DNA profiles do not meet NDDB collection criteria. The National Crime Scene Index (CSI-nat) is a collection drawn from three laboratory systems, operated by the Royal Canadian Mounted Police (RCMP), the Laboratoire de sciences judiciaires et de médecine légale (LSJML) and the Centre of Forensic Sciences (CFS).
Dubai
In 2017 Dubai announced an initiative called Dubai 10X, which was intended to bring 'disruptive innovation' to the country. One of the projects in this initiative was a DNA database that would collect the genomes of all 3 million citizens of the country over a 10-year period. The database was intended to be used for finding genetic causes of diseases and creating personalised medical treatments.
Germany
Germany set up its DNA database for the German Federal Police (BKA) in 1998.
In late 2010, the database contained DNA profiles of over 700,000 
individuals and in September 2016 it contained 1,162,304 entries.
 On 23 May 2011 in the "Stop the DNA Collection Frenzy!" campaign 
various civil rights and data protection organizations handed an open 
letter to the German minister of justice Sabine Leutheusser-Schnarrenberger
 asking her to take action in order to stop the "preventive expansion of
 DNA data-collection" and the "preemptive use of mere suspicions and of 
the state apparatus against individuals" and to cancel projects of 
international exchange of DNA data at the European and transatlantic 
level.
Israel
The Israeli national DNA database, the Israel Police DNA Index System (IPDIS), was established in 2007 and holds more than 135,000 DNA profiles. The collection includes DNA profiles from suspected and accused persons and convicted offenders. The Israeli database also includes an "elimination bank" of profiles from laboratory staff and other police personnel who may have contact with the forensic evidence in the course of their work.
To handle the high-throughput processing and analysis of DNA samples from FTA cards, the Israel Police DNA database has established a semi-automated LIMS program, which enables a small number of police staff to process a large number of samples in a relatively short period of time and is also responsible for the future tracking of samples.
Kuwait
The Kuwaiti government passed a law in July 2015 requiring all citizens and permanent residents (4.2 million people) to have their DNA taken for a national database. The law was motivated by security concerns after the ISIS suicide bombing of the Imam Sadiq mosque. The government planned to finish collecting the DNA by September 2016, which outside observers thought was optimistic. In October 2017 the Kuwait constitutional court struck down the law, saying it was an invasion of personal privacy, and the project was cancelled.
Brazil
In 1998, the Forensic DNA Research Institute of Federal District Civil Police created DNA databases of sexual assault evidence. In 2012, Brazil
 approved a national law establishing DNA databases at state and 
national levels regarding DNA typing of individuals convicted of violent
 crimes. Following the decree of the Presidency of the Republic of Brazil in 2013, which regulates the 2012 law, Brazil began using CODIS in addition to the DNA databases of sexual assault evidence to solve sexual assault crimes in Brazil.
France
France set up the DNA database called FNAEG in 1998. By December 2009, there were 1.27 million profiles on FNAEG.
Other European countries
In Sweden, only the DNA profiles of criminals who have spent more than two years in prison are stored. In Norway and Germany, court orders are required, and are only available, respectively, for serious offenders and for those convicted of certain offences and who are likely to reoffend. Austria started a criminal DNA database in 1997, and Italy set one up in 2016. Switzerland started a temporary criminal DNA database in 2000 and confirmed it in law in 2005.
In 2005 the incoming Portuguese government proposed to introduce a DNA database of the entire population of Portugal. However, after informed debate including opinion from the Portuguese Ethics Council the database introduced was of just the criminal population.
Corporate
- 23andMe's DNA database contains genetic information of over 1,000,000 people worldwide. The company explores selling the "anonymous aggregated genetic data" to other researchers and pharmaceutical companies for research purposes if patients give their consent. Ahmad Hariri, professor of psychology and neuroscience at Duke University, who has been using 23andMe in his research since 2009, states that the most important aspect of the company's new service is that it makes genetic research accessible and relatively cheap for scientists. A study that identified 15 genome sites linked to depression in 23andMe's database led to a surge in demand for access to the repository, with 23andMe fielding nearly 20 requests to access the depression data in the two weeks after publication of the paper.
 
Compression
DNA databases occupy more storage than other databases because of the enormous size of each DNA sequence. DNA databases grow exponentially every year, which poses a major challenge to their storage, transfer, retrieval and search. To address these challenges, DNA databases are compressed to save storage space and bandwidth during data transfers, and are decompressed during search and retrieval. Various compression algorithms are used for this. The efficiency of a compression algorithm depends on how well and how fast it compresses and decompresses. How well it compresses is generally measured by the compression ratio: the greater the compression ratio, the better the efficiency of the algorithm. The speed of compression and decompression is also considered in evaluation.
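The two evaluation criteria can be measured as in the following sketch. It uses Python's general-purpose `zlib` module purely as a stand-in, not one of the DNA-specific compressors discussed in this article, and takes the compression ratio to be the original size divided by the compressed size. The function name and the test sequence are illustrative assumptions.

```python
import time
import zlib


def compression_stats(data: bytes, level: int = 6) -> dict:
    """Measure compression ratio and timings for one round trip through zlib."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    # Lossless compression must reproduce the input exactly.
    if restored != data:
        raise ValueError("round trip did not reproduce the original data")

    return {
        "ratio": len(data) / len(packed),  # original size / compressed size
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
    }


# Hypothetical, highly repetitive sequence; real genomes compress far less well.
stats = compression_stats(b"ACGT" * 10_000)
```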
DNA sequences contain palindromic repetitions of A, C, T and G. Compression of these sequences involves locating and encoding these repetitions, and decoding them during decompression.
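In DNA, a palindromic stretch is usually understood as one that reads the same as its reverse complement on the opposite strand. The sketch below locates such stretches of a fixed length; it assumes that reading of "palindromic", and the function names and example sequence are hypothetical. A compressor would go further and replace each located repeat with a compact reference.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, length: int):
    """Return (position, subsequence) for every window equal to its reverse complement."""
    found = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window == reverse_complement(window):
            found.append((i, window))
    return found


# Hypothetical sequence containing the palindromic stretch GAATTC.
sites = find_palindromes("TTGAATTCAA", 6)
```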
Some approaches used to encode and decode are:
- Huffman Encoding
 - Adaptive Huffman Encoding
 - Arithmetic coding
 - Context tree weighting (CTW) method
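To make the first approach in the list concrete, here is a minimal sketch of static Huffman coding applied to a nucleotide string. It is a textbook construction, not the scheme used by any particular DNA compressor, and the function names and sample sequence are assumptions. Frequent bases receive shorter bit strings than rare ones.

```python
import heapq
from collections import Counter


def huffman_code(sequence: str) -> dict:
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(sequence)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs one bit
    # Heap entries: (weight, unique tiebreak, partial code table for that subtree).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prefixes their codes with 0 and 1 respectively.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(sequence: str, code: dict) -> str:
    return "".join(code[s] for s in sequence)


def decode(bits: str, code: dict) -> str:
    inverse = {c: s for s, c in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-free, so the first hit is the symbol
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)


sample = "AAAAAAAACCCCGGT"  # hypothetical skewed base composition
code = huffman_code(sample)
bits = encode(sample, code)
```

For this skewed sample the encoding is shorter than a fixed two bits per base; on sequences where the four bases are roughly equally frequent, Huffman coding alone gives little gain.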
 
The compression algorithms listed below may use one of the above encoding approaches to compress and decompress DNA databases:
- Compression using Redundancy of DNA sets (COMRAD)
 - Relative Lempel-Ziv (RLZ)
 - GenCompress
 - BioCompress
 - DNACompress
 - CTW+LZ
 
In 2012, a team of scientists from Johns Hopkins University published
 the first genetic compression algorithm that does not rely on external 
genetic databases for compression. HAPZIPPER was tailored for HapMap
 data and achieves over 20-fold compression (95% reduction in file 
size), providing 2- to 4-fold better compression much faster than 
leading general-purpose compression utilities.
Genomic sequence compression algorithms, also known as DNA sequence compressors, exploit the fact that DNA sequences have characteristic properties, such as inverted repeats. The most successful compressors are XM and GeCo. For eukaryotes, XM is slightly better in compression ratio, though for sequences larger than 100 MB its computational requirements are impractical.
Medicine
Many countries collect newborn blood samples to screen for diseases, mainly those with a genetic basis. In most cases these samples are destroyed soon after testing. In some countries the dried blood (and the DNA) is retained for later testing.
In Denmark, the Danish Newborn Screening Biobank at Statens Serum Institut keeps a blood sample from people born after 1981. The purpose is to test for phenylketonuria and other diseases. However, it is also used for DNA profiling to identify the deceased and suspected criminals. Parents can request that the blood sample of their newborn be destroyed after the result of the test is known.
Privacy issues
Critics of DNA databases warn that the various uses of the technology can pose a threat to individual civil liberties. Personal information included in genetic material, such as markers that identify various genetic diseases and physical and behavioral traits, could be used for discriminatory profiling, and its collection may constitute an invasion of privacy. DNA can also be used to establish paternity and whether or not a child is adopted. The privacy and security of DNA databases have attracted considerable attention. Some people fear that their personal DNA information could easily be leaked; others feel that having their DNA profile recorded in a database marks them as "criminal", and that being falsely accused of a crime can leave them with a "criminal" record for the rest of their lives.
UK laws in 2001 and 2003
 allowed DNA profiles to be taken immediately after a person was 
arrested and kept in a Database even if the suspect was later acquitted. In response to public unease at these provisions, the UK later changed this by passing the Protection of Freedoms Act 2012 which required that those suspects not charged or found not guilty would have their DNA data deleted from the Database.
European countries that have established a DNA database use some measures to protect the privacy of individuals, in particular criteria for removing DNA profiles from the databases. Among the 22 European countries that have been analyzed, most record the DNA profiles of suspects or of those who have committed serious crimes. Some countries (such as Belgium and France) may remove a criminal's profile after 30–40 years, because these "criminal investigation" records are no longer needed. Most countries delete a suspect's profile after the suspect is acquitted. All of the countries have legislation in place that is intended to limit the privacy issues that may arise from the use of DNA databases.
Public discussion around the introduction of advanced forensic 
techniques (such as genetic genealogy using public genealogy databases 
and DNA phenotyping approaches) has been limited, disjointed, and 
unfocused, and raises issues of privacy and consent that may warrant 
additional legal protections to be established.
Privacy concerns surrounding DNA databases arise not only when DNA samples are collected and analyzed, but also when this important personal information is stored and protected. Because DNA profiles can be stored indefinitely in a DNA database, there are concerns that the DNA samples could be used for new and unidentified purposes. As the number of users with access to DNA databases increases, people worry that their information may be leaked or shared inappropriately; for example, their DNA profile may be shared with others, such as law enforcement agencies or other countries, without individual consent.
The application of DNA databases has been expanded into two controversial areas: arrestees and familial searching. An arrestee is a person who has been arrested for a crime but has not yet been convicted of that offense. Currently, 21 states in the United States have passed legislation that allows law enforcement to take DNA from an arrestee and enter it into the state's CODIS DNA database to see if that person has a criminal record or can be linked to any unsolved crimes. In familial searching, the DNA database is used to look for partial matches that would be expected between close family members. This technology can be used to link crimes to the family members of suspects and thereby help identify a suspect when the perpetrator has no DNA sample in the database.
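A partial match of the kind used in familial searching can be approximated by counting the loci at which two profiles share at least one allele, since a parent and child are expected to share an allele at every locus. The sketch below is a deliberate simplification under stated assumptions: operational familial searches rely on statistical kinship measures rather than a raw sharing count, and the function names, threshold and profiles are hypothetical.

```python
def sharing_fraction(profile_a, profile_b):
    """Fraction of commonly typed loci at which the profiles share an allele."""
    common = profile_a.keys() & profile_b.keys()
    if not common:
        return 0.0
    shared = sum(1 for locus in common if set(profile_a[locus]) & set(profile_b[locus]))
    return shared / len(common)


def familial_candidates(scene_profile, database, threshold=1.0):
    """Identifiers of stored profiles that share alleles at enough loci.

    Note that a full match also passes this test; a real workflow would
    report exact matches separately.
    """
    return sorted(
        pid for pid, prof in database.items()
        if sharing_fraction(scene_profile, prof) >= threshold
    )


# Hypothetical profiles: a possible parent and an unrelated person.
scene = {"TH01": (6, 9.3), "FGA": (21, 24), "VWA": (16, 17)}
database = {
    "possible-parent": {"TH01": (6, 7), "FGA": (24, 25), "VWA": (17, 18)},
    "unrelated": {"TH01": (8, 9), "FGA": (19, 20), "VWA": (17, 19)},
}
candidates = familial_candidates(scene, database)
```

Because unrelated people can share alleles by chance, a list produced this way is only a set of investigative leads, which is part of why the practice is controversial.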
Furthermore, DNA databases could fall into the wrong hands due to data breaches or data sharing.
DNA collection and human rights
In a judgement in December 2008, the European Court of Human Rights
 ruled that two British men should not have had their DNA and 
fingerprints retained by police saying that retention "could not be 
regarded as necessary in a democratic society".
The DNA fingerprinting pioneer Professor Sir Alec Jeffreys condemned UK government plans to keep the genetic details of hundreds of thousands of innocent people in England and Wales for up to 12 years. Jeffreys said he was "disappointed" with the proposals, which came after a European court ruled that the current policy breaches people's right to privacy. Jeffreys said "It seems to be about as minimal a response to the European court of human rights judgment as one could conceive. There is a presumption not of innocence but of future guilt here … which I find very disturbing indeed".
Effects on crime
A 2017 study in the American Economic Journal: Applied Economics
 showed that databases of criminal offenders' DNA profiles in US states 
"deter crime by profiled offenders, reduce crime rates, and are more 
cost-effective than traditional law enforcement tools."
Monozygotic Twins
Monozygotic twins share around 99.99% of their DNA, while other siblings share around 50%. Some next generation sequencing tools are capable of detecting rare de novo mutations in only one of the twins (detectable in rare single nucleotide polymorphisms). Most DNA testing tools would not detect these rare SNPs in most twins. 
Each person's DNA is unique, with the slight exception of identical (monozygotic and monospermotic) twins, who start out from the same genetic line of DNA but acquire extremely small mutations during the twinning event. Compared with all other humans, and even with theoretical "clones" (who would not share the same uterus nor experience the same mutations before the twinning event), identical twins have DNA that is probably more alike than is possible between any other two humans. These tiny differences between identical twins can now (as of 2014) be detected by next generation sequencing. With the most common, affordable DNA tests, "identical" twins cannot easily be differentiated, although it has been shown to be possible. Beyond these more recently discovered mutations arising from the twinning event, it has been known since 2008 that identical twins also each have their own set of copy number variants, which can be thought of as the number of copies each twin carries of certain sections of DNA.