
Wednesday, May 1, 2019

DNA profiling

From Wikipedia, the free encyclopedia

DNA profiling (also called DNA fingerprinting) is the process of determining an individual's DNA characteristics, which are as unique as fingerprints. DNA analysis intended to identify a species, rather than an individual, is called DNA barcoding.
 
DNA profiling is a forensic technique in criminal investigations, comparing criminal suspects' profiles to DNA evidence so as to assess the likelihood of their involvement in the crime. It is also used in parentage testing, to establish immigration eligibility, and in genealogical and medical research. DNA profiling has also been used in the study of animal and plant populations in the fields of zoology, botany, and agriculture.

Background

Starting in the 1980s, scientific advances allowed DNA to be used as a means of identifying an individual. The first patent covering the modern process of DNA profiling was filed by Dr. Jeffrey Glassberg in 1983, based upon work he had done at Rockefeller University in 1981. Glassberg, along with two medical doctors, founded Lifecodes Corporation to bring this invention to market. The Glassberg patent was issued in Belgium (BE899027A1), Canada (FR2541774A1), Germany (DE3407196A1), Great Britain (GB8405107D0), Japan (JPS59199000A), and the United States (US5593832A). In the United Kingdom, geneticist Sir Alec Jeffreys independently developed a DNA profiling process beginning in late 1984 while working in the Department of Genetics at the University of Leicester.

The process, developed by Jeffreys in conjunction with Peter Gill and Dave Werrett of the Forensic Science Service (FSS), was first used forensically in solving the rape and murder of two teenagers in Narborough, Leicestershire, in 1983 and 1986. In the murder inquiry, led by Detective David Baker, DNA contained within blood samples obtained voluntarily from around 5,000 local men who willingly assisted Leicestershire Constabulary with the investigation resulted in the exoneration of Richard Buckland, an initial suspect who had confessed to one of the crimes, and the subsequent conviction of Colin Pitchfork on January 2, 1988. Pitchfork, a local bakery employee, had coerced his coworker Ian Kelly to stand in for him when providing a blood sample; Kelly then used a forged passport to impersonate Pitchfork. Another coworker reported the deception to the police. Pitchfork was arrested, and his blood was sent to Jeffreys's lab for processing and profile development. Pitchfork's profile matched that of DNA left by the murderer, confirming Pitchfork's presence at both crime scenes; he pleaded guilty to both murders.

Although 99.9% of human DNA sequences are the same in every person, enough of the DNA is different that it is possible to distinguish one individual from another, unless they are monozygotic (identical) twins. DNA profiling uses repetitive sequences that are highly variable, called variable number tandem repeats (VNTRs), in particular short tandem repeats (STRs), also known as microsatellites, and minisatellites. VNTR loci are similar between closely related individuals, but are so variable that unrelated individuals are unlikely to have the same VNTRs.

DNA profiling processes

Variations of VNTR allele lengths in 6 individuals.
 
Alec Jeffreys, a pioneer of DNA profiling.
 
The process, developed by Glassberg and independently by Jeffreys, begins with a sample of an individual's DNA (typically called a "reference sample"). Reference samples are usually collected through a buccal swab. When this is unavailable (for example, when a court order is needed but unobtainable), other methods may be needed to collect a sample of blood, saliva, semen, vaginal lubrication, or other fluid or tissue from personal use items (for example, a toothbrush or razor) or from stored samples (for example, banked sperm or biopsy tissue). Samples obtained from blood relatives can indicate an individual's profile, as could previously profiled human remains. A reference sample is then analyzed to create the individual's DNA profile using one of the techniques discussed below. The DNA profile is then compared against another sample to determine whether there is a genetic match.

DNA Extraction

When a sample such as blood or saliva is obtained, the DNA is only a small part of what is present in the sample. Before the DNA can be analyzed, it must be extracted from the cells and purified. There are many ways this can be accomplished, but all methods follow the same basic procedure. The cell and nuclear membranes need to be broken up to allow the DNA to be free in solution. Once the DNA is free, it can be separated from all other cellular components. After the DNA has been separated in solution, the remaining cellular debris can be removed from the solution and discarded, leaving only DNA. The most common methods of DNA extraction include organic extraction (also called phenol-chloroform extraction), Chelex extraction, and solid phase extraction. Differential extraction is a modified version of extraction in which DNA from two different types of cells can be separated from each other before being purified from the solution. Each method works well in the laboratory, but analysts typically select their preferred method based on factors such as cost, time, and the quantity and quality of DNA yielded. After the DNA is extracted from the sample, it can be analyzed, whether by RFLP analysis or by quantification and PCR analysis.

RFLP analysis

The first methods used for DNA profiling involved RFLP analysis. DNA is collected from cells and cut into small pieces using a restriction enzyme (a restriction digest). This generates DNA fragments of differing sizes as a consequence of variations between DNA sequences of different individuals. The fragments are then separated on the basis of size using gel electrophoresis.

The separated fragments are then transferred to a nitrocellulose or nylon filter; this procedure is called a Southern blot. The DNA fragments within the blot are permanently fixed to the filter, and the DNA strands are denatured. Radiolabeled probe molecules are then added that are complementary to sequences in the genome that contain repeat sequences. These repeat sequences tend to vary in length among different individuals and are called variable number tandem repeat sequences, or VNTRs. The probe molecules hybridize to DNA fragments containing the repeat sequences, and excess probe molecules are washed away. The blot is then exposed to an X-ray film. Fragments of DNA that have bound to the probe molecules appear as dark bands on the film.

The Southern blot technique requires large amounts of non-degraded sample DNA. Also, the original technique looked at many minisatellite loci at the same time, increasing the observed variability but making it hard to discern individual alleles (and thereby precluding paternity testing). These early techniques have been supplanted by PCR-based assays.

Polymerase chain reaction (PCR) analysis

Developed by Kary Mullis in 1983, the polymerase chain reaction (PCR) is a process by which specific portions of the sample DNA can be amplified almost indefinitely (Saiki et al. 1985). The process mimics the biological process of DNA replication, but confines it to specific DNA sequences of interest. With the invention of the PCR technique, DNA profiling took huge strides forward in both discriminating power and the ability to recover information from very small (or degraded) starting samples.

PCR greatly amplifies the amounts of a specific region of DNA. In the PCR process, the DNA sample is denatured into the separate individual polynucleotide strands through heating. Two oligonucleotide DNA primers are used to hybridize to two corresponding nearby sites on opposite DNA strands in such a fashion that the normal enzymatic extension of the active terminal of each primer (that is, the 3’ end) leads toward the other primer. PCR uses replication enzymes that are tolerant of high temperatures, such as the thermostable Taq polymerase. In this fashion, two new copies of the sequence of interest are generated. Repeated denaturation, hybridization, and extension in this fashion produce an exponentially growing number of copies of the DNA of interest. Instruments that perform thermal cycling are readily available from commercial sources. This process can produce a million-fold or greater amplification of the desired region in 2 hours or less.
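
As a rough illustration of the exponential amplification described above, here is a minimal sketch in Python that models idealized PCR doubling. The starting copy number, efficiency, and cycle counts are illustrative assumptions, not values from any particular protocol.

```python
# Minimal sketch of idealized PCR amplification: each thermal cycle at most
# doubles the number of copies of the target region. All numbers here
# (starting copies, efficiency, cycle counts) are illustrative assumptions.

def pcr_copies(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after a given number of PCR cycles.

    efficiency = 1.0 means perfect doubling; real reactions are somewhat lower.
    """
    return start_copies * (1.0 + efficiency) ** cycles

if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: ~{pcr_copies(100, n):.3g} copies from 100 templates")
    # With perfect doubling, 20 cycles already gives a roughly million-fold
    # amplification, consistent with the figure quoted above.
```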

Early assays such as the HLA-DQ alpha reverse dot blot strips grew to be very popular due to their ease of use, and the speed with which a result could be obtained. However, they were not as discriminating as RFLP analysis. It was also difficult to determine a DNA profile for mixed samples, such as a vaginal swab from a sexual assault victim. 

However, the PCR method was readily adaptable for analyzing VNTR, in particular STR loci. In recent years, research in human DNA quantitation has focused on new "real-time" quantitative PCR (qPCR) techniques. Quantitative PCR methods enable automated, precise, and high-throughput measurements. Inter-laboratory studies have demonstrated the importance of human DNA quantitation for achieving reliable interpretation of STR typing and obtaining consistent results across laboratories.

STR analysis

The system of DNA profiling used today is based on polymerase chain reaction (PCR) and uses simple sequences or short tandem repeats (STR). This method uses highly polymorphic regions that have short repeated sequences of DNA (the most common is 4 bases repeated, but there are other lengths in use, including 3 and 5 bases). Because unrelated people almost certainly have different numbers of repeat units, STRs can be used to discriminate between unrelated individuals. These STR loci (locations on a chromosome) are targeted with sequence-specific primers and amplified using PCR. The DNA fragments that result are then separated and detected using electrophoresis. There are two common methods of separation and detection, capillary electrophoresis (CE) and gel electrophoresis. 

Each STR is polymorphic, but the number of alleles is very small. Typically each STR allele will be shared by around 5 - 20% of individuals. The power of STR analysis comes from looking at multiple STR loci simultaneously. The pattern of alleles can identify an individual quite accurately. Thus STR analysis provides an excellent identification tool. The more STR regions that are tested in an individual the more discriminating the test becomes. 

From country to country, different STR-based DNA-profiling systems are in use. In North America, systems that amplify the CODIS 20 core loci are almost universal, whereas in the United Kingdom the 17-locus DNA-17 system (which is compatible with the National DNA Database) is in use, and Australia uses 18 core markers. Whichever system is used, many of the STR regions tested are the same. These DNA-profiling systems are based on multiplex reactions, whereby many STR regions are tested at the same time.

The true power of STR analysis is in its statistical power of discrimination. Because the 20 loci currently used for discrimination in CODIS are independently assorted (having a certain number of repeats at one locus does not change the likelihood of having any number of repeats at any other locus), the product rule for probabilities can be applied. This means that, if someone has the DNA type ABC, where the three loci are independent, the probability of having that DNA type is the probability of having type A times the probability of having type B times the probability of having type C. This has resulted in the ability to generate match probabilities of 1 in a quintillion (1 × 10^18) or more. However, DNA database searches have shown false profile matches far more frequently than expected. Moreover, since there are about 12 million monozygotic twins on Earth, the theoretical probability is not accurate.
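
To make the product rule concrete, the short sketch below multiplies hypothetical per-locus genotype frequencies into a random match probability. The locus names and frequencies are invented for illustration and are not real CODIS statistics.

```python
# Sketch of the product rule for independent STR loci: the random match
# probability of a full profile is the product of the per-locus genotype
# frequencies. The loci and frequencies below are invented for illustration.

genotype_frequencies = {
    "locus_A": 0.08,  # hypothetical frequency of the profile's genotype at locus A
    "locus_B": 0.05,
    "locus_C": 0.11,
}

def random_match_probability(freqs: dict) -> float:
    p = 1.0
    for f in freqs.values():
        p *= f
    return p

rmp = random_match_probability(genotype_frequencies)
print(f"Random match probability over {len(genotype_frequencies)} loci: 1 in {1 / rmp:,.0f}")
# Extending the same product over 20 independent loci is what produces
# figures on the order of 1 in 10^18.
```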

In practice, the risk of a match due to contamination is much greater than that of matching a distant relative; examples include contamination of a sample from nearby objects or from left-over cells transferred from a prior test. The risk is greatest for the most common person in the samples: everything collected from, or in contact with, a victim is a major source of contamination for any other samples brought into a lab. For that reason, multiple control samples are typically tested in order to ensure that they stayed clean when prepared during the same period as the actual test samples. Unexpected matches (or variations) in several control samples indicate a high probability of contamination of the actual test samples. In a relationship test, the full DNA profiles should differ (except for twins) to confirm that a person is not being matched to their own DNA in another sample.

AmpFLP

Another technique, AmpFLP (amplified fragment length polymorphism), was also put into practice during the early 1990s. This technique was faster than RFLP analysis and used PCR to amplify DNA samples. It relied on variable number tandem repeat (VNTR) polymorphisms to distinguish various alleles, which were separated on a polyacrylamide gel using an allelic ladder (as opposed to a molecular weight ladder). Bands could be visualized by silver staining the gel. One popular locus for fingerprinting was D1S80. As with all PCR-based methods, highly degraded DNA or very small amounts of DNA may cause allelic dropout (causing a heterozygote to be mistaken for a homozygote) or other stochastic effects. In addition, because the analysis is done on a gel, alleles with very high repeat numbers may bunch together at the top of the gel, making them difficult to resolve. AmpFLP analysis can be highly automated, and allows for easy creation of phylogenetic trees based on comparing individual samples of DNA. Due to its relatively low cost and ease of set-up and operation, AmpFLP remains popular in lower income countries.

DNA family relationship analysis

1. A cell sample is taken (usually a cheek swab or blood test).
2. DNA is extracted from the sample.
3. The DNA is cleaved by a restriction enzyme, breaking it into small fragments.
4. Small fragments are amplified by the polymerase chain reaction, resulting in many more fragments.
5. DNA fragments are separated by electrophoresis.
6. The fragments are transferred to an agar plate.
7. On the agar plate, specific DNA fragments are bound to a radioactive DNA probe.
8. The agar plate is washed free of excess probe.
9. An X-ray film is used to detect the radioactive pattern.
10. The DNA is compared to other DNA samples.
 
Using PCR technology, DNA analysis is widely applied to determine genetic family relationships such as paternity, maternity, siblingship and other kinships. 

During conception, the father's sperm cell and the mother's egg cell, each containing half the amount of DNA found in other body cells, meet and fuse to form a fertilized egg, called a zygote. The zygote contains a complete set of DNA molecules, a unique combination of DNA from both parents. This zygote divides and multiplies into an embryo and later, a full human being.

At each stage of development, all the cells forming the body contain the same DNA—half from the father and half from the mother. This fact allows relationship testing to use all types of samples, including loose cells from the cheeks collected using buccal swabs, blood, or other types of samples.

There are predictable inheritance patterns at certain locations (called loci) in the human genome, which have been found to be useful in determining identity and biological relationships. These loci contain specific DNA markers that scientists use to identify individuals. In a routine DNA paternity test, the markers used are short tandem repeats (STRs), short pieces of DNA that occur in highly differential repeat patterns among individuals.

Each person's DNA contains two copies of these markers—one copy inherited from the father and one from the mother. Within a population, the markers at each person's DNA location could differ in length and sometimes sequence, depending on the markers inherited from the parents.

The combination of marker sizes found in each person makes up his/her unique genetic profile. When determining the relationship between two individuals, their genetic profiles are compared to see if they share the same inheritance patterns at a statistically conclusive rate. 

For example, the following sample report from a commercial DNA paternity testing laboratory, Universal Genetics, shows how relatedness between parents and child is identified at these markers:

DNA marker Mother Child Alleged father
D21S11 28, 30 28, 31 29, 31
D7S820 9, 10 10, 11 11, 12
TH01 14, 15 14, 16 15, 16
D13S317 7, 8 7, 9 8, 9
D19S433 14, 16.2 14, 15 15, 17

The partial results indicate that the child and the alleged father's DNA match among these five markers. The complete test results show this correlation on 16 markers between the child and the tested man to enable a conclusion to be drawn as to whether or not the man is the biological father.
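
One way to read the table above is to check, at each marker, that the child carries one allele found in the mother and one found in the alleged father (the obligate paternal allele). The sketch below applies that check to the sample data; it is a simplified illustration only and ignores complications such as mutation.

```python
# Simplified reading of the paternity table above: at each marker the child
# should carry one allele present in the mother and one present in the
# alleged father (the obligate paternal allele). Mutation is ignored here.

profiles = {
    # marker:   (mother,      child,      alleged father)
    "D21S11":  ((28, 30),   (28, 31),   (29, 31)),
    "D7S820":  ((9, 10),    (10, 11),   (11, 12)),
    "TH01":    ((14, 15),   (14, 16),   (15, 16)),
    "D13S317": ((7, 8),     (7, 9),     (8, 9)),
    "D19S433": ((14, 16.2), (14, 15),   (15, 17)),
}

def consistent_with_paternity(mother, child, father) -> bool:
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

for marker, (m, c, f) in profiles.items():
    status = "consistent" if consistent_with_paternity(m, c, f) else "inconsistent"
    print(f"{marker}: {status}")
```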

Each marker is assigned a Paternity Index (PI), which is a statistical measure of how powerfully a match at that particular marker indicates paternity. The PIs of the individual markers are multiplied together to generate the Combined Paternity Index (CPI), which indicates the overall probability of an individual being the biological father of the tested child relative to a randomly selected man from the entire population of the same race. The CPI is then converted into a Probability of Paternity showing the degree of relatedness between the alleged father and child.
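
As a worked example of that calculation, the sketch below multiplies hypothetical per-marker PIs into a CPI and converts it to a Probability of Paternity using the conventional prior of 0.5; the PI values are invented for illustration.

```python
# Sketch of combining per-marker Paternity Indices (PI) into a Combined
# Paternity Index (CPI) and converting it to a Probability of Paternity.
# The PI values are invented; the conversion assumes the conventional
# prior probability of paternity of 0.5.

from math import prod

paternity_indices = [2.1, 1.8, 3.4, 2.7, 1.5]  # hypothetical per-marker PIs

cpi = prod(paternity_indices)
prior = 0.5
probability_of_paternity = (cpi * prior) / (cpi * prior + (1 - prior))

print(f"CPI = {cpi:.2f}")
print(f"Probability of paternity = {probability_of_paternity:.2%}")
# With a 0.5 prior this reduces to CPI / (CPI + 1).
```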

The DNA test report in other family relationship tests, such as grandparentage and siblingship tests, is similar to a paternity test report. Instead of the Combined Paternity Index, a different value, such as a Siblingship Index, is reported.

The report shows the genetic profiles of each tested person. If there are markers shared among the tested individuals, the probability of biological relationship is calculated to determine how likely the tested individuals share the same markers due to a blood relationship.

Y-chromosome analysis

Recent innovations have included the creation of primers targeting polymorphic regions on the Y chromosome (Y-STR), which allows resolution of a mixed DNA sample from a male and female, or of cases in which a differential extraction is not possible. Y chromosomes are paternally inherited, so Y-STR analysis can help in the identification of paternally related males. Y-STR analysis was performed in the Sally Hemings controversy to determine whether Thomas Jefferson had sired a son with one of his slaves. The analysis of the Y chromosome yields weaker results than autosomal chromosome analysis: because the Y chromosome is inherited only by males from their fathers, it is almost identical along the patrilineal line. This leads to a less precise analysis than if autosomal chromosomes were tested, because of the random matching that occurs between pairs of chromosomes as zygotes are being made.

Mitochondrial analysis

For highly degraded samples, it is sometimes impossible to get a complete profile of the 13 CODIS STRs. In these situations, mitochondrial DNA (mtDNA) is sometimes typed because there are many copies of mtDNA in a cell, while there may be only 1-2 copies of the nuclear DNA. Forensic scientists amplify the HV1 and HV2 regions of the mtDNA, then sequence each region and compare single-nucleotide differences to a reference. Because mtDNA is maternally inherited, directly linked maternal relatives can be used as match references, such as one's maternal grandmother's daughter's son. In general, a difference of two or more nucleotides is considered to be an exclusion. Heteroplasmy and poly-C differences may throw off straight sequence comparisons, so some expertise on the part of the analyst is required. mtDNA is useful in determining clear identities, such as those of missing people when a maternally linked relative can be found. mtDNA testing was used to determine that Anna Anderson was not the Russian princess she had claimed to be, Anastasia Romanov.
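
The comparison step described above can be sketched as a simple count of single-nucleotide differences against a maternal-relative reference, with the two-or-more-differences exclusion guideline applied. The sequences below are short, invented stand-ins for real HV1/HV2 data, and the sketch ignores heteroplasmy and poly-C length variation.

```python
# Sketch of the mtDNA comparison described above: count single-nucleotide
# differences between a questioned sequence and a reference from a maternal
# relative, then apply the "two or more differences" exclusion guideline.
# The sequences are hypothetical and assumed to be already aligned.

def count_differences(seq_a: str, seq_b: str) -> int:
    if len(seq_a) != len(seq_b):
        raise ValueError("sketch assumes aligned sequences of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

reference  = "ACCTGAGTACCTTAGC"   # hypothetical maternal-relative HV1 fragment
questioned = "ACCTGAGCACCTTAGC"   # hypothetical sample from remains

diffs = count_differences(reference, questioned)
if diffs >= 2:
    print(f"{diffs} differences: exclusion")
elif diffs == 0:
    print("0 differences: cannot exclude")
else:
    print("1 difference: inconclusive; requires expert review")
```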

mtDNA can be obtained from such material as hair shafts and old bones and teeth.

Issues with forensic DNA samples

When people think of DNA analysis they often think of shows like NCIS or CSI, which portray DNA samples arriving at a lab and being analyzed instantly, with a picture of the suspect pulled up within minutes. The reality, however, is quite different, and perfect DNA samples are often not collected from the scene of a crime. Homicide victims are frequently left exposed to harsh conditions before they are found, and objects used to commit crimes have often been handled by more than one person. The two most prevalent issues that forensic scientists encounter when analyzing DNA samples are degraded samples and DNA mixtures.

Degraded DNA

In the real world, DNA labs often have to deal with DNA samples that are less than ideal. DNA samples taken from crime scenes are often degraded, which means that the DNA has started to break down into smaller fragments (DNA fragmentation). Victims of homicides might not be discovered right away, and in the case of a mass casualty event it can be hard to get DNA samples before the DNA has been exposed to degradation elements.

Degradation or fragmentation of DNA at crime scenes can occur for a number of reasons, with environmental exposure often being the most common cause. Biological samples that have been exposed to the environment can be degraded by water and by enzymes called nucleases. Nucleases essentially ‘chew’ up the DNA into fragments over time and are found everywhere in nature.

Before modern PCR methods existed, it was almost impossible to analyze degraded DNA samples. Methods like restriction fragment length polymorphism (RFLP), which was the first technique used for DNA analysis in forensic science, required high molecular weight DNA in the sample in order to get reliable data. High molecular weight DNA, however, is lacking in degraded samples, as the DNA is too fragmented to accurately carry out RFLP. It wasn't until modern PCR techniques were invented that analysis of degraded DNA samples could be carried out. Multiplex PCR in particular made it possible to isolate and amplify the small fragments of DNA still left in degraded samples. When multiplex PCR methods are compared to older methods like RFLP, a vast difference can be seen: multiplex PCR can theoretically amplify less than 1 ng of DNA, while RFLP required at least 100 ng of DNA in order to carry out an analysis.

In terms of a forensic approach to a degraded DNA sample, STR loci are often amplified using PCR-based methods. Though STR loci are amplified with greater probability of success from degraded DNA, larger STR loci may still fail to amplify and therefore yield only a partial profile, which results in reduced statistical weight of association in the event of a match.

MiniSTR Analysis

In instances where DNA samples are degraded, as in the case of intense fires or where all that remains are bone fragments, standard STR testing on these samples can be inadequate. When standard STR testing is done on highly degraded samples, the larger STR loci often drop out and only partial DNA profiles are obtained. While partial DNA profiles can be a powerful tool, the random match probabilities will be larger than if a full profile was obtained. One method that has been developed to analyse degraded DNA samples is miniSTR technology. In this approach, primers are specially designed to bind closer to the STR region. In normal STR testing the primers bind to longer sequences that contain the STR region within the segment. MiniSTR analysis, however, targets just the STR region, and this results in a much smaller DNA product.

By placing the primers closer to the actual STR regions, there is a higher chance that these regions will amplify successfully, so more complete DNA profiles can be obtained from degraded samples. The observation that smaller PCR products give a higher success rate with highly degraded samples was first reported in 1995, when miniSTR technology was used to identify victims of the Waco fire. In that case the fire had destroyed the DNA samples so badly that normal STR testing did not result in a positive ID on some of the victims.

DNA Mixtures

Mixtures are another common issue that forensic scientists face when analyzing unknown or questioned DNA samples. A mixture is defined as a DNA sample that contains two or more individual contributors. This can occur when a DNA sample is swabbed from an item that has been handled by more than one person or when a sample contains both the victim's and the assailant's DNA. The presence of more than one individual in a DNA sample can make it challenging to detect individual profiles, and interpretation of mixtures should only be done by highly trained individuals. Mixtures that contain two or three individuals can be interpreted, though with difficulty. Mixtures that contain four or more individuals are much too convoluted to get individual profiles. One common scenario in which a mixture is often obtained is sexual assault, where a sample may be collected that contains material from the victim, the victim's consensual sexual partners, and the perpetrator(s).

As detection methods in DNA profiling advance, forensic scientists are seeing more DNA samples that contain mixtures, as even the smallest contributor can now be detected by modern tests. The ease with which forensic scientists can interpret DNA mixtures largely depends on the ratio of DNA present from each individual, the genotype combinations, and the total amount of DNA amplified. The DNA ratio is often the most important factor in determining whether a mixture can be interpreted. For example, in a case where a DNA sample has two contributors, it is easy to interpret individual profiles if the ratio of DNA contributed by one person is much higher than that of the second person. When a sample has three or more contributors, it becomes extremely difficult to determine individual profiles. Fortunately, advancements in probabilistic genotyping could make this sort of determination possible in the future. Probabilistic genotyping uses complex computer software to run through thousands of mathematical computations in order to produce statistical likelihoods of individual genotypes found in a mixture. Probabilistic genotyping software packages often used in labs today include STRmix and TrueAllele.
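
One simplified input to mixture interpretation is the ratio of contributors estimated from peak heights at a locus. The sketch below works through a hypothetical two-person example; real probabilistic genotyping software such as STRmix and TrueAllele models far more than this simple ratio.

```python
# Highly simplified sketch of one input to mixture interpretation: estimating
# the ratio of two contributors from peak heights at a single locus. The peak
# heights (in RFU) and genotype hypotheses below are invented for illustration.

# Suppose a two-person mixture shows four alleles at one locus, and the analyst
# hypothesizes contributor 1 = alleles (12, 15) and contributor 2 = (9, 17).
peak_heights = {9: 210.0, 12: 1450.0, 15: 1390.0, 17: 240.0}

contributor_1 = (12, 15)
contributor_2 = (9, 17)

h1 = sum(peak_heights[a] for a in contributor_1)
h2 = sum(peak_heights[a] for a in contributor_2)

print(f"Approximate mixture ratio: {h1 / h2:.1f} : 1")
# A lopsided ratio like this (a clear major and minor contributor) is what
# makes the individual profiles relatively easy to separate, as described above.
```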

DNA databases

An early application of a DNA database was the compilation of a Mitochondrial DNA Concordance, prepared by Kevin W. P. Miller and John L. Dawson at the University of Cambridge from 1996 to 1998 from data collected as part of Miller's PhD thesis. There are now several DNA databases in existence around the world. Some are private, but most of the largest databases are government-controlled. The United States maintains the largest DNA database, with the Combined DNA Index System (CODIS) holding over 13 million records as of May 2018. The United Kingdom maintains the National DNA Database (NDNAD), which is of similar size, despite the UK's smaller population. The size of this database, and its rate of growth, are giving concern to civil liberties groups in the UK, where police have wide-ranging powers to take samples and retain them even in the event of acquittal. The Conservative–Liberal Democrat coalition partially addressed these concerns with part 1 of the Protection of Freedoms Act 2012, under which DNA samples must be deleted if suspects are acquitted or not charged, except in relation to certain (mostly serious and/or sexual) offenses. Public discourse around the introduction of advanced forensic techniques (such as genetic genealogy using public genealogy databases and DNA phenotyping approaches) has been limited, disjointed, unfocused, and raises issues of privacy and consent that may warrant the establishment of additional legal protections.

The USA PATRIOT Act provides a means for the U.S. government to obtain DNA samples from suspected terrorists. DNA information from crimes is collected and deposited into the CODIS database, which is maintained by the FBI. CODIS enables law enforcement officials to test DNA samples from crimes for matches within the database, providing a means of finding specific biological profiles associated with collected DNA evidence.

When a match is made from a national DNA databank to link a crime scene to an offender having provided a DNA sample to a database, that link is often referred to as a cold hit. A cold hit is of value in referring the police agency to a specific suspect but is of less evidential value than a DNA match made from outside the DNA Databank.

FBI agents cannot legally store DNA of a person not convicted of a crime. DNA collected from a suspect not later convicted must be disposed of and not entered into the database. In 1998, a man residing in the UK was arrested on an accusation of burglary. His DNA was taken and tested, and he was later released. Nine months later, this man's DNA was accidentally and illegally entered into the DNA database. New DNA is automatically compared to DNA found at cold-case crime scenes, and in this case the man was found to be a match to DNA found at a rape and assault one year earlier. The government then prosecuted him for these crimes. During the trial, the defense requested that the DNA match be removed from the evidence because it had been illegally entered into the database. The request was granted.

The DNA of the perpetrator, collected from victims of rape, can be stored for years until a match is found. In 2014, to address this problem, Congress extended a bill that helps states deal with "a backlog" of evidence.

Considerations when evaluating DNA evidence

As DNA profiling became a key piece of evidence in court, defense lawyers based their arguments on statistical reasoning. For example: given a match that had a 1 in 5 million probability of occurring by chance, a lawyer would argue that this meant that in a country of, say, 60 million people there were 12 people who would also match the profile. This was then translated to a 1 in 12 chance of the suspect being the guilty one. This argument is not sound unless the suspect was drawn at random from the population of the country. In fact, a jury should consider how likely it is that an individual matching the genetic profile would also have been a suspect in the case for other reasons. In addition, different DNA analysis processes can reduce the amount of DNA recovered if the procedures are not properly performed, so the number of times a piece of evidence is sampled can diminish the DNA collection efficiency. Another spurious statistical argument is based on the false assumption that a 1 in 5 million probability of a match automatically translates into a 1 in 5 million probability of innocence; this is known as the prosecutor's fallacy.
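
The arithmetic behind the defense argument above can be reproduced directly. The sketch below simply multiplies the quoted match probability by the population size and shows where the "1 in 12" figure comes from, which is only valid if the suspect were drawn at random from that population.

```python
# Reproducing the arithmetic behind the defense argument quoted above:
# with a 1-in-5-million random match probability and a population of
# 60 million, about 12 people are expected to match. Treating the suspect
# as "1 of 12" is only valid if he was picked at random from that population,
# which is the flaw the text points out.

match_probability = 1 / 5_000_000
population = 60_000_000

expected_matches = match_probability * population
print(f"Expected matching individuals: {expected_matches:.0f}")                    # ~12
print(f"Naive 'random draw' chance the suspect is the source: 1 in {expected_matches:.0f}")

# The prosecutor's fallacy runs the other way: it equates the 1-in-5-million
# match probability with a 1-in-5-million probability of innocence, ignoring
# how many other people in the relevant population would also match.
```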

When using RFLP, the theoretical risk of a coincidental match is 1 in 100 billion (100,000,000,000), although the practical risk is actually 1 in 1000 because monozygotic twins are 0.2% of the human population. Moreover, the rate of laboratory error is almost certainly higher than this, and often actual laboratory procedures do not reflect the theory under which the coincidence probabilities were computed. For example, the coincidence probabilities may be calculated based on the probabilities that markers in two samples have bands in precisely the same location, but a laboratory worker may conclude that similar—but not precisely identical—band patterns result from identical genetic samples with some imperfection in the agarose gel. However, in this case, the laboratory worker increases the coincidence risk by expanding the criteria for declaring a match. Recent studies have quoted relatively high error rates, which may be cause for concern. In the early days of genetic fingerprinting, the necessary population data to accurately compute a match probability was sometimes unavailable. Between 1992 and 1996, arbitrary low ceilings were controversially put on match probabilities used in RFLP analysis rather than the higher theoretically computed ones. Today, RFLP has become widely disused due to the advent of more discriminating, sensitive and easier technologies. 

Since 1998, the DNA profiling system supported by the National DNA Database in the UK has been the SGM+ system, which includes 10 STR regions and a sex-indicating test. STRs do not suffer from such subjectivity and provide similar power of discrimination (1 in 10^13 for unrelated individuals if using a full SGM+ profile). Figures of this magnitude are not considered to be statistically supportable by scientists in the UK; for unrelated individuals with full matching DNA profiles, a match probability of 1 in a billion is considered statistically supportable. However, with any DNA technique, the cautious juror should not convict on genetic fingerprint evidence alone if other factors raise doubt. Contamination with other evidence (secondary transfer) is a key source of incorrect DNA profiles, and raising doubts as to whether a sample has been adulterated is a favorite defense technique. More rarely, chimerism is one instance where the lack of a genetic match may unfairly exclude a suspect.

Evidence of genetic relationship

It is possible to use DNA profiling as evidence of genetic relationship, although such evidence varies in strength from weak to positive. Testing that shows no relationship is absolutely certain. Further, while almost all individuals have a single and distinct set of genes, ultra-rare individuals, known as "chimeras", have at least two different sets of genes. There have been two cases of DNA profiling that falsely suggested that a mother was unrelated to her children. This happens when two eggs are fertilized at the same time and fuse together to create one individual instead of twins.

Fake DNA evidence

In one case, a criminal planted fake DNA evidence in his own body: John Schneeberger raped one of his sedated patients in 1992 and left semen on her underwear. Police drew what they believed to be Schneeberger's blood and compared its DNA against the crime scene semen DNA on three occasions, never showing a match. It turned out that he had surgically inserted a Penrose drain into his arm and filled it with foreign blood and anticoagulants.

In a study conducted by the life science company Nucleix and published in the journal Forensic Science International, scientists found that an in vitro synthesized sample of DNA matching any desired genetic profile can be constructed using standard molecular biology techniques without obtaining any actual tissue from that person. Nucleix claims it can also prove the difference between non-altered DNA and any that was synthesized.

In the case of the Phantom of Heilbronn, police detectives found DNA traces from the same woman on various crime scenes in Austria, Germany, and France—among them murders, burglaries and robberies. Only after the DNA of the "woman" matched the DNA sampled from the burned body of a male asylum seeker in France did detectives begin to have serious doubts about the DNA evidence. It was eventually discovered that DNA traces were already present on the cotton swabs used to collect the samples at the crime scene, and the swabs had all been produced at the same factory in Austria. The company's product specification said that the swabs were guaranteed to be sterile, but not DNA-free.

DNA evidence in criminal trials

Familial DNA searching

Familial DNA searching (sometimes referred to as "familial DNA" or "familial DNA database searching") is the practice of creating new investigative leads in cases where DNA evidence found at the scene of a crime (forensic profile) strongly resembles that of an existing DNA profile (offender profile) in a state DNA database but there is not an exact match. After all other leads have been exhausted, investigators may use specially developed software to compare the forensic profile to all profiles taken from a state's DNA database to generate a list of those offenders already in the database who are most likely to be a very close relative of the individual whose DNA is in the forensic profile. To eliminate the majority of this list when the forensic DNA is a man's, crime lab technicians conduct Y-STR analysis. Using standard investigative techniques, authorities are then able to build a family tree. The family tree is populated from information gathered from public records and criminal justice records. Investigators rule out family members' involvement in the crime by finding excluding factors such as sex, living out of state or being incarcerated when the crime was committed. They may also use other leads from the case, such as witness or victim statements, to identify a suspect. Once a suspect has been identified, investigators seek to legally obtain a DNA sample from the suspect. This suspect DNA profile is then compared to the sample found at the crime scene to definitively identify the suspect as the source of the crime scene DNA. 

Familial DNA database searching was first used in an investigation leading to the conviction of Jeffrey Gafoor for the murder of Lynette White in the United Kingdom on 4 July 2003. DNA evidence was matched to Gafoor's nephew, who at 14 years old had not been born at the time of the murder in 1988. It was used again in 2004 to find a man who threw a brick from a motorway bridge and hit a lorry driver, killing him. DNA found on the brick matched that found at the scene of a car theft earlier in the day, but there were no good matches on the national DNA database. A wider search found a partial match to an individual; on being questioned, this man revealed he had a brother, Craig Harman, who lived very close to the original crime scene. Harman voluntarily submitted a DNA sample, and confessed when it matched the sample from the brick.

Currently, familial DNA database searching is not conducted on a national level in the United States, where states determine how and when to conduct familial searches. The first familial DNA search with a subsequent conviction in the United States was conducted in Denver, Colorado, in 2008, using software developed under the leadership of Denver District Attorney Mitch Morrissey and Denver Police Department Crime Lab Director Gregg LaBerge. California was the first state to implement a policy for familial searching under then Attorney General, now Governor, Jerry Brown. In his role as consultant to the Familial Search Working Group of the California Department of Justice, former Alameda County prosecutor Rock Harmon is widely considered to have been the catalyst in the adoption of familial search technology in California.

The technique was used to catch the Los Angeles serial killer known as the "Grim Sleeper" in 2010. It wasn't a witness or informant that tipped off law enforcement to the identity of the "Grim Sleeper" serial killer, who had eluded police for more than two decades, but DNA from the suspect's own son. The suspect's son had been arrested and convicted on a felony weapons charge and swabbed for DNA the year before. When his DNA was entered into the database of convicted felons, detectives were alerted to a partial match to evidence found at the "Grim Sleeper" crime scenes. Lonnie David Franklin Jr., also known as the Grim Sleeper, was charged with ten counts of murder and one count of attempted murder. More recently, familial DNA led to the arrest of 21-year-old Elvis Garcia on charges of sexual assault and false imprisonment of a woman in Santa Cruz in 2008. In March 2011 Virginia Governor Bob McDonnell announced that Virginia would begin using familial DNA searches. Other states are expected to follow.

At a press conference in Virginia on March 7, 2011, regarding the East Coast Rapist, Prince William County prosecutor Paul Ebert and Fairfax County Police Detective John Kelly said the case would have been solved years ago if Virginia had used familial DNA searching. Aaron Thomas, the suspected East Coast Rapist, was arrested in connection with the rape of 17 women from Virginia to Rhode Island, but familial DNA was not used in the case.

Critics of familial DNA database searches argue that the technique is an invasion of an individual's 4th Amendment rights. Privacy advocates are petitioning for DNA database restrictions, arguing that the only fair way to search for possible DNA matches to relatives of offenders or arrestees would be to have a population-wide DNA database. Some scholars have pointed out that the privacy concerns surrounding familial searching are similar in some respects to other police search techniques, and most have concluded that the practice is constitutional. The Ninth Circuit Court of Appeals in United States v. Pool (vacated as moot) suggested that this practice is somewhat analogous to a witness looking at a photograph of one person and stating that it looked like the perpetrator, which leads law enforcement to show the witness photos of similar looking individuals, one of whom is identified as the perpetrator. Regardless of whether familial DNA searching was the method used to identify the suspect, authorities always conduct a normal DNA test to match the suspect's DNA with that of the DNA left at the crime scene. 

Critics also claim that racial profiling could occur on account of familial DNA testing. In the United States, the conviction rates of racial minorities are much higher than that of the overall population. It is unclear whether this is due to discrimination from police officers and the courts, as opposed to a simple higher rate of offence among minorities. Arrest-based databases, which are found in the majority of the United States, lead to an even greater level of racial discrimination. An arrest, as opposed to conviction, relies much more heavily on police discretion.

For instance, investigators with Denver District Attorney's Office successfully identified a suspect in a property theft case using a familial DNA search. In this example, the suspect's blood left at the scene of the crime strongly resembled that of a current Colorado Department of Corrections prisoner. Using publicly available records, the investigators created a family tree. They then eliminated all the family members who were incarcerated at the time of the offense, as well as all of the females (the crime scene DNA profile was that of a male). Investigators obtained a court order to collect the suspect's DNA, but the suspect actually volunteered to come to a police station and give a DNA sample. After providing the sample, the suspect walked free without further interrogation or detainment. Later confronted with an exact match to the forensic profile, the suspect pleaded guilty to criminal trespass at the first court date and was sentenced to two years probation.

In Italy, a familial DNA search was conducted to solve the case of the murder of Yara Gambirasio, whose body was found in scrubland three months after her disappearance. A DNA trace was found on the underwear of the murdered teenager, and a DNA sample was requested from a person who lived near the municipality of Brembate di Sopra; a common male ancestor was found in the DNA sample of a young man not involved in the murder. After a long investigation, the father of the supposed killer was identified as Giuseppe Guerinoni, a deceased man, but the two sons born to his wife were not related to the DNA found on Yara's body. After three and a half years, the DNA found on the underwear of the deceased girl was matched to Massimo Giuseppe Bossetti, who was arrested and accused of the murder of the 13-year-old girl. In the summer of 2016 Bossetti was found guilty and sentenced to life by the Corte d'assise of Bergamo.

Partial matches

Partial DNA matches are not searches themselves, but are the result of moderate-stringency CODIS searches that produce a potential match sharing at least one allele at every locus. Partial matching does not involve the use of familial search software, such as that used in the UK and United States, or additional Y-STR analysis, and therefore often misses sibling relationships. Partial matching has been used to identify suspects in several cases in the UK and United States, and has also been used as a tool to exonerate the falsely accused. Darryl Hunt was wrongly convicted in connection with the rape and murder of a young woman in 1984 in North Carolina. Hunt was exonerated in 2004 when a DNA database search produced a remarkably close match between a convicted felon and the forensic profile from the case. The partial match led investigators to the felon's brother, Willard E. Brown, who confessed to the crime when confronted by police. A judge then signed an order to dismiss the case against Hunt. In Italy, partial matching was used in the controversial murder case of Yara Gambirasio, a child found dead about a month after her presumed kidnapping. In that case, the partial match was used as the only incriminating element against the defendant, Massimo Bossetti, who was subsequently convicted of the murder (pending appeal to the Italian Supreme Court).
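
The moderate-stringency criterion described above (a potential match sharing at least one allele at every locus) can be expressed as a short check. The two profiles below are invented for illustration.

```python
# Sketch of the moderate-stringency criterion described above: a partial
# match requires that the two profiles share at least one allele at every
# locus compared. The profiles below are invented for illustration.

forensic_profile = {"D3S1358": {15, 17}, "vWA": {16, 18}, "FGA": {21, 24}}
database_profile = {"D3S1358": {15, 16}, "vWA": {18, 19}, "FGA": {21, 22}}

def partial_match(profile_a: dict, profile_b: dict) -> bool:
    # True if the profiles share at least one allele at every locus in profile_a.
    return all(profile_a[locus] & profile_b[locus] for locus in profile_a)

print("Partial match" if partial_match(forensic_profile, database_profile)
      else "No partial match")
# Sharing one allele per locus is the pattern a parent-child pair would show,
# which is why such hits can point investigators toward a close relative.
```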

Surreptitious DNA collecting

Police forces may collect DNA samples without a suspect's knowledge, and use it as evidence. The legality of the practice has been questioned in Australia.

In the United States, it has been accepted, courts often ruling that there is no expectation of privacy, citing California v. Greenwood (1988), in which the Supreme Court held that the Fourth Amendment does not prohibit the warrantless search and seizure of garbage left for collection outside the curtilage of a home. Critics of this practice underline that this analogy ignores that "most people have no idea that they risk surrendering their genetic identity to the police by, for instance, failing to destroy a used coffee cup. Moreover, even if they do realize it, there is no way to avoid abandoning one's DNA in public."

The United States Supreme Court ruled in Maryland v. King (2013) that DNA sampling of prisoners arrested for serious crimes is constitutional.

In the UK, the Human Tissue Act 2004 prohibits private individuals from covertly collecting biological samples (hair, fingernails, etc.) for DNA analysis, but exempts medical and criminal investigations from the prohibition.

England and Wales

Evidence from an expert who has compared DNA samples must be accompanied by evidence as to the sources of the samples and the procedures for obtaining the DNA profiles. The judge must ensure that the jury understands the significance of DNA matches and mismatches in the profiles. The judge must also ensure that the jury does not confuse the match probability (the probability that a person chosen at random has a matching DNA profile to the sample from the scene) with the probability that a person with matching DNA committed the crime. In R v Doheny (1996), Phillips LJ gave this example of a summing up, which should be carefully tailored to the particular facts in each case:
Members of the Jury, if you accept the scientific evidence called by the Crown, this indicates that there are probably only four or five white males in the United Kingdom from whom that semen stain could have come. The Defendant is one of them. If that is the position, the decision you have to reach, on all the evidence, is whether you are sure that it was the Defendant who left that stain or whether it is possible that it was one of that other small group of men who share the same DNA characteristics.
Juries should weigh up conflicting and corroborative evidence, using their own common sense and not by using mathematical formulae, such as Bayes' theorem, so as to avoid "confusion, misunderstanding and misjudgment".

Presentation and evaluation of evidence of partial or incomplete DNA profiles

In R v Bates, Moore-Bick LJ said:
We can see no reason why partial profile DNA evidence should not be admissible provided that the jury are made aware of its inherent limitations and are given a sufficient explanation to enable them to evaluate it. There may be cases where the match probability in relation to all the samples tested is so great that the judge would consider its probative value to be minimal and decide to exclude the evidence in the exercise of his discretion, but this gives rise to no new question of principle and can be left for decision on a case by case basis. However, the fact that there exists in the case of all partial profile evidence the possibility that a "missing" allele might exculpate the accused altogether does not provide sufficient grounds for rejecting such evidence. In many cases there is a possibility (at least in theory) that evidence that would assist the accused and perhaps even exculpate him altogether exists, but that does not provide grounds for excluding relevant evidence that is available and otherwise admissible, though it does make it important to ensure that the jury are given sufficient information to enable them to evaluate that evidence properly.

DNA testing in the United States

CBP chemist reads a DNA profile to determine the origin of a commodity.
 
There are state laws on DNA profiling in all 50 states of the United States. Detailed information on database laws in each state can be found at the National Conference of State Legislatures website.

Development of artificial DNA

In August 2009, scientists in Israel raised serious doubts concerning the use of DNA by law enforcement as the ultimate method of identification. In a paper published in the journal Forensic Science International: Genetics, the Israeli researchers demonstrated that it is possible to manufacture DNA in a laboratory, thus falsifying DNA evidence. The scientists fabricated saliva and blood samples, which originally contained DNA from a person other than the supposed donor of the blood and saliva.

The researchers also showed that, using a DNA database, it is possible to take information from a profile and manufacture DNA to match it, and that this can be done without access to any actual DNA from the person whose DNA they are duplicating. The synthetic DNA oligos required for the procedure are common in molecular laboratories.

The New York Times quoted the lead author, Daniel Frumkin, saying, "You can just engineer a crime scene ... any biology undergraduate could perform this". Frumkin perfected a test that can differentiate real DNA samples from fake ones. His test detects epigenetic modifications, in particular, DNA methylation. Seventy percent of the DNA in any human genome is methylated, meaning it contains methyl group modifications within a CpG dinucleotide context. Methylation at the promoter region is associated with gene silencing. The synthetic DNA lacks this epigenetic modification, which allows the test to distinguish manufactured DNA from genuine DNA.

It is unknown how many police departments, if any, currently use the test. No police lab has publicly announced that it is using the new test to verify DNA results.

Cases

  • In 1986, Richard Buckland was exonerated, despite having admitted to the rape and murder of a teenager near Leicester, the city where DNA profiling was first developed. This was the first use of DNA fingerprinting in a criminal investigation, and the first to prove a suspect's innocence. The following year Colin Pitchfork was identified as the perpetrator of the same murder, in addition to another, using the same techniques that had cleared Buckland.
  • In 1987, genetic fingerprinting was used in criminal court for the first time in the trial of a man accused of unlawful intercourse with a mentally handicapped 14-year-old female who gave birth to a baby.
  • In 1987, Florida rapist Tommie Lee Andrews was the first person in the United States to be convicted as a result of DNA evidence, for raping a woman during a burglary; he was convicted on November 6, 1987, and sentenced to 22 years in prison.
  • In 1988, Timothy Wilson Spencer was the first man in Virginia to be sentenced to death through DNA testing, for several rape and murder charges. He was dubbed "The South Side Strangler" because he killed victims on the south side of Richmond, Virginia. He was later charged with rape and first-degree murder and was sentenced to death. He was executed on April 27, 1994. David Vasquez, initially convicted of one of Spencer's crimes, became the first man in America exonerated based on DNA evidence.
  • In 1989, Chicago man Gary Dotson was the first person whose conviction was overturned using DNA evidence.
  • In 1991, Allan Legere was the first Canadian to be convicted as a result of DNA evidence, for four murders he had committed while an escaped prisoner in 1989. During his trial, his defense argued that the relatively shallow gene pool of the region could lead to false positives.
  • In 1992, DNA evidence was used to prove that Nazi doctor Josef Mengele was buried in Brazil under the name Wolfgang Gerhard.
  • In 1992, DNA from a palo verde tree was used to convict Mark Alan Bogan of murder. DNA from seed pods of a tree at the crime scene was found to match that of seed pods found in Bogan's truck. This is the first instance of plant DNA admitted in a criminal case.
  • In 1993, Kirk Bloodsworth was the first person to have been convicted of murder and sentenced to death, whose conviction was overturned using DNA evidence.
  • The 1993 rape and murder of Mia Zapata, lead singer for the Seattle punk band The Gits, was unsolved nine years after the murder. A database search in 2001 failed, but the killer's DNA was collected when he was arrested in Florida for burglary and domestic abuse in 2002.
  • The science was made famous in the United States in 1994 when prosecutors heavily relied on DNA evidence allegedly linking O. J. Simpson to a double murder. The case also brought to light the laboratory difficulties and handling procedure mishaps that can cause such evidence to be significantly doubted.
  • In 1994, Royal Canadian Mounted Police (RCMP) detectives successfully tested hairs from a cat known as Snowball, and used the test to link a man to the murder of his wife, thus marking for the first time in forensic history the use of non-human animal DNA to identify a criminal (plant DNA was used in 1992, see above).
  • In 1994, the claim that Anna Anderson was Grand Duchess Anastasia Nikolaevna of Russia was tested after her death using samples of her tissue that had been stored at a Charlottesville, Virginia hospital following a medical procedure. The tissue was tested using DNA fingerprinting, and showed that she bore no relation to the Romanovs.
  • In 1994, Earl Washington, Jr., of Virginia had his death sentence commuted to life imprisonment a week before his scheduled execution date based on DNA evidence. He received a full pardon in 2000 based on more advanced testing. His case is often cited by opponents of the death penalty.
  • In 1995, the British Forensic Science Service carried out its first mass intelligence DNA screening in the investigation of the Naomi Smith murder case.
  • In 1998, Richard J. Schmidt was convicted of attempted second-degree murder when it was shown that there was a link between the viral DNA of the human immunodeficiency virus (HIV) he had been accused of injecting in his girlfriend and viral DNA from one of his patients with AIDS. This was the first time viral DNA fingerprinting had been used as evidence in a criminal trial.
  • In 1999, Raymond Easton, a disabled man from Swindon, England, was arrested and detained for seven hours in connection with a burglary. He was released due to an inaccurate DNA match. His DNA had been retained on file after an unrelated domestic incident some time previously.
  • In 2000, Frank Lee Smith was proved innocent by DNA profiling of the murder of an eight-year-old girl after spending 14 years on death row in Florida, USA. However, he had died of cancer just before his innocence was proven. In view of this, the Florida state governor ordered that, in future, any death row inmate claiming innocence should have DNA testing.
  • In May 2000 Gordon Graham murdered Paul Gault at his home in Lisburn, Northern Ireland. Graham was convicted of the murder when his DNA was found on a sports bag left in the house as part of an elaborate ploy to suggest the murder occurred after a burglary had gone wrong. Graham was having an affair with the victim's wife at the time of the murder. It was the first time Low Copy Number DNA was used in Northern Ireland.
  • In 2001, Wayne Butler was convicted for the murder of Celia Douty. It was the first murder in Australia to be solved using DNA profiling.
  • In 2002, the body of James Hanratty, hanged in 1962 for the "A6 murder", was exhumed and DNA samples from the body and members of his family were analysed. The results convinced Court of Appeal judges that Hanratty's guilt, which had been strenuously disputed by campaigners, was proved "beyond doubt". Paul Foot and some other campaigners continued to believe in Hanratty's innocence and argued that the DNA evidence could have been contaminated, noting that the small DNA samples from items of clothing, kept in a police laboratory for over 40 years "in conditions that do not satisfy modern evidential standards", had had to be subjected to very new amplification techniques in order to yield any genetic profile. However, no DNA other than Hanratty's was found on the evidence tested, contrary to what would have been expected had the evidence indeed been contaminated.
  • In 2002, DNA testing was used to exonerate Douglas Echols, a man who was wrongfully convicted in a 1986 rape case. Echols was the 114th person to be exonerated through post-conviction DNA testing.
  • In August 2002, Annalisa Vincenzi was shot dead in Tuscany. Bartender Peter Hamkin, 23, was arrested, in Merseyside in March 2003 on an extradition warrant heard at Bow Street Magistrates' Court in London to establish whether he should be taken to Italy to face a murder charge. DNA "proved" he shot her, but he was cleared on other evidence.
  • In 2003, Welshman Jeffrey Gafoor was convicted of the 1988 murder of Lynette White, when crime scene evidence collected 12 years earlier was re-examined using STR techniques, resulting in a match with his nephew. This may be the first known example of the DNA of an innocent yet related individual being used to identify the actual criminal, via "familial searching".
  • In March 2003, Josiah Sutton was released from prison after serving four years of a twelve-year sentence for a sexual assault charge. Questionable DNA samples taken from Sutton were retested in the wake of the Houston Police Department's crime lab scandal of mishandling DNA evidence.
  • In June 2003, because of new DNA evidence, Dennis Halstead, John Kogut and John Restivo won a re-trial on their murder conviction, their convictions were struck down and they were released. The three men had already served eighteen years of their thirty-plus-year sentences.
  • The trial of Robert Pickton (convicted in December 2003) is notable in that DNA evidence is being used primarily to identify the victims, and in many cases to prove their existence.
  • In 2004, DNA testing shed new light into the mysterious 1912 disappearance of Bobby Dunbar, a four-year-old boy who vanished during a fishing trip. He was allegedly found alive eight months later in the custody of William Cantwell Walters, but another woman claimed that the boy was her son, Bruce Anderson, whom she had entrusted in Walters' custody. The courts disbelieved her claim and convicted Walters for the kidnapping. The boy was raised and known as Bobby Dunbar throughout the rest of his life. However, DNA tests on Dunbar's son and nephew revealed the two were not related, thus establishing that the boy found in 1912 was not Bobby Dunbar, whose real fate remains unknown.
  • In 2005, Gary Leiterman was convicted of the 1969 murder of Jane Mixer, a law student at the University of Michigan, after DNA found on Mixer's pantyhose was matched to Leiterman. DNA in a drop of blood on Mixer's hand was matched to John Ruelas, who was only four years old in 1969 and was never successfully connected to the case in any other way. Leiterman's defense unsuccessfully argued that the unexplained match of the blood spot to Ruelas pointed to cross-contamination and raised doubts about the reliability of the lab's identification of Leiterman.
  • In December 2005, Evan Simmons was proven innocent of a 1981 attack on an Atlanta woman after serving twenty-four years in prison. He is the 164th person in the United States and the fifth in Georgia to be freed using post-conviction DNA testing.
  • In November 2008, Anthony Curcio was arrested for masterminding one of the most elaborately planned armored car heists in history. DNA evidence linked Curcio to the crime.
  • In March 2009, Sean Hodgson—convicted of 1979 killing of Teresa De Simone, 22, in her car in Southampton—was released after tests proved DNA from the scene was not his. It was later matched to DNA retrieved from the exhumed body of David Lace. Lace had previously confessed to the crime but was not believed by the detectives. He served time in prison for other crimes committed at the same time as the murder and then committed suicide in 1988.
  • In 2012, familial DNA profiling led to Alice Collins Plebuch's unexpected discovery that her ancestral bloodline was not purely Irish, as she had previously been led to believe, but that her heritage also contained European Jewish, Middle Eastern and Eastern European. This led her into an extensive genealogy investigation which resulted in her uncovering the genetic family of her adopted father.
  • In 2016 Anthea Ring, abandoned as baby, was able to use a DNA sample and DNA matching database to discover her deceased mother's identity and roots in County Mayo, Ireland. A recently developed forensic test was subsequently used to capture DNA from saliva left on old stamps and envelopes by her suspected father, uncovered through painstaking genealogy research. The DNA in the first three samples was too degraded to use. However, on the fourth, more than enough DNA was found. The test, which has a degree of accuracy acceptable in UK courts, proved that a man named Patrick Coyne was her biological father.
  • In 2018 the Buckskin Girl (a body found in 1981 in Ohio) was identified as Marcia King from Arkansas using DNA genealogical techniques.
  • In 2018 Joseph James DeAngelo was arrested as the main suspect for the Golden State Killer using DNA and genealogy techniques.
  • In 2018 William Earl Talbot, II was arrested as a suspect for the 1987 murder of Jay Cook and Tanya Van Cuylenborg with the assistance of DNA genealogical techniques. The same genetic genealogist who helped in this case also helped police with 18 other arrests in 2018.

DNA evidence used to prove rights of succession to British titles

DNA testing is used to establish the right of succession to British titles.

Child poverty

From Wikipedia, the free encyclopedia

A boy bathes in a polluted river in Jakarta, Indonesia.
 
James Town, Accra.
 
Child poverty refers to the state of children living in poverty. This applies to children from poor families and to orphans being raised with limited, or in some cases absent, state resources. Children who fail to meet the minimum acceptable standard of living of the nation in which they live are said to be poor. In developing countries these standards are lower and, when combined with the larger number of orphans, the effects are more extreme.

Definition

The definition of children in most countries is 'people under the age of eighteen', while biologically the transition from childhood to adulthood is said to occur with the onset of puberty. Culturally defining the end of childhood is more complex, and takes into account factors such as the commencement of work, the end of schooling and marriage, as well as class, gender and race. According to the United Nations Children's Fund (UNICEF) "children living in poverty are those who experience deprivation of the material, spiritual and emotional resources needed to stay alive, develop and thrive, leaving them unable to enjoy their rights, achieve their full potential, and participate as full and equal members of society". The ChildFund International (CFI) definition is based on Deprivation (lack of material conditions and services), Exclusion (denial of rights and safety) and Vulnerability (when society cannot deal with threats to children). Other charitable organisations also use this multi-dimensional approach to child poverty, defining it as a combination of economic, social, cultural, physical, environmental and emotional factors. These definitions suggest that child poverty is multidimensional, relative to children's current and changing living conditions, and that complex interactions of the body, mind and emotions are involved.

Measuring child poverty

A boy washes cutlery in a pool of filthy water in Cambodia
 
The easiest way to quantify child poverty is by setting an absolute or relative monetary threshold. If a family does not earn above that threshold, the children of that family will be considered to live below the poverty line. Absolute poverty thresholds are fixed and generally only updated for price changes, whereas relative poverty thresholds are developed with reference to the actual income of the population and reflect changes in consumption. The absolute poverty threshold is the money needed to purchase a defined quantity of goods and services; in general, every threshold reflects the minimum income required to acquire the necessities of life. A caveat remains, however: a family that earns above a set threshold may still choose not to spend on the needs of its children. Certain organisations, such as the World Bank and the International Monetary Fund, use the absolute poverty threshold of US$1 a day to measure poverty in developing countries. Since the 1960s, the US has used an absolute poverty threshold adjusted for family size and composition to determine those living in poverty.

Europe and many other developed countries use a relative poverty threshold, typically 50% of the country's median income. Relative poverty does not necessarily mean the child is lacking anything, but is more a reflection of inequality in society. Child poverty, when measured using relative thresholds, will only improve if low-income families benefit more from economic advances than well-off families. Measures of child poverty using income thresholds will vary depending on whether relative or absolute poverty is measured and what threshold limits are applied. Using a relative measure, poverty is much higher in the US than in Europe, but if an absolute measure is used, then poverty in some European countries is higher. It is argued that using income as the only threshold ignores the multidimensional aspect of child poverty, which includes consumption requirements, access to resources and the ability to interact in society safely and without discrimination.
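
To make the distinction concrete, the following sketch (in Python, using entirely hypothetical household incomes and a "$1 a day" absolute line scaled to a year) classifies the same households under both kinds of threshold; it illustrates the arithmetic only and is not an official poverty calculation.

```python
# Illustrative sketch: absolute vs. relative poverty thresholds.
# All incomes below are hypothetical annual household incomes (USD);
# they are not real survey data.

from statistics import median

household_incomes = [300, 800, 1500, 2500, 4000, 9000, 20000, 45000]

# Absolute threshold: a fixed basket cost, here the classic "$1 a day"
# line expressed per year. Real agencies adjust for family size,
# composition, and purchasing power.
ABSOLUTE_THRESHOLD = 365 * 1

# Relative threshold: 50% of the median income of the population,
# so it moves whenever the income distribution moves.
relative_threshold = 0.5 * median(household_incomes)

for income in household_incomes:
    poor_abs = income < ABSOLUTE_THRESHOLD
    poor_rel = income < relative_threshold
    print(f"income={income:>6}  absolute-poor={poor_abs}  relative-poor={poor_rel}")
```

Note that multiplying every income by the same factor leaves the relative classification unchanged while it can eliminate absolute poverty, which is why relative child poverty falls only when low-income families gain more than better-off families.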

A 2003 study conducted by researchers from Bristol attempted to provide a scientific basis for measuring severe deprivation based on levels of adequate nutrition, safe drinking water, decent sanitation facilities, health, shelter, education, and information. Measurable values were attributed to each indicator and these were used to establish how many children were living in poverty. The values included: heights and weights more than three standard deviations below the international median, children with access only to rivers and other surface water, no access to toilets, no immunisations, no access to medical advice, living in dwellings with more than five people per room, no school attendance and no access to newspapers or other media. Out of a population of 1.8 billion children in developing nations, 56% fell below at least one of these measurements. In Sub-Saharan Africa and Southern Asia, this number increased to over 80%, with rural children from these areas the worst affected.
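
A minimal sketch of how such indicator-based counting works is given below; the indicator names and sample records are hypothetical stand-ins for the study's survey variables, and the rule applied (a child counts as severely deprived if at least one indicator applies) follows the description above.

```python
# Sketch of indicator-based deprivation counting, loosely modelled on
# the Bristol-style approach described above. The records are
# hypothetical, not data from the study.

DEPRIVATION_INDICATORS = {
    "severe_malnutrition",        # height/weight > 3 SD below the international median
    "surface_water_only",         # access only to rivers or other surface water
    "no_sanitation",
    "no_immunisation_or_medical_advice",
    "severe_overcrowding",        # more than five people per room
    "no_schooling",
    "no_access_to_information",
}

# Each child is represented by the set of indicators that apply to them.
children = [
    {"surface_water_only"},                      # deprived on one indicator
    set(),                                       # deprived on none
    {"no_schooling", "severe_overcrowding"},     # deprived on two
]

def severely_deprived(child_indicators: set) -> bool:
    """A child counts as severely deprived if at least one indicator applies."""
    return bool(child_indicators & DEPRIVATION_INDICATORS)

share = sum(severely_deprived(c) for c in children) / len(children)
print(f"Share below at least one indicator: {share:.0%}")
```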

The Young Lives Project is investigating the changing nature of child poverty by following nearly 12,000 children for 15 years in four countries (Ethiopia, Peru, Vietnam and India), chosen to reflect a wide range of cultural, political, geographical and social contexts. Every three to four years, researchers collect data on the children and their families' health, malnutrition, literacy, access to services and other indicators of poverty. Reports are available for these four countries comparing the initial data obtained in 2002 with data from 2006. Peru, Vietnam and India have shown economic growth and a reduction in poverty over this time, but large inequalities still exist between rural and urban areas, and among ethnic groups. This is particularly obvious in India, a country with the second largest population of billionaires but also home to 25% of the world's poor. Ethiopia, one of the poorest countries in the world, has also shown slight economic growth and a reduction in poverty. Inequalities still exist, with boys more likely to be malnourished than girls and more absolute poverty in rural areas, although relative poverty is higher in urban areas. This data was collected before the 2008 drought and the recent increase in food prices, which have had a severe impact on Ethiopia's ability to feed its population.

Capability Approach and the Child Development Index

Recently, debate among philosophers and theorists on how to define and measure poverty stems from the emergence of the human capability approach, where poverty is defined by the extent of freedoms that a person possesses. Amartya Sen, the creator of the capability approach, argues that there are five fundamental freedoms that should be available to all humans: political freedoms, economic facilities, social opportunities, transparency guarantees, and protective security. He also suggests that they are all interconnected, where each freedom fosters and/or enhances the others.

Additionally, the capability approach claims that development should be considered a process of expanding freedoms or removing the major sources of unfreedom rather than a focus on narrower measurements such as growth of gross national product, per capita income, or industrialization. According to the basic needs approach (which in most aspects is quite like the capability approach), the objective of development should be to provide all humans with the opportunity for a full life, which goes beyond abstractions such as money, income, or employment. Therefore, the definition and measurement of poverty in general must extend beyond measurements like per capita GDP, which tools such as the Human Development Index attempt to accomplish.

In light of this, a UK initiative, Save the Children, has also developed a measurement of child poverty based on measures of capability, called the Child Development Index (CDI). The CDI is an index that combines performance measures specific to children – primary education, child health, and child nutrition – to produce a score on a scale of 0 to 100, with zero being the best score and higher scores indicating worse performance. According to Save the Children, each of the indicators was chosen because it was easily accessible, universally understood, and clearly indicative of child wellbeing. Health measures the under-five mortality rate; nutrition measures the percentage of children under five who are moderately or severely underweight (more than two standard deviations below the median weight for age of the reference population); and education measures the percentage of primary school-age children who are not enrolled in school. In terms of opportunities and capabilities, the CDI is the most appropriate measurement of child poverty.
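
Conceptually, the CDI can be read as a simple aggregation of three deprivation measures, each expressed on a 0-to-100 scale where higher is worse. The sketch below assumes an unweighted average and an illustrative rescaling of under-five mortality; the exact formula and scaling used by Save the Children may differ, so this is an approximation of the idea rather than the published method.

```python
# Minimal sketch of a CDI-style score: the unweighted average of three
# child-deprivation indicators, each expressed on a 0-100 scale where
# higher means worse. The mortality rescaling below is an assumption
# for illustration, not Save the Children's published formula.

def cdi_score(under5_mortality_per_1000: float,
              pct_underweight: float,
              pct_not_in_primary_school: float,
              mortality_ceiling: float = 340.0) -> float:
    """Average the three indicators after putting mortality on a 0-100 scale."""
    mortality_component = min(under5_mortality_per_1000 / mortality_ceiling, 1.0) * 100
    return (mortality_component + pct_underweight + pct_not_in_primary_school) / 3

# Hypothetical illustrative inputs (not real country statistics):
print(round(cdi_score(under5_mortality_per_1000=120,
                      pct_underweight=25,
                      pct_not_in_primary_school=30), 1))
```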

Of the estimated 2.2 billion children worldwide, about a billion, or every second child, live in poverty. Of the 1.9 billion children in developing nations, 640 million are without adequate shelter; 400 million are without access to safe water; 270 million have no access to health services. In 2003, 10.6 million children died before reaching the age of five, which is equivalent to the total child population of France, Germany, Greece, and Italy. 1.4 million die each year from lack of access to safe drinking water and adequate sanitation while 2.2 million die each year due to lack of immunizations.

The Child Development Index also illustrates relative child poverty compared across all regions of the world (see Measuring child poverty).
  • World performance: CDI = 17.5
  • Africa: CDI = 34.5
  • Middle East/North Africa: CDI = 11.2
  • Central/East Europe and Central Asia: CDI = 9.2
  • South Asia: CDI = 26.4
  • East Asia: CDI = 8.5
  • Latin America and Caribbean: CDI = 6.8
  • Developed Countries: CDI = 2.1
The CDI in Africa is twice that of the world average, and South Asia also fares poorly in relation to the global performance. In contrast, the CDI in developed countries is one-ninth of the world CDI, indicating a clear distinction between developing and developed nations.

However, in 2013 child poverty reached record high levels in the United States, with 16.7 million children, more than 20% of American children, living in food-insecure households. 47 million Americans depend on food banks, more than 30% above 2007 levels. Households headed by single mothers are most likely to be affected. Worst affected are the District of Columbia, Oregon, Arizona, New Mexico and Florida, while North Dakota, New Hampshire, Virginia, Minnesota and Massachusetts are the least affected.

Causes

The majority of poverty-stricken children are born to poor parents. Causes such as adult poverty, government policies, lack of education, unemployment, lack of social services, disabilities and discrimination therefore significantly affect the presence of child poverty. Lack of parental economic resources such as disposable income restricts children's opportunities. Economic and demographic factors such as deindustrialization, globalization, residential segregation, labor market segmentation, and the migration of middle-class residents from inner cities constrain economic opportunities and choices across generations, isolating poor inner-city children.

The decline of the nuclear family, illegitimacy, teen pregnancy, and increased numbers of single mothers are also cited as major causes of poverty and welfare dependency for women and their children. Children resulting from unintended pregnancies are more likely to live in poverty; raising a child requires significant resources, so each additional child increases demands on parental resources. Families raised by a single parent are generally poorer than those raised by couples. In the United States, 6 out of 10 long-term poor children have spent time in single-parent families, and in 2007 children living in households headed by single mothers were five times as likely as children living in households headed by married parents to be living in poverty.

Many of the apparent negative associations between growing up poor and children’s attainments reflect unmeasured parental advantages that positively affect both parents’ incomes and children’s attainments, like parental depression.

Effect

Developed countries

Developed countries also have a serious problem with child poverty. If all the 16.7 million poor children in America were gathered in one place, they would form a city bigger than New York. Many published studies have demonstrated strong associations between childhood poverty and the child’s adult outcomes in education, health and socialization, fertility, labor market, and income. Strong evidence suggests that children of low income parents have an increased risk of intellectual and behavioral development problems. Large negative associations between poverty during early childhood and academic outcomes have been consistently found in many studies. Furthermore, children in poverty have a greater risk of displaying behavior and emotional problems, such as impulsiveness and difficulty getting along with peers, and family poverty is associated with higher risk for teen childbearing, less positive peer relations, and lower self-esteem.

In terms of economic disadvantages, adults who experienced persistent childhood poverty are more likely to fall below the poverty line at least once later in life. Poor boys work fewer hours per year, earn lower hourly wages, receive lower annual earnings, and spend more weeks idle in their mid-twenties. Paternal income is also strongly associated with adult economic status. The National Academy of Sciences found that "childhood poverty and chronic stress may lead to problems regulating emotions as an adult".

Also, childhood poverty in the first three years of life is related to substandard nutritional status and poor motor skills; in contrast, poverty is also associated with child obesity – as they get older, poor children are more likely to have chronic health problems, such as asthma and anemia. These impacts probably reflect issues related to poverty including a substandard diet, inferior housing conditions, poor neighborhood environment, reduced access to goods and activities and the psychological stress stemming from these factors.

The relationship between childhood poverty and later negative adult outcomes has been found to be relatively small in other research. In one systematic analysis, family income only "modestly" affected likelihood of teen pregnancy and male unemployment.

Developing countries

Children of road workers near Rishikesh, India.
 
Using a relative measure of child poverty, an impoverished child growing up in a developing country suffers more hardship than most children living in poverty in a developed country. Poverty in these countries is a condition usually characterised by a severe deprivation of basic human needs (UN, 1995). It is estimated that one third of all children in developing countries (~674 million) are living in poverty, with the highest rates in the rural areas of Sub-Saharan Africa and South Asia (over 70%). War, disease, corruption, lack of resources and harsh environmental conditions afflict many of these countries, contributing to their poverty. These factors are a major cause of death, which in turn leads to a higher number of single parents and orphaned children. With the exception of the United States and South Sudan, all UN member states have ratified the 1989 Convention on the Rights of the Child, which aims at reducing violations of a number of rights relevant to reducing child poverty in different countries. A review published by UNICEF in 2009 found declines in under-five mortality, less child malnourishment, increases in breastfeeding, improved water systems and better education access. It also states that, despite these improvements, 24,000 children still die each day from mostly preventable diseases, 150 million 5- to 14-year-olds are involved in child labour and 100 million primary-aged children go without schooling. There are still great inequalities within populations, with girls and children from rural areas more likely to suffer poor health, education and survival than boys and urban populations. Notable state attempts to tackle child poverty in the developing world include Brazil's Bolsa Familia initiative (reaching 12 million households) and South Africa's Child Grant (7 million households). Elsewhere, child-specific social protection policies and programmes are few and the institutions to implement them are often lacking.

Cycle of poverty

The cycle of poverty occurs when a family remains in poverty over many successive generations. For this reason, reducing child poverty has been a focus of almost all governments as a way to break this cycle. Improving the quality of education provided to the poor is seen by most as the best way to do so. Improving the environment the child grows up in, ensuring access to healthcare, and providing financial incentives (either through benefit schemes or reduced taxes) have all been suggested as ways to break the cycle.

Boys and girls have equal rates of poverty through their childhoods, but as women enter their teens and childbearing years the gap in poverty rates between the genders widens. Globally, women are far more impoverished than men, and poor children are more likely to live in female-headed households. Attempts to combat the cycle of poverty have therefore often targeted mothers as a way to interrupt the negative patterns of poverty that affect the education, nutrition/health, and psychological/social outcomes of poor children.

Policy implications

According to the Overseas Development Institute, greater visibility for children's rights issues is needed in donor policies, and attempts should be made to emulate the success achieved using gender markers to develop gender-sensitive development policy. They believe major influential players in the children's rights community – UNICEF, UNFPA and NGOs such as Save the Children, Plan International and World Vision – should do more to highlight the impact of mainstream macro-policy issues on children. The Overseas Development Institute further suggests that an international commission be established to address the impact of the 3-F crisis (food, financial and fuel) on children as a platform for dialogue and new initiatives.

However, determining the appropriate policies for dealing with long-term childhood poverty and intergenerational economic inequality is hotly debated, as are most proposed policy solutions, and depends on the effects that most impact the region. In order to combat the lack of resources available in developed nations, policies must be developed that deliver resources to poor families and raise skill levels of poor children by building on successful welfare-to-work initiatives and maintaining financial work supports, such as Earned Income Tax Credit, refundable child care tax credits and housing vouchers. Combating poverty in developed countries also means improving the schools that exist there. In order to help children in poverty, schools need to invest more money in school meals, libraries, and healthcare. To effectively address economic, demographic and cultural changes, economic and social service strategies to reverse the factors that generated the urban underclass, such as providing jobs and social services policies that deal with the effects of isolation, should be implemented. Finally, in order to reduce the cycle of poverty for children and families, policies should be aimed at expanding economic opportunities especially for disadvantaged girls.

Internet linguistics

From Wikipedia, the free encyclopedia

Internet linguistics is a domain of linguistics advocated by the English linguist David Crystal. It studies new language styles and forms that have arisen under the influence of the Internet and of other new media, such as Short Message Service (SMS) text messaging. Since the beginning of human-computer interaction (HCI) leading to computer-mediated communication (CMC) and Internet-mediated communication (IMC), experts have acknowledged that linguistics has a contributing role in it, in terms of web interface and usability. Studying the emerging language on the Internet can help improve conceptual organization, translation and web usability. Such study aims to benefit both linguists and web users.
 
The study of Internet linguistics can take place through four main perspectives: sociolinguistics, education, stylistics and applied linguistics. Additional dimensions have developed as a result of further technological advances, including the development of the Web as corpus and the spread and influence of the stylistic variations brought forth by the spread of the Internet, through the mass media and through literary works. In view of the increasing number of users connected to the Internet, the linguistic future of the Internet remains to be determined, as new computer-mediated technologies continue to emerge and people adapt their languages to suit these new media. The Internet continues to play a significant role both in encouraging and in diverting attention away from the usage of languages.

Main perspectives

David Crystal has identified four main perspectives for further investigation – the sociolinguistic perspective, the educational perspective, the stylistic perspective and the applied perspective. The four perspectives are effectively interlinked and affect one another.

Sociolinguistic perspective

This perspective deals with how society views the impact of Internet development on languages. The advent of the Internet has revolutionized communication in many ways; it changed the way people communicate and created new platforms with far-reaching social impact. Significant avenues include but are not limited to SMS text messaging, e-mails, chatgroups, virtual worlds and the Web.

The evolution of these new media of communication has raised much concern with regard to the way language is being used. According to Crystal (2005), these concerns are neither without grounds nor unseen in history: they surface almost always when a new technological breakthrough influences languages, as seen in the 15th century when printing was introduced, the 19th century when the telephone was invented, and the 20th century when broadcasting began to penetrate our society.

At a personal level, CMC such as SMS text messaging and mobile e-mailing (push mail) has greatly enhanced instantaneous communication. Some examples include the iPhone and the BlackBerry.

In schools, it is not uncommon for educators and students to be given personalized school e-mail accounts for communication and interaction purposes. Classroom discussions are increasingly being brought onto the Internet in the form of discussion forums. For instance, at Nanyang Technological University, students engage in collaborative learning at the university's portal – edveNTUre – where they participate in discussions on forums and online quizzes and view streaming podcasts prepared by their course instructors, among other activities. In 2008, iTunes U began to collaborate with universities as they converted the Apple music service into a store that makes academic lectures and scholastic materials available for free – they have partnered with more than 600 institutions in 18 countries, including Oxford, Cambridge and Yale.

These forms of academic social networking and media are slated to rise as educators from all over the world continue to seek new ways to better engage students. It is commonplace for students at New York University to interact with "guest speakers weighing in via Skype, library staffs providing support via instant messaging, and students accessing library resources from off campus." This will affect the way language is used as students and teachers begin to use more of these CMC platforms.

At a professional level, it is a common sight for companies to have their computers and laptops connected to the Internet (via wired or wireless connections), and for employees to have individual e-mail accounts. This greatly facilitates internal communication (among staff of the company) and external communication (with other parties outside one's organization). Mobile communications such as smartphones are increasingly making their way into the corporate world. For instance, in 2008 Apple announced its intention to actively step up its efforts to help companies incorporate the iPhone into their enterprise environments, facilitated by technological developments in streamlining integrated features (push e-mail, calendar and contact management) using ActiveSync.

In general, these new CMCs that are made possible by the Internet have altered the way people use language – there is heightened informality and consequently a growing fear of its deterioration. However, as David Crystal puts it, these developments should be seen positively, as they reflect the creative power of a language.

Themes

The sociolinguistics of the Internet may also be examined through five interconnected themes.
  1. Multilingualism – It looks at the prevalence and status of various languages on the Internet.
  2. Language change – From a sociolinguistic perspective, language change is influenced by the physical constraints of technology (e.g. typed text) and the shifting social-economic priorities such as globalization. It explores the linguistic changes over time, with emphasis on Internet lingo.
  3. Conversation discourse – It explores the changes in patterns of social interaction and communicative practice on the Internet.
  4. Stylistic diffusion – It involves the study of the spread of Internet jargon and related linguistic forms into common usage. As language changes, conversation discourse and stylistic diffusion overlap with the aspect of language stylistics.
  5. Metalanguage and folk linguistics – It involves looking at the way these linguistic forms and changes on the Internet are labelled and discussed (e.g. the claim that Internet lingo has resulted in the 'death' of the apostrophe and the loss of capitalization).

Educational perspective

The educational perspective of Internet linguistics examines the Internet's impact on formal language use, specifically on Standard English, which in turn affects language education. The rise and rapid spread of Internet use has brought about new linguistic features specific to the Internet platform. These include, but are not limited to, an increase in the use of informal written language, inconsistency in written styles and stylistics, and the use of new abbreviations in Internet chats and SMS text messaging, where constraints of technology on word count contributed to the rise of new abbreviations. Such acronyms exist primarily for practical reasons – to reduce the time and effort required to communicate through these media and to work around technological limitations. Examples of common acronyms include lol (for laughing out loud; a general expression of laughter), omg (oh my god) and gtg (got to go).

The educational perspective has been considerably established in research on the Internet's impact on language education. It is an important and crucial aspect as it affects and involves the education of current and future generations of students in the appropriate and timely use of the informal language that arises from Internet usage. There are concerns about the growing infiltration of informal language use and incorrect word use into academic or formal situations, such as the usage of casual words like "guy" or the choice of the word "preclude" in place of "precede" in academic papers by students. Educators have also noted spelling and grammar errors occurring at a higher frequency in students' academic work, with the use of abbreviations such as "u" for "you" and "2" for "to" being the most common.

Linguists and professors like Eleanor Johnson suspect that widespread mistakes in writing are strongly connected to Internet usage, and educators have similarly reported new kinds of spelling and grammar mistakes in student work. There is, however, no scientific evidence to confirm the proposed connection. Though there are valid concerns about Internet usage and its impact on students' academic and formal writing, the severity of the problem is amplified by the informal nature of the new media platforms. Naomi S. Baron (2008) argues in Always On that student writing suffers little impact from the use of Internet-mediated communication (IMC) such as Internet chat, SMS text messaging and e-mail. A 2009 study published in the British Journal of Developmental Psychology found that students who regularly texted (sent messages via SMS using a mobile phone) displayed a wider range of vocabulary, which may have a positive impact on their reading development.

Though use of the Internet has produced stylistics that are not deemed appropriate in academic and formal language use, Internet use may not hinder language education but instead aid it. The Internet has shown in different ways that it can provide potential benefits in enhancing language learning, especially second or foreign language learning. Language education through the Internet in relation to Internet linguistics is, most significantly, applied through the communication aspect (use of e-mails, discussion forums, chat messengers, blogs, etc.). IMC allows for greater interaction between language learners and native speakers of the language, providing greater scope for error correction and better opportunities to learn standard language, and in the process allowing learners to pick up specific skills such as negotiation and persuasion.

Stylistic perspective

This perspective examines how the Internet and its related technologies have encouraged new and different forms of creativity in language, especially in literature. It looks at the Internet as a medium through which new language phenomena have arisen. This new mode of language is interesting to study because it is an amalgam of both spoken and written languages. For example, traditional writing is static compared to the dynamic nature of the new language on the Internet where words can appear in different colors and font sizes on the computer screen. Yet, this new mode of language also contains other elements not found in natural languages. One example is the concept of framing found in e-mails and discussion forums. In replying to e-mails, people generally use the sender’s e-mail message as a frame to write their own messages. They can choose to respond to certain parts of an e-mail message while leaving other bits out. In discussion forums, one can start a new thread and anyone regardless of their physical location can respond to the idea or thought that was set down through the Internet. This is something that is usually not found in written language.

Future research also includes new varieties of expressions that the Internet and its various technologies are constantly producing and their effects not only on written languages but also their spoken forms. The communicative style of Internet language is best observed in the CMC channels below, as there are often attempts to overcome technological restraints such as transmission time lags and to re-establish social cues that are often vague in written text.

Mobile phones

Mobile phones (also called cell phones) have an expressive potential beyond their basic communicative functions. This can be seen in text-messaging poetry competitions such as the one held by The Guardian. The 160-character limit imposed by the cell phone has motivated users to exercise their linguistic creativity to overcome it. A similar example of new technology with character constraints is Twitter, which has a 280-character limit. There have been debates as to whether the new abbreviated forms introduced in users' Tweets are "lazy" or whether they are creative fragments of communication. Despite the ongoing debate, there is no doubt that Twitter has contributed to the linguistic landscape with new lingo and has also brought about a new dimension of communication.

The cell phone has also created a new literary genre – cell phone novels. A typical cell phone novel consists of several chapters which readers download in short installments. These novels are in their "raw" form as they do not go through editing processes like traditional novels. They are written in short sentences, similar to text-messaging. Authors of such novels are also able to receive feedback and new ideas from their readers through e-mails or online feedback channels. Unlike traditional novel writing, readers’ ideas sometimes get incorporated into the storyline or authors may also decide to change their story’s plot according to the demand and popularity of their novel (typically gauged by the number of download hits). Despite their popularity, there has also been criticism regarding the novels’ "lack of diverse vocabulary" and poor grammar.

Blogs

Blogging has brought about new ways of writing diaries and from a linguistic perspective, the language used in blogs is "in its most 'naked' form", published for the world to see without undergoing the formal editing process. This is what makes blogs stand out because almost all other forms of printed language have gone through some form of editing and standardization. David Crystal stated that blogs were "the beginning of a new stage in the evolution of the written language". Blogs have become so popular that they have expanded beyond written blogs, with the emergence of photoblog, videoblog, audioblog and moblog. These developments in interactive blogging have created new linguistic conventions and styles, with more expected to arise in the future.

Virtual worlds

Virtual worlds provide insights into how users are adapting the usage of natural language for communication within these new media. The Internet language that has arisen through user interactions in text-based chatrooms and computer-simulated worlds has led to the development of slang within digital communities. Examples include pwn and noob. Emoticons are further examples of how users have adapted different expressions to suit the limitations of cyberspace communication, one of which is the "loss of emotivity".

Communication in niches such as role-playing games (RPGs) in multi-user domains (MUDs) and virtual worlds is highly interactive, with emphasis on speed, brevity and spontaneity. As a result, CMC is generally more vibrant, volatile, unstructured and open. Complex organization of sequences and exchange structures is often evident in the connection of conversational strands and short turns. Some of the CMC strategies used include capitalization for words such as EMPHASIS, the use of symbols such as the asterisk to enclose words, as seen in *stress*, and the creative use of punctuation like ???!?!?!?. Symbols are also used for discourse functions, such as the asterisk as a conversational repair marker and arrows and carets as deixis and referent markers. Besides contributing to these new forms of language, virtual worlds are also being used to teach languages. Virtual world language learning provides students with simulations of real-life environments, allowing them to find creative ways to improve their language skills. Virtual worlds are good tools for language learning among younger learners because they already see such places as a "natural place to learn and play".

E-mail

One of the most popular Internet-related technologies to be studied under this perspective is e-mail, which has expanded the stylistics of languages in many ways. A study done on the linguistic profile of e-mails has shown that there is a hybrid of speech and writing styles in terms of format, grammar and style. E-mail is rapidly replacing traditional letter-writing because of its convenience, speed and spontaneity. It is often related to informality as it feels temporary and can be deleted easily. However, as this medium of communication matures, e-mail is no longer confined to sending informal messages between friends and relatives. Instead, business correspondences are increasingly being carried out through e-mails. Job seekers are also using e-mails to send their resumes to potential employers. The result of a move towards more formal usages will be a medium representing a range of formal and informal stylistics.

While e-mail has been blamed for students' increased usage of informal language in their written work, David Crystal argues that e-mail is "not a threat" to language education, because e-mail, with its array of stylistic expressiveness, can act as a domain for language learners to make their own linguistic choices responsibly. Furthermore, the younger generation's high propensity for using e-mail may improve their writing and communication skills because of the efforts they make to formulate their thoughts and ideas, albeit through a digital medium.

Instant messaging

Like other forms of online communication, instant messaging has also developed its own acronyms and short forms. However, instant messaging is quite different from e-mail and chatgroups because it allows participants to interact with one another in real-time while conversing in private. With instant messaging, there is an added dimension of familiarity among participants. This increased degree of intimacy allows greater informality in language and "typographical idiosyncrasies". There are also greater occurrences of stylistic variation because there can be a very wide age gap between participants. For example, a granddaughter can catch up with her grandmother through instant messaging. Unlike chatgroups where participants come together with shared interests, there is no pressure to conform in language here.

Applied perspective


The applied perspective views the linguistic exploitation of the Internet in terms of its communicative capabilities – the good and the bad. The Internet provides a platform where users can experience multilingualism. Although English is still the dominant language used on the Internet, other languages are gradually increasing in their number of users. The Global Internet usage page provides some information on the number of users of the Internet by language, nationality and geography. This multilingual environment continues to increase in diversity as more language communities become connected to the Internet. The Internet is thus a platform where minority and endangered languages can seek to revive their use and/or create awareness. It provides these languages opportunities for progress in two important regards: language documentation and language revitalization.

Language documentation

Firstly, the Internet facilitates language documentation. Digital archives of media such as audio and video recordings not only help to preserve language documentation, but also allow for global dissemination through the Internet. Publicity about endangered languages, such as Webster (2003), has helped to spur a worldwide interest in linguistic documentation.

Foundations such as the Hans Rausing Endangered Languages Project (HRELP), funded by Arcadia, also help to develop the interest in linguistic documentation. The HRELP is a project that seeks to document endangered languages and to preserve and disseminate documentation materials, among other aims. The materials gathered are made available online under its Endangered Languages Archive (ELAR) program.

Other online materials that support language documentation include the Language Archive Newsletter, which provides news and articles about topics in endangered languages. The web version of Ethnologue also provides brief information on all of the world's known living languages. Making resources and information about endangered languages and language documentation available on the Internet allows researchers to build on these materials and hence helps preserve endangered languages.

Language revitalization

Secondly, the Internet facilitates language revitalization. Over the years, the digital environment has developed in various sophisticated ways that allow for virtual contact. From e-mail and chat to instant messaging, these virtual environments have helped to bridge the spatial distance between communicators. The use of e-mail has been adopted in language courses to encourage students to communicate in various styles such as conference-type formats and also to generate discussions. Similarly, e-mail facilitates language revitalization in the sense that speakers of a minority language who have moved to a location where their native language is not spoken can take advantage of the Internet to communicate with their family and friends, thus maintaining the use of their native language. With the development and increasing use of broadband telephone communication such as Skype, language revitalization through the Internet is no longer restricted to literate users.

Hawaiian educators have been taking advantage of the Internet in their language revitalization programs. The graphical bulletin board system, Leoki (Powerful Voice), was established in 1994. The content, interface and menus of the system are entirely in the Hawaiian language. It is installed throughout the immersion school system and includes components for e-mails, chat, dictionary and online newspaper among others. In higher institutions such as colleges and universities where the Leoki system is not yet installed, the educators make use of other software and Internet tools such as Daedalus Interchange, e-mails and the Web to connect students of Hawaiian language with the broader community.

Another use of the Internet includes having students of minority languages write about their native cultures in their native languages for distant audiences. Also, in an attempt to preserve their language and culture, Occitan speakers have been taking advantage of the Internet to reach out to other Occitan speakers from around the world. These methods provide reasons for using minority languages by communicating in them. In addition, the use of digital technologies, which the younger generation think of as 'cool', will appeal to them and in turn maintain their interest in and usage of their native languages.

Exploitation of the Internet

The Internet can also be exploited for activities such as terrorism, Internet fraud and pedophilia. In recent years, there has been an increase in crimes that involve the use of the Internet, such as e-mail and Internet Relay Chat (IRC), as it is relatively easy to remain anonymous. Such activities raise concerns about security and protection. From a forensic linguistic point of view, there are many potential areas to explore. While developing a chat room child-protection procedure based on search-term filtering is effective, there is still minimal linguistically oriented literature to facilitate the task. In other areas, it is observed that the Semantic Web has been involved in tasks such as personal data protection, which helps to prevent fraud.

Dimensions

The dimensions covered in this section include looking at the Web as a corpus and issues of language identification and normalization. The impacts of internet linguistics on everyday life are examined under the spread and influence of Internet stylistics, trends of language change on the Internet and conversation discourse.

The Web as a corpus

With the Web being a huge reservoir of data and resources, language scientists and technologists are increasingly turning to it for language data. Corpora were first formally mentioned in the field of computational linguistics at the 1989 ACL meeting in Vancouver. The idea was met with much controversy, as corpora were seen to lack theoretical integrity, leading to much skepticism about their role in the field, until the publication of 'Using Large Corpora' in 1993, after which the relationship between computational linguistics and corpora became widely accepted.

To establish whether the Web is a corpus, it is worthwhile to turn to the definition established by McEnery and Wilson (1996, p. 21):
In principle, any collection of more than one text can be called a corpus. . . . But the term “corpus” when used in the context of modern linguistics tends most frequently to have more specific connotations than this simple definition provides for. These may be considered under four main headings: sampling and representativeness, finite size, machine-readable form, a standard reference.
— Tony McEnery and Andrew Wilson, Corpus Linguistics
Relating more closely to the Web as a corpus, Manning and Schütze (1999, p. 120) further refine the definition:
In Statistical NLP [Natural Language Processing], one commonly receives as a corpus a certain amount of data from a certain domain of interest, without having any say in how it is constructed. In such cases, having more training data is normally more useful than any concerns of balance, and one should simply use all the text that is available.
— Christopher Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing
Hit counts for carefully constructed search engine queries have been used to identify rank orders for word sense frequencies, as an input to a word sense disambiguation engine. This method was further explored with the introduction of the concept of parallel corpora, in which existing Web pages that exist in parallel in local and major languages are brought together. It has also been demonstrated that it is possible to build a language-specific corpus from a single document in that specific language.
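
The underlying idea, ranking the senses of an ambiguous word by how often sense-indicative queries are attested on the Web, can be sketched as follows. The `hit_count` argument is a placeholder for whatever search-engine interface a researcher has access to (no particular API is assumed), and the cue words per sense are illustrative rather than drawn from any published sense inventory.

```python
# Sketch of using Web hit counts to rank word senses, in the spirit of
# the approach described above. `hit_count` stands in for a real
# search-engine interface; the sense cues are illustrative examples.

from typing import Callable, Dict, List

def rank_senses_by_hits(target: str,
                        sense_cues: Dict[str, List[str]],
                        hit_count: Callable[[str], int]) -> List[tuple]:
    """Return senses of `target` ordered by total hits for cue-word queries."""
    totals = {}
    for sense, cues in sense_cues.items():
        # Query the target word together with each cue word for that sense,
        # e.g. '"bank" "river"' vs. '"bank" "loan"', and sum the counts.
        totals[sense] = sum(hit_count(f'"{target}" "{cue}"') for cue in cues)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Toy stand-in for a search engine, returning canned counts:
fake_counts = {'"bank" "river"': 120, '"bank" "shore"': 80,
               '"bank" "loan"': 900, '"bank" "account"': 1500}
ranking = rank_senses_by_hits(
    "bank",
    {"river_bank": ["river", "shore"], "financial_bank": ["loan", "account"]},
    hit_count=lambda q: fake_counts.get(q, 0),
)
print(ranking)  # financial sense ranked first under these toy counts
```

In practice, raw hit counts are noisy and vary with search-engine load, which is one of the challenges for linguists discussed later in this section.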

Themes

There has been much discussion about the possible developments in the arena of the Web as a corpus. The development of using the web as a data source for word sense disambiguation was brought forward in The EU MEANING project in 2002. It used the assumption that within a domain, words often have a single meaning, and that domains are identifiable on the Web. This was further explored by using Web technology to gather manual word sense annotations on the Word Expert Web site.

In areas of language modeling, the Web has been used to address data sparseness. Lexical statistics have been gathered for resolving prepositional phrase attachments, while Web documents have been used to seek a balance in the corpus.

In areas of information retrieval, a Web track was integrated as a component in the community's TREC evaluation initiative. The sample of the Web used for this exercise amounted to around 100 GB, comprising largely documents in the .gov top-level domain.

British National Corpus

The British National Corpus contains ample information on the dominant meanings and usage patterns of the 10,000 words that form the core of English.

The number of words in the British National Corpus (ca 100 million) is sufficient for many empirical strategies for learning about language for linguists and lexicographers, and is satisfactory for technologies that utilize quantitative information about the behavior of words as input (parsing).

However, for some other purposes it is insufficient, as a consequence of the Zipfian nature of word frequencies. Because the bulk of the lexical stock occurs fewer than 50 times in the British National Corpus, the corpus is insufficient for statistically stable conclusions about such words. Furthermore, for some rarer words, rare meanings of common words, and combinations of words, no data is found at all. Researchers find that probabilistic models of language based on very large quantities of data are better than ones based on estimates from smaller, cleaner data sets.
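
The Zipfian point can be made concrete with a back-of-the-envelope calculation: if the frequency of the r-th most common word is roughly proportional to 1/r, frequencies fall off so quickly that most of the vocabulary appears only a handful of times even in a 100-million-word corpus. The sketch below uses an idealised Zipf distribution with an assumed vocabulary size, not real BNC counts, to estimate how many word types would clear a 50-occurrence threshold.

```python
# Back-of-the-envelope Zipf illustration (idealised, not real BNC counts).
# Assume frequency(rank r) ~ C / r, with C chosen so that the total number
# of tokens over an assumed 500,000-type vocabulary is about 100 million.

from math import log

VOCAB_SIZE = 500_000          # assumed number of distinct word types
TOTAL_TOKENS = 100_000_000    # roughly the size of the British National Corpus

# Harmonic normalisation: sum of 1/r for r = 1..VOCAB_SIZE is ~ ln(V) + 0.5772
harmonic_sum = log(VOCAB_SIZE) + 0.5772
C = TOTAL_TOKENS / harmonic_sum

def expected_count(rank: int) -> float:
    """Expected number of occurrences of the word at a given frequency rank."""
    return C / rank

# How many ranks get at least 50 occurrences?  C / r >= 50  =>  r <= C / 50
ranks_with_50_plus = int(C / 50)
print(f"Types expected to occur 50+ times: ~{ranks_with_50_plus:,} "
      f"out of {VOCAB_SIZE:,}")
```

Under these assumptions fewer than a third of the word types clear the 50-occurrence mark, which is consistent with the observation above that a 100-million-word corpus supports stable statistics only for its most common words.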

The multilingual Web

The Web is clearly a multilingual corpus. It is estimated that 71% of the pages (453 million out of 634 million Web pages indexed by the Excite engine) were written in English, followed by Japanese (6.8%), German (5.1%), French (1.8%), Chinese (1.5%), Spanish (1.1%), Italian (0.9%), and Swedish (0.7%).

A test to find contiguous words like 'deep breath' revealed 868,631 Web pages containing the terms in AlltheWeb. The numbers found through search engines are more than three times the counts generated by the British National Corpus, indicating the significant size of the English corpus available on the Web.

The massive size of text available on the Web can be seen in the analysis of controlled data in which corpora of different languages were mixed in various proportions. The estimated Web size in words by AltaVista saw English at the top of the list with 76,598,718,000 words, followed by German with 7,035,850,000 words, along with six other languages with over a billion hits each. Even languages with fewer hits on the Web, such as Slovenian, Croatian, Malay, and Turkish, have more than one hundred million words on the Web. This reveals the potential strength and accuracy of using the Web as a corpus given its significant size, which warrants much additional research, such as the project currently being carried out by the British National Corpus to exploit its scale.

Challenges

In the area of language modeling, there are limits on the applicability of any language model, because the statistics for different types of text differ. When a language technology application is put into use (applied to a new text type), it is not certain that the language model will fare as it did on the training corpus; substantial variations in model performance are found when the training corpus changes. This lack of a theory of text types limits the assessment of the usefulness of language-modeling work.
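A minimal sketch of the point above: a simple add-one-smoothed unigram model trained on one text type and evaluated (via cross-entropy) on another will typically score worse out of domain. The corpus file names and the smoothing constant are assumptions for illustration.

```python
import math
import re
from collections import Counter

def unigram_model(path: str, alpha: float = 1.0):
    """Train an add-alpha smoothed unigram model from a plain-text file."""
    tokens = re.findall(r"[a-z']+", open(path, encoding="utf-8").read().lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves mass for unseen words
    def prob(word: str) -> float:
        return (counts[word] + alpha) / (total + alpha * vocab)
    return prob

def cross_entropy(prob, path: str) -> float:
    """Average negative log2 probability per token on a test file."""
    tokens = re.findall(r"[a-z']+", open(path, encoding="utf-8").read().lower())
    return -sum(math.log2(prob(t)) for t in tokens) / len(tokens)

# model = unigram_model("news_training.txt")
# print(cross_entropy(model, "news_heldout.txt"))    # in-domain
# print(cross_entropy(model, "web_forum_text.txt"))  # out-of-domain, usually higher
```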

Because Web texts are produced cheaply and quickly by many different authors, there is often little concern for accuracy. Grammatical and typographical errors are regarded as "erroneous" forms that make the Web a dirty corpus. Nonetheless, it may still be useful even with some noise.

The issue of whether sublanguages should be included remains unsettled. Proponents argue that removing all sublanguages results in an impoverished view of language; since language is made up of lexicons, grammar, and a wide array of different sublanguages, they should be included. However, only recently has including them become a viable option. Striking a middle ground by including some sublanguages is contentious, because the choice of which to include and which to exclude is arbitrary.

The decision of what to include in a corpus lies with corpus developers, and it has been made pragmatically. The desiderata and criteria used for the British National Corpus serve as a good model for a general-purpose, general-language corpus, with the focus on being representative replaced by a focus on being balanced.

Search engines such as Google serve as a default means of access to the Web and its wide array of linguistic resources. However, for linguists working with corpora, they present a number of challenges: the limited number of instances presented (a maximum of 1,000 or 5,000); insufficient context for each instance (Google provides a fragment of around ten words); results selected according to criteria that are distorted from a linguistic point of view, since search terms in titles and headings often occupy the top result slots; the inability to specify searches according to linguistic criteria, such as the citation form of a word or its word class; and the unreliability of statistics, with results varying according to search engine load and many other factors. At present, in view of the conflicting priorities among the different stakeholders, the best solution is for linguists to attempt to correct these problems themselves. This would open up a large number of possibilities for harnessing the rich potential of the Web.
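One common workaround for the snippet problem, sketched here under simple assumptions: download the pages behind the search results and build a keyword-in-context (KWIC) concordance locally, with as much context as needed. The tokenizer and window size are illustrative choices, not a prescribed method.

```python
import re

def kwic(text: str, keyword: str, window: int = 8) -> list[str]:
    """Return concordance lines with `window` tokens of context on each side."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# for line in kwic(page_text, "breath"):
#     print(line)
```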

Representation

Despite the sheer size of the Web, it may still not be representative of all the languages and domains in the world, and neither are other corpora. However, the huge quantities of text, in numerous languages and language types, on a huge range of topics, make it a good starting point that opens up a large number of possibilities in the study of corpora.

Impact of its spread and influence

Stylistics arising from Internet usage has spread beyond the new media into other areas and platforms, including but not limited to films, music, and literary works. The infiltration of Internet stylistics matters because mass audiences are exposed to these works, reinforcing certain Internet-specific language styles that may not be acceptable in standard or more formal forms of language.

Apart from Internet slang, grammatical and typographical errors are features of writing on the Internet and other CMC channels. As users of the Internet get accustomed to these errors, they progressively infiltrate everyday language use, in both written and spoken forms. It is also common to see such errors in mass media works, from typographical errors in news articles to grammatical errors in advertisements and even Internet slang in drama dialogue.

The more the Internet is incorporated into daily life, the greater the impact it has on formal language. This is especially true in modern Language Arts classes through the use of smartphones, tablets, and social media. Students are exposed to the language of the Internet more than ever, and as such, the grammatical structure and slang of the Internet are bleeding into their formal writing. Full immersion in a language is always the best way to learn it. Mark Lester, in his book Teaching Grammar and Usage, states, "The biggest single problem that basic writers have in developing successful strategies for coping with errors is simply their lack of exposure to formal written English... We would think it absurd to expect a student to master a foreign language without extensive exposure to it." Since students are immersed in Internet language, that is the form and structure they are mirroring.

Mass media

There have been instances of television advertisements using Internet slang, reinforcing the penetration of Internet stylistics in everyday language use. For example, a Cingular commercial in the United States used acronyms such as "BFF Jill" (meaning "Best Friend Forever, Jill"). More businesses have adopted Internet slang in their advertisements, as more people grow up using the Internet and other CMC platforms, in an attempt to relate and connect to them better. Such commercials have received relatively enthusiastic feedback from their audiences.

The use of Internet lingo has also spread into the arena of music, most visibly in popular music. A recent example is Trey Songz's lyrics for "LOL :-)", which incorporate a good deal of Internet lingo along with mentions of Twitter and texting.

The spread of Internet linguistics is also present in films made by both commercial and independent filmmakers. Though independent films are primarily screened at film festivals, their DVDs are often available for purchase over the Internet, including paid live streamings, making access to films easier for the public. The very nature of commercial films being screened at public cinemas allows wide exposure to the mainstream mass audience, resulting in a faster and wider spread of Internet slang. A recent commercial film is titled "LOL" (an acronym for Laugh Out Loud or Laughing Out Loud), starring Miley Cyrus and Demi Moore. The movie is a 2011 remake of Lisa Azuelos's popular 2008 French film of the same name, "LOL (Laughing Out Loud)".

The use of Internet slang is not limited to English but extends to other languages as well. The Korean language has incorporated the English alphabet in forming its slang, while other terms were formed from common misspellings arising from fast typing. The new Korean slang is further reinforced and brought into everyday language use by television shows such as soap operas or comedy dramas like "High Kick Through the Roof", released in 2009.

Linguistic future of the Internet

With the emergence of more computer- and Internet-mediated communication systems, coupled with the readiness with which people adapt to the demands of a more technologically sophisticated world, users are expected to remain under pressure to alter their language use to suit the new dimensions of communication.

As the number of Internet users increases rapidly around the world, the cultural backgrounds, linguistic habits, and language differences among users are brought onto the Web at a much faster pace. These individual differences among Internet users are predicted to significantly shape the future of Internet linguistics, notably with regard to the multilingual Web. From 2000 to 2010, Internet penetration experienced its greatest growth in non-English-speaking countries such as China and India and in countries in Africa, bringing more languages besides English onto the Web.

The interaction between English and other languages is also predicted to be an important area of study. As global users interact with each other, references to different languages may continue to increase, resulting in the formation of new Internet stylistics that span languages. Chinese and Korean have already experienced infiltration by English, leading to the formation of multilingual Internet lingo.

At present, the Internet provides a form of education and promotion for minority languages. However, just as cross-language interaction has led English to infiltrate Chinese and Korean and form new slang, minority languages are also affected by the more common languages used on the Internet (such as English and Spanish). While language interaction can erode the authentic standard of minority languages, familiarity with the majority language can also affect minority languages in adverse ways. For example, users attempting to learn a minority language may opt to read about it in a majority language and stop there, resulting in a loss rather than a gain in potential speakers of the minority language. Also, speakers of minority languages may be encouraged to learn the more common languages used on the Web in order to gain access to more resources, leading in turn to a decline in the use of their own language. The future of endangered minority languages in view of the spread of the Internet remains to be observed.
