Transcranial magnetic stimulation (TMS) is a noninvasive form of brain stimulation in which a changing magnetic field is used to induce an electric current at a specific area of the brain through electromagnetic induction. An electric pulse generator, or stimulator, is connected to a magnetic coil placed against the scalp. The stimulator generates a changing electric
current within the coil which creates a varying magnetic field, inducing
a current within a region in the brain itself.
TMS has shown diagnostic and therapeutic potential in the central nervous system for a wide variety of disease states in neurology and mental health, with research still evolving.
Adverse effects of TMS appear rare and include fainting and seizure. Other potential issues include discomfort, pain, hypomania, cognitive change, hearing loss, and inadvertent current induction in implanted devices such as pacemakers or defibrillators.
Medical uses
TMS does not require surgery or electrode implantation.
Its use can be diagnostic and/or therapeutic. Effects vary based
on frequency and intensity of the magnetic pulses as well as the length
of treatment, which dictates the total number of pulses given.
TMS treatments are approved by the FDA in the US and by NICE in the UK
for the treatment of depression and are predominantly provided by
private clinics. TMS stimulates cortical tissue without the pain
sensations produced in transcranial electrical stimulation.
Although
TMS is generally regarded as safe, risks are increased for therapeutic
rTMS compared to single or paired diagnostic TMS. Adverse effects generally increase with higher frequency stimulation.
The greatest immediate risk from TMS is fainting, though this is uncommon. Seizures have been reported, but are rare. Other adverse effects include short term discomfort, pain, brief episodes of hypomania, cognitive change, hearing loss, impaired working memory, and the induction of electrical currents in implanted devices such as cardiac pacemakers.
Procedure
During the procedure, a magnetic coil is positioned at the head of the person receiving the treatment using anatomical landmarks on the skull, in particular the inion and nasion. The coil is then connected to a pulse generator, or stimulator, that delivers electric current to the coil.
TMS uses electromagnetic induction to generate an electric current across the scalp and skull. A plastic-enclosed coil of wire is held next to the skull and, when activated, produces a varying magnetic field oriented orthogonally
to the plane of the coil. The changing magnetic field then induces an
electric current in the brain that activates nearby nerve cells in a
manner similar to a current applied superficially at the cortical
surface.
The magnetic field is about the same strength as that used in magnetic resonance imaging
(MRI), and the pulse generally reaches no more than 5 centimeters into
the brain unless using a modified coil and technique for deeper
stimulation.
Transcranial magnetic stimulation is achieved by quickly discharging current from a large capacitor into a coil to produce pulsed magnetic fields between 2 and 3 teslas in strength. Directing the magnetic field pulse at a targeted area in the brain causes a localized electrical current which can then either depolarize or hyperpolarize
neurons at that site.
The induced electric field inside the brain tissue causes a change in
transmembrane potentials resulting in depolarization or
hyperpolarization of neurons, causing them to be more or less excitable,
respectively.
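The relationship between the coil current and the induced tissue current can be sketched with Faraday's law of induction. This is an idealized, order-of-magnitude illustration only; the pulse rise time used below is assumed for the example rather than taken from any particular device.

$$\nabla \times \mathbf{E} \;=\; -\,\frac{\partial \mathbf{B}}{\partial t}$$

The induced electric field is driven by how quickly the coil's field changes, not by its static strength. For a pulse that rises to roughly $2\ \mathrm{T}$ in about $100\ \mu\mathrm{s}$, the rate of change is on the order of $\partial B / \partial t \approx 2\ \mathrm{T} / 10^{-4}\ \mathrm{s} = 2 \times 10^{4}\ \mathrm{T/s}$; once the field stops changing, no further current is induced, which is why TMS relies on brief pulses rather than a steady field.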
TMS usually stimulates to a depth from 2 to 4 cm below the
surface, depending on the coil and intensity used. Consequently, only
superficial brain areas can be affected. Deep TMS can reach up to 6 cm into the brain to stimulate deeper layers of the motor cortex,
such as that which controls leg motion. The path of this current can be
difficult to model because the brain is irregularly shaped with
variable internal density and water content, leading to a nonuniform
magnetic field strength and conduction throughout its tissues.
Frequency and duration
The effects of TMS can be divided based on frequency, duration and intensity (amplitude) of stimulation:
Single or paired pulse TMS causes neurons in the neocortex under the site of stimulation to depolarize and discharge an action potential. If used in the primary motor cortex, it produces muscle activity referred to as a motor evoked potential (MEP) which can be recorded on electromyography. If used on the occipital cortex, 'phosphenes'
(flashes of light) might be perceived by the subject. In most other
areas of the cortex, there is no conscious effect, but behaviour may be
altered (e.g., slower reaction time on a cognitive task), or changes in
brain activity may be detected using diagnostic equipment.
Repetitive TMS produces longer-lasting effects which persist past
the period of stimulation. rTMS can increase or decrease the
excitability of the corticospinal tract
depending on the intensity of stimulation, coil orientation, and
frequency. Low frequency rTMS with a stimulus frequency less than 1 Hz
is believed to inhibit cortical firing while a stimulus frequency
greater than 1 Hz, or high frequency, is believed to provoke it. Though the mechanism is not clear, it has been suggested that the effect is due to a change in synaptic efficacy related to long-term potentiation (LTP) and long-term depression-like plasticity (LTD-like plasticity).
Coil types
Most
devices use a coil shaped like a figure-eight to deliver a shallow
magnetic field that affects more superficial neurons in the brain.
Differences in magnetic coil design are considered when comparing
results, with important elements including the type of material,
geometry and specific characteristics of the associated magnetic pulse.
The core material may be either a magnetically inert substrate ('air core'), or a solid, ferromagnetically
active material ('solid core'). Solid cores result in more efficient
transfer of electrical energy to a magnetic field and reduce energy loss
to heat, and so can be operated at the higher pulse volumes required by therapy
protocols without interruption due to overheating. Varying the geometric shape of the coil itself can cause variations in focality,
shape, and depth of penetration. Differences in coil material and its
power supply also affect magnetic pulse width and duration.
A number of different types of coils exist, each of which produces
different magnetic fields. The round coil is the original used in TMS.
Later, the figure-eight (butterfly) coil was developed to provide a more
focal pattern of activation in the brain, and the four-leaf coil for
focal stimulation of peripheral nerves. The double-cone coil conforms
more to the shape of the head.
The Hesed (H-core), circular crown and double cone coils allow more
widespread activation and a deeper magnetic penetration. They are
supposed to impact deeper areas in the motor cortex and cerebellum controlling the legs and pelvic floor, for example, though the increased depth comes at the cost of a less focused magnetic pulse.
History
Luigi Galvani
(1737–1798) undertook research on the effects of electricity on the
body in the late-eighteenth century and laid the foundations for the
field of electrophysiology. In the 1830s Michael Faraday (1791–1867) discovered that an electrical current had a corresponding magnetic field, and that changing one could induce its counterpart.
Work to directly stimulate the human brain with electricity started in the late 1800s, and by the 1930s the Italian physicians Cerletti and Bini had developed electroconvulsive therapy (ECT).[37] ECT became widely used to treat mental illness, and ultimately overused, as it began to be seen as a panacea. This led to a backlash in the 1970s.
In 1980 Merton and Morton successfully used transcranial
electrical stimulation (TES) to stimulate the motor cortex. However,
this process was very uncomfortable, and subsequently Anthony T. Barker
began to search for an alternative to TES.
He began exploring the use of magnetic fields to alter electrical
signaling within the brain, and the first stable TMS devices were
developed in 1985. They were originally intended as diagnostic and research devices, with
evaluation of their therapeutic potential being a later development. The United States' FDA first approved TMS devices in October 2008.
With Parkinson's disease, early results suggest that low frequency stimulation may have an effect on medication associated dyskinesia, and that high frequency stimulation improves motor function. The most effective treatment protocols appear to involve high frequency stimulation of the motor cortex, particularly on the dominant side, but with more variable results for treatment of the dorsolateral prefrontal cortex. It is less effective than electroconvulsive therapy for motor symptoms, though both appear to have utility. Cerebellar stimulation has also shown potential for the treatment of levodopa associated dyskinesia.
TMS can also be used to map functional connectivity between the cerebellum and other areas of the brain.
A study on alternative Alzheimer's treatments at the Wahrendorff Clinic in Germany in 2021 reported that 84% of participants had experienced positive effects after receiving the treatment.
Under the supervision of Professor Marc Ziegenbein, a psychiatry
and psychotherapy specialist, 77 subjects with mild to moderate
Alzheimer's disease received repeated transcranial magnetic
stimulation and were observed over a period of time.
Improvements were mainly found in the areas of orientation in the
environment, concentration, general well-being and satisfaction.
Study blinding
Mimicking the physical discomfort of TMS with placebo to discern its true effect is a challenging issue in research. It is difficult to establish a convincing placebo for TMS during controlled trials in conscious individuals due to the neck pain, headache and twitching in the scalp or upper face associated with the intervention. In addition, placebo manipulations can affect brain glucose metabolism and MEPs, which may confound results. This problem is exacerbated when using subjective measures of improvement. Placebo responses in trials of rTMS in major depression are negatively associated with refractoriness to treatment.
A 2011 review found that most studies did not report unblinding.
In the minority that did, participants in real and sham rTMS groups
were not significantly different in their ability to correctly guess
their therapy, though there was a trend for participants in the real
group to more often guess correctly.
Nexstim obtained United States Federal Food, Drug, and Cosmetic Act Section 510(k) clearance for the assessment of the primary motor cortex for pre-procedural planning in December 2009 and for neurosurgical planning in June 2011.
Depression
The National Institutes of Health estimates depression medications work for 60 percent to 70 percent of people who take them.
In addition, the World Health Organization reports that the number of
people living with depression has increased nearly 20 percent since
2005. TMS is approved as a Class II medical device under the "de novo pathway".
In a 2012 study, TMS was found to improve depression significantly in
58 percent of patients and provide complete remission of symptoms in 37
percent of patients.
In 2002, the Cochrane Library reviewed randomized controlled trials using
TMS to treat depression. The review did not find a difference between
rTMS and sham TMS, except for a period 2 weeks after treatment. In 2018, the Cochrane Library stated a plan to contact authors about updating the review of rTMS for depression.
Obsessive–compulsive disorder (OCD)
In August 2018, the US Food and Drug Administration (US FDA) authorized the use of TMS developed by the Israeli company Brainsway in the treatment of obsessive–compulsive disorder (OCD).
In 2020, US FDA authorized the use of TMS developed by the U.S. company MagVenture Inc. in the treatment of OCD.
In 2023, US FDA authorized the use of TMS developed by the U.S. company Neuronetics Inc. in the treatment of OCD.
The United Kingdom's National Institute for Health and Care Excellence (NICE) issues guidance to the National Health Service
(NHS) in England, Wales, Scotland and Northern Ireland (UK). NICE
guidance does not cover whether or not the NHS should fund a procedure.
Local NHS bodies (primary care trusts and hospital trusts)
make decisions about funding after considering the clinical
effectiveness of the procedure and whether the procedure represents
value for money for the NHS.
NICE evaluated TMS for severe depression (IPG 242) in 2007, and
subsequently considered TMS for reassessment in January 2011 but did not
change its evaluation. The Institute found that TMS is safe, but there is insufficient evidence for its efficacy.
In January 2014, NICE reported the results of an evaluation of
TMS for treating and preventing migraine (IPG 477). NICE found that
short-term TMS is safe but there is insufficient evidence to evaluate
safety for long-term and frequent uses. It found that evidence on the
efficacy of TMS for the treatment of migraine is limited in quantity,
and that evidence for the prevention of migraine is limited in both quality
and quantity.
Subsequently, in 2015, NICE approved the use of TMS for the treatment of depression in the UK and IPG542 replaced IPG242.
NICE said "The evidence on repetitive transcranial magnetic stimulation
for depression shows no major safety concerns. The evidence on its
efficacy in the short-term is adequate, although the clinical response
is variable. Repetitive transcranial magnetic stimulation for depression
may be used with normal arrangements for clinical governance and
audit."
United States: commercial health insurance
In 2013, several commercial health insurance plans in the United States, including Anthem, Health Net, and Blue Cross Blue Shield of Nebraska and of Rhode Island, covered TMS for the treatment of depression for the first time. In contrast, UnitedHealthcare
issued a medical policy for TMS in 2013 that stated there is
insufficient evidence that the procedure is beneficial for health
outcomes in patients with depression. UnitedHealthcare noted that
methodological concerns raised about the scientific evidence studying
TMS for depression include small sample size, lack of a validated sham
comparison in randomized controlled studies, and variable uses of
outcome measures.
Other commercial insurance plans whose 2013 medical coverage policies
stated that the role of TMS in the treatment of depression and other
disorders had not been clearly established or remained investigational
included Aetna, Cigna and Regence.
United States: Medicare
Policies for Medicare coverage vary among local jurisdictions within the Medicare system, and Medicare coverage for TMS has varied among jurisdictions and with time. For example:
In early 2012 in New England, Medicare covered TMS for the first time in the United States. However, that jurisdiction later decided to end coverage after October, 2013.
In August 2012, the jurisdiction covering Arkansas, Louisiana,
Mississippi, Colorado, Texas, Oklahoma, and New Mexico determined that
there was insufficient evidence to cover the treatment,
but the same jurisdiction subsequently determined that Medicare would
cover TMS for the treatment of depression after December 2013.
Subsequently, some other Medicare jurisdictions added Medicare coverage for depression.
The Turing test, originally called the imitation game by Alan Turing in 1950, is a test of a machine's ability to exhibit intelligent behaviour
equivalent to, or indistinguishable from, that of a human. Turing
proposed that a human evaluator would judge natural language
conversations between a human and a machine designed to generate
human-like responses. The evaluator would be aware that one of the two
partners in conversation was a machine, and all participants would be
separated from one another. The conversation would be limited to a
text-only channel, such as a computer keyboard and screen, so the result
would not depend on the machine's ability to render words as speech.
If the evaluator could not reliably tell the machine from the human,
the machine would be said to have passed the test. The test results
would not depend on the machine's ability to give correct answers to questions,
only on how closely its answers resembled those a human would give.
Since the Turing test is a test of indistinguishability in performance
capacity, the verbal version generalizes naturally to all of human
performance capacity, verbal as well as nonverbal (robotic).
The test was introduced by Turing in his 1950 paper "Computing Machinery and Intelligence" while working at the University of Manchester. It opens with the words: "I propose to consider the question, 'Can machines think?'"
Because "thinking" is difficult to define, Turing chooses to "replace
the question by another, which is closely related to it and is expressed
in relatively unambiguous words."
Turing describes the new form of the problem in terms of a three-person
game called the "imitation game", in which an interrogator asks
questions of a man and a woman in another room in order to determine the
correct sex of the two players. Turing's new question is: "Are there
imaginable digital computers which would do well in the imitation game?"
This question, Turing believed, was one that could actually be
answered. In the remainder of the paper, he argued against all the major
objections to the proposition that "machines can think".
Since Turing introduced his test, it has been both highly
influential and widely criticized, and has become an important concept
in the philosophy of artificial intelligence. Philosopher John Searle would comment on the Turing test in his Chinese room argument, a thought experiment that stipulates that a machine cannot have a "mind", "understanding", or "consciousness",
regardless of how intelligently or human-like the program may make the
computer behave. Searle criticizes Turing's test and claims it is
insufficient to detect the presence of consciousness. Searle goes on to
dispute the notion that the mind (mental cognition) can exist outside of the body, a belief known as Cartesian dualism.
History
Philosophical background
The question of whether it is possible for machines to think has a long
history, which is firmly entrenched in the distinction between dualist and materialist views of the mind. René Descartes prefigures aspects of the Turing test in his 1637 Discourse on the Method when he writes:
[H]ow
many different automata or moving machines could be made by the
industry of man ... For we can easily understand a machine's being
constituted so that it can utter words, and even emit some responses to
action on it of a corporeal kind, which brings about a change in its
organs; for instance, if touched in a particular part it may ask what we
wish to say to it; if in another part it may exclaim that it is being
hurt, and so on. But it never happens that it arranges its speech in
various ways, in order to reply appropriately to everything that may be
said in its presence, as even the lowest type of man can do.
Here Descartes notes that automata
are capable of responding to human interactions but argues that such
automata cannot respond appropriately to things said in their presence
in the way that any human can. Descartes therefore prefigures the Turing
test by defining the insufficiency of appropriate linguistic response
as that which separates the human from the automaton. Descartes fails to
consider the possibility that future automata might be able to overcome
such insufficiency, and so does not propose the Turing test as such,
even if he prefigures its conceptual framework and criterion.
Denis Diderot formulates in his 1746 book Pensées philosophiques
a Turing-test criterion, though with the important implicit limiting
assumption maintained, of the participants being natural living beings,
rather than considering created artifacts:
If they find a parrot who could answer to everything, I would claim it to be an intelligent being without hesitation.
This does not mean he agrees with this, but that it was already a common argument of materialists at that time.
According to dualism, the mind is non-physical (or, at the very least, has non-physical properties)
and, therefore, cannot be explained in purely physical terms. According
to materialism, the mind can be explained physically, which leaves open
the possibility of minds that are produced artificially.
In 1936, philosopher Alfred Ayer considered the standard philosophical question of other minds: how do we know that other people have the same conscious experiences that we do? In his book, Language, Truth and Logic,
Ayer suggested a protocol to distinguish between a conscious man and an
unconscious machine: "The only ground I can have for asserting that an
object which appears to be conscious is not really a conscious being,
but only a dummy or a machine, is that it fails to satisfy one of the
empirical tests by which the presence or absence of consciousness is
determined."
(This suggestion is very similar to the Turing test, but it is not
certain that Ayer's popular philosophical classic was familiar to
Turing.) In other words, a thing is not conscious if it fails the
consciousness test.
Cultural background
Tests
where a human judges whether a computer or an alien is intelligent were
an established convention in science fiction by the 1940s, and it is
likely that Turing would have been aware of these. Stanley G. Weinbaum's "A Martian Odyssey" (1934) provides an example of how nuanced such tests could be.
Earlier examples of machines or automatons attempting to pass as human include the Ancient Greek myth of Pygmalion who creates a sculpture of a woman that is animated by Aphrodite, Carlo Collodi's novel The Adventures of Pinocchio, about a puppet who wants to become a real boy, and E. T. A. Hoffmann's 1816 story "The Sandman",
where the protagonist falls in love with an automaton. In all these
examples, people are fooled by artificial beings that - up to a point -
pass as human.
Alan Turing and the Imitation Game
Researchers
in the United Kingdom had been exploring "machine intelligence" for up
to ten years prior to the founding of the field of artificial
intelligence (AI) research in 1956. It was a common topic among the members of the Ratio Club, an informal group of British cybernetics and electronics researchers that included Alan Turing.
Turing, in particular, had been pursuing the notion of machine intelligence since at least 1941 and one of the earliest-known mentions of "computer intelligence" was made by him in 1947. In Turing's report, "Intelligent Machinery", he investigated "the question of whether or not it is possible for machinery to show intelligent behaviour" and, as part of that investigation, proposed what may be considered the forerunner to his later tests:
It is not difficult to devise a paper machine which will play a not very bad game of chess.
Now get three men A, B and C as subjects for the experiment. A and C
are to be rather poor chess players, B is the operator who works the
paper machine. ... Two rooms are used with some arrangement for
communicating moves, and a game is played between C and either A or the
paper machine. C may find it quite difficult to tell which he is
playing.
"Computing Machinery and Intelligence" (1950)
was the first published paper by Turing to focus exclusively on machine
intelligence. Turing begins the 1950 paper with the claim, "I propose
to consider the question 'Can machines think?'" As he highlights, the traditional approach to such a question is to start with definitions,
defining both the terms "machine" and "think". Turing chooses not to do
so; instead he replaces the question with a new one, "which is closely
related to it and is expressed in relatively unambiguous words."
In essence he proposes to change the question from "Can machines
think?" to "Can machines do what we (as thinking entities) can do?"
The advantage of the new question, Turing argues, is that it draws "a
fairly sharp line between the physical and intellectual capacities of a
man."
To demonstrate this approach Turing proposes a test inspired by a party game,
known as the "imitation game", in which a man and a woman go into
separate rooms and guests try to tell them apart by writing a series of
questions and reading the typewritten answers sent back. In this game,
both the man and the woman aim to convince the guests that they are the
other. (Huma Shah argues that this two-human version of the game was
presented by Turing only to introduce the reader to the machine-human
question-answer test.) Turing described his new version of the game as follows:
We now ask the question, "What will happen when a machine
takes the part of A in this game?" Will the interrogator decide wrongly
as often when the game is played like this as he does when the game is
played between a man and a woman? These questions replace our original,
"Can machines think?"
Later in the paper, Turing suggests an "equivalent" alternative
formulation involving a judge conversing only with a computer and a man.
While neither of these formulations precisely matches the version of
the Turing test that is more generally known today, he proposed a third
in 1952. In this version, which Turing discussed in a BBC
radio broadcast, a jury asks questions of a computer and the role of
the computer is to make a significant proportion of the jury believe
that it is really a man.
Turing's paper considered nine putative objections, which include some of the major arguments against artificial intelligence that have been raised in the years since the paper was published (see "Computing Machinery and Intelligence").
ELIZA and PARRY
In 1966, Joseph Weizenbaum created a program which appeared to pass the Turing test. The program, known as ELIZA,
worked by examining a user's typed comments for keywords. If a keyword
is found, a rule that transforms the user's comments is applied, and the
resulting sentence is returned. If a keyword is not found, ELIZA
responds either with a generic riposte or by repeating one of the
earlier comments. In addition, Weizenbaum developed ELIZA to replicate the behaviour of a Rogerian psychotherapist, allowing ELIZA to be "free to assume the pose of knowing almost nothing of the real world."
With these techniques, Weizenbaum's program was able to fool some
people into believing that they were talking to a real person, with some
subjects being "very hard to convince that ELIZA [...] is not human." Thus, ELIZA is claimed by some to be one of the programs (perhaps the first) able to pass the Turing test, even though this view is highly contentious (see Naïveté of interrogators below).
Kenneth Colby created PARRY in 1972, a program described as "ELIZA with attitude". It attempted to model the behaviour of a paranoid schizophrenic,
using a similar (if more advanced) approach to that employed by
Weizenbaum. To validate the work, PARRY was tested in the early 1970s
using a variation of the Turing test. A group of experienced
psychiatrists analysed a combination of real patients and computers
running PARRY through teleprinters.
Another group of 33 psychiatrists were shown transcripts of the
conversations. The two groups were then asked to identify which of the
"patients" were human and which were computer programs.
The psychiatrists were able to make the correct identification only 52
percent of the time – a figure consistent with random guessing.
In the 21st century, versions of these programs (now known as "chatbots") continue to fool people. "CyberLover", a malware
program, preys on Internet users by convincing them to "reveal
information about their identities or to lead them to visit a web site
that will deliver malicious content to their computers".
The program has emerged as a "Valentine-risk" flirting with people
"seeking relationships online in order to collect their personal data".
John Searle's 1980 paper Minds, Brains, and Programs proposed the "Chinese room"
thought experiment and argued that the Turing test could not be used to
determine if a machine could think. Searle noted that software (such as
ELIZA) could pass the Turing test simply by manipulating symbols of
which it had no understanding. Without understanding, it could not
be described as "thinking" in the same sense people did. Therefore,
Searle concluded, the Turing test could not prove that machines could
think. Much like the Turing test itself, Searle's argument has been both widely criticised and endorsed.
Arguments such as Searle's, and the work of others on the philosophy of mind,
sparked off a more intense debate about the nature of intelligence, the
possibility of machines with a conscious mind and the value of the
Turing test that continued through the 1980s and 1990s.
The Loebner Prize provides an annual platform for practical Turing tests with the first competition held in November 1991. It is underwritten by Hugh Loebner. The Cambridge Center for Behavioral Studies in Massachusetts,
United States, organised the prizes up to and including the 2003
contest. As Loebner described it, one reason the competition was created
was to advance the state of AI research, at least in part because no
one had taken steps to implement the Turing test despite 40 years of
discussing it.
The first Loebner Prize competition in 1991 led to a renewed
discussion of the viability of the Turing test and the value of pursuing
it, in both the popular press and academia.
The first contest was won by a mindless program with no identifiable
intelligence that managed to fool naïve interrogators into making the
wrong identification. This highlighted several of the shortcomings of
the Turing test (discussed below): The winner won, at least in part, because it was able to "imitate human typing errors"; the unsophisticated interrogators were easily fooled; and some researchers in AI have been led to feel that the test is merely a distraction from more fruitful research.
The silver (text only) and gold (audio and visual) prizes have
never been won. However, the competition has awarded the bronze medal
every year for the computer system that, in the judges' opinions,
demonstrates the "most human" conversational behaviour among that year's
entries. Artificial Linguistic Internet Computer Entity (A.L.I.C.E.) has won the bronze award on three occasions in recent times (2000, 2001, 2004). Learning AI Jabberwacky won in 2005 and 2006.
The Loebner Prize tests conversational intelligence; winners are typically chatterbot programs, or Artificial Conversational Entities (ACE)s. Early Loebner Prize rules restricted conversations: Each entry and hidden-human conversed on a single topic,
thus the interrogators were restricted to one line of questioning per
entity interaction. The restricted conversation rule was lifted for the
1995 Loebner Prize. Interaction duration between judge and entity has
varied in Loebner Prizes. In Loebner 2003, at the University of Surrey,
each interrogator was allowed five minutes to interact with an entity,
machine or hidden-human. Between 2004 and 2007, the interaction time
allowed in Loebner Prizes was more than twenty minutes.
In June 2022 the Google
LaMDA (Language Model for Dialog Applications) chatbot received
widespread coverage regarding claims about it having achieved sentience.
Initially, in an article in The Economist,
Google Research Fellow Blaise Agüera y Arcas said the chatbot had
demonstrated a degree of understanding of social relationships. Several days later, Google engineer Blake Lemoine claimed in an interview with the Washington Post
that LaMDA had achieved sentience. Lemoine had been placed on leave by
Google for internal assertions to this effect. Agüera y Arcas (a Google
Vice President) and Jen Gennai (head of Responsible Innovation) had
investigated the claims but dismissed them.
Lemoine's assertion was roundly rejected by other experts in the field,
who pointed out that a language model appearing to mimic human
conversation does not indicate that any intelligence is present behind
it, despite seeming to pass the Turing test. The claim that LaMDA had
reached sentience nevertheless sparked widespread discussion across
social-media platforms, including debate over the meaning of sentience
as well as what it means to be human.
OpenAI's chatbot, ChatGPT, released in November 2022, is based on GPT-3.5 and GPT-4 large language models. Celeste Biever wrote in a Nature article that "ChatGPT broke the Turing test".
Stanford researchers reported that ChatGPT passes the test; they found
that ChatGPT-4 "passes a rigorous Turing test, diverging from average
human behavior chiefly to be more cooperative."
Versions
Saul Traiger argues that there are at least three primary versions of
the Turing test, two of which are offered in "Computing Machinery and
Intelligence" and one that he describes as the "Standard
Interpretation".
While there is some debate regarding whether the "Standard
Interpretation" is that described by Turing or, instead, based on a
misreading of his paper, these three versions are not regarded as
equivalent, and their strengths and weaknesses are distinct.
Turing's original article describes a simple party game involving
three players. Player A is a man, player B is a woman and player C (who
plays the role of the interrogator) is of either gender. In the
imitation game, player C is unable to see either player A or player B,
and can communicate with them only through written notes. By asking
questions of player A and player B, player C tries to determine which of
the two is the man and which is the woman. Player A's role is to trick
the interrogator into making the wrong decision, while player B attempts
to assist the interrogator in making the right one.
Turing then asks:
"What will happen when a machine takes the part of A in
this game? Will the interrogator decide wrongly as often when the game
is played like this as he does when the game is played between a man and
a woman?" These questions replace our original, "Can machines think?"
The second version appeared later in Turing's 1950 paper. Similar to
the original imitation game test, the role of player A is performed by a
computer. However, the role of player B is performed by a man rather
than a woman.
Let us fix our attention on one particular digital computer C.
Is it true that by modifying this computer to have an adequate storage,
suitably increasing its speed of action, and providing it with an
appropriate programme, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?
In this version, both player A (the computer) and player B are trying
to trick the interrogator into making an incorrect decision.
The standard interpretation is not included in the original
paper, but is both accepted and debated.
Common understanding has it that the purpose of the Turing test is not
specifically to determine whether a computer is able to fool an
interrogator into believing that it is a human, but rather whether a
computer could imitate a human. While there is some dispute whether this interpretation was intended by Turing, Sterrett believes that it was and thus conflates the second version with this one, while others, such as Traiger, do not –
this has nevertheless led to what can be viewed as the "standard
interpretation". In this version, player A is a computer and player B a
person of either sex. The role of the interrogator is not to determine
which is male and which is female, but which is a computer and which is a
human.
The fundamental issue with the standard interpretation is that the
interrogator cannot differentiate which responder is human, and which is
machine. There are issues about duration, but the standard
interpretation generally considers this limitation as something that
should be reasonable.
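The structure of the standard interpretation can be made concrete as a procedure: an interrogator exchanges text-only messages with two hidden respondents, one human and one machine, and must name which is the machine. The Python sketch below is illustrative only; the respondent and judge functions are trivial placeholders, not an actual experimental protocol.

import random
from typing import Callable, Dict, List, Tuple

Respondent = Callable[[str], str]
Transcript = Dict[str, List[Tuple[str, str]]]

def run_trial(judge: Callable[[Transcript], str],
              human: Respondent, machine: Respondent,
              questions: List[str]) -> bool:
    """Return True if the judge correctly names the machine ("A" or "B")."""
    # Randomly assign the hidden respondents to the labels A and B.
    labels = {"A": human, "B": machine}
    if random.random() < 0.5:
        labels = {"A": machine, "B": human}
    transcript: Transcript = {"A": [], "B": []}
    for question in questions:
        for label, respondent in labels.items():
            transcript[label].append((question, respondent(question)))
    truth = "A" if labels["A"] is machine else "B"
    return judge(transcript) == truth

# Trivial stand-ins to show the shape of a trial.
human_player = lambda q: "I'd rather not say."
machine_player = lambda q: "Beep. Processing your question."
naive_judge = lambda transcript: "A"   # always guesses A
print(run_trial(naive_judge, human_player, machine_player,
                ["Describe your childhood.", "Write a short poem."]))

Over many such trials, the machine "passes" to the extent that the judge's accuracy falls toward chance, which is the comparative criterion discussed elsewhere in this article.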
Interpretations
Controversy has arisen over which of the alternative formulations of the test Turing intended. Sterrett argues that two distinct tests can be extracted from his 1950 paper and that, pace
Turing's remark, they are not equivalent. The test that employs the
party game and compares frequencies of success is referred to as the
"Original Imitation Game Test", whereas the test consisting of a human
judge conversing with a human and a machine is referred to as the
"Standard Turing Test", noting that Sterrett equates this with the
"standard interpretation" rather than the second version of the
imitation game. Sterrett agrees that the standard Turing test (STT) has
the problems that its critics cite but feels that, in contrast, the
original imitation game test (OIG test) so defined is immune to many of
them, due to a crucial difference: Unlike the STT, it does not make
similarity to human performance the criterion, even though it employs
human performance in setting a criterion for machine intelligence. A man
can fail the OIG test, but it is argued that it is a virtue of a test
of intelligence that failure indicates a lack of resourcefulness: The
OIG test requires the resourcefulness associated with intelligence and
not merely "simulation of human conversational behaviour". The general
structure of the OIG test could even be used with non-verbal versions of
imitation games.
According to Huma Shah, Turing himself was concerned with whether
a machine could think and was providing a simple method to examine
this: through human-machine question-answer sessions.
Shah argues the imitation game which Turing described could be
practicalized in two different ways: a) one-to-one interrogator-machine
test, and b) simultaneous comparison of a machine with a human, both
questioned in parallel by an interrogator.
Still other writers
have interpreted Turing as proposing that the imitation game itself is
the test, without specifying how to take into account Turing's statement
that the test that he proposed using the party version of the imitation
game is based upon a criterion of comparative frequency of success in
that imitation game, rather than a capacity to succeed at one round of
the game.
Some writers argue that the imitation game is best understood by its
social aspects. In his 1948 paper, Turing refers to intelligence as an
"emotional concept", and notes that
The extent to
which we regard something as behaving in an intelligent manner is
determined as much by our own state of mind and training as by the
properties of the object under consideration. If we are able to explain
and predict its behaviour or if there seems to be little underlying
plan, we have little temptation to imagine intelligence. With the same
object therefore it is possible that one man would consider it as
intelligent and another would not; the second man would have found out
the rules of its behaviour.
Following this remark and similar ones scattered throughout Turing's publications, Diane Proudfoot claims that Turing held a response-dependence approach to intelligence, according to which an intelligent (or thinking) entity is one that appears
intelligent to an average interrogator. Bernardo Gonçalves shows that
although Turing used the rhetoric of introducing his test as a sort of
crucial experiment to decide whether machines can be said to think,
the actual presentation of his test satisfies well-known properties of
thought experiments in the modern scientific tradition of Galileo. Shlomo Danziger
promotes a socio-technological interpretation, according to which
Turing saw the imitation game not as an intelligence test but as a
technological aspiration - one whose realization would likely involve a
change in society's attitude toward machines. According to this reading,
Turing's celebrated 50-year prediction - that by the end of the 20th
century his test would be passed by some machine - actually consists of
two distinguishable predictions. The first is a technological
prediction:
I believe that in about fifty years' time
it will be possible to programme computers ... to make them play the
imitation game so well that an average interrogator will not have more
than 70% chance of making the right identification after five minutes of
questioning.
The second prediction Turing makes is a sociological one:
I
believe that at the end of the century the use of words and general
educated opinion will have altered so much that one will be able to
speak of machines thinking without expecting to be contradicted.
Danziger
claims further that for Turing, alteration of society's attitude
towards machinery is a prerequisite for the existence of intelligent
machines: Only when the term "intelligent machine" is no longer seen as
an oxymoron would the existence of intelligent machines become logically possible.
Saygin has suggested that maybe the original game is a way of
proposing a less biased experimental design as it hides the
participation of the computer.
The imitation game also includes a "social hack" not found in the
standard interpretation, as in the game both computer and male human are
required to pretend to be someone they are not.
Should the interrogator know about the computer?
A
crucial piece of any laboratory test is that there should be a control.
Turing never makes clear whether the interrogator in his tests is aware
that one of the participants is a computer. He states only that player A
is to be replaced with a machine, not that player C is to be made aware
of this replacement.
When Colby, FD Hilf, S Weber and AD Kramer tested PARRY, they did so by
assuming that the interrogators did not need to know that one or more
of those being interviewed was a computer during the interrogation. As Ayse Saygin, Peter Swirski, and others have highlighted, this makes a big difference to the implementation and outcome of the test. In an experimental study looking at Gricean maxim violations
using transcripts of Loebner's one-to-one (interrogator-hidden
interlocutor) Prize for AI contests between 1994 and 1999, Saygin
found significant differences between the responses of participants who
knew and did not know about computers being involved.
Strengths
Tractability and simplicity
The power and appeal of the Turing test derives from its simplicity. The philosophy of mind, psychology, and modern neuroscience
have been unable to provide definitions of "intelligence" and
"thinking" that are sufficiently precise and general to be applied to
machines. Without such definitions, the central questions of the philosophy of artificial intelligence
cannot be answered. The Turing test, even if imperfect, at least
provides something that can actually be measured. As such, it is a
pragmatic attempt to answer a difficult philosophical question.
Breadth of subject matter
The
format of the test allows the interrogator to give the machine a wide
variety of intellectual tasks. Turing wrote that "the question and
answer method seems to be suitable for introducing almost any one of the
fields of human endeavour that we wish to include." John Haugeland adds that "understanding the words is not enough; you have to understand the topic as well."
To pass a well-designed Turing test, the machine must use natural language, reason, have knowledge and learn.
The test can be extended to include video input, as well as a "hatch"
through which objects can be passed: this would force the machine to
demonstrate skilled use of well designed vision and robotics as well. Together, these represent almost all of the major problems that artificial intelligence research would like to solve.
The Feigenbaum test
is designed to take advantage of the broad range of topics available to
a Turing test. It is a limited form of Turing's question-answer game
which compares the machine against the abilities of experts in specific
fields such as literature or chemistry.
Emphasis on emotional and aesthetic intelligence
As
a Cambridge honours graduate in mathematics, Turing might have been
expected to propose a test of computer intelligence requiring expert
knowledge in some highly technical field, thus anticipating a more recent approach to the subject.
Instead, as already noted, the test which he described in his seminal
1950 paper requires the computer to be able to compete successfully in a
common party game, and this by performing as well as the typical man in
answering a series of questions so as to pretend convincingly to be the
woman contestant.
Given the status of human sexual dimorphism as one of the most ancient of subjects,
it is thus implicit in the above scenario that the questions to be
answered will involve neither specialised factual knowledge nor
information processing technique. The challenge for the computer,
rather, will be to demonstrate empathy for the role of the female, and
to demonstrate as well a characteristic aesthetic sensibility—both of
which qualities are on display in this snippet of dialogue which Turing
has imagined:
Interrogator: Will X please tell me the length of his or her hair?
Contestant: My hair is shingled, and the longest strands are about nine inches long.
When Turing does introduce some specialised knowledge into one of his
imagined dialogues, the subject is not maths or electronics, but
poetry:
Interrogator: In the first line of your sonnet which reads,
"Shall I compare thee to a summer's day," would not "a spring day" do as
well or better?
Witness: It wouldn't scan.
Interrogator: How about "a winter's day." That would scan all right.
Witness: Yes, but nobody wants to be compared to a winter's day.
Turing thus once again demonstrates his interest in empathy and
aesthetic sensitivity as components of an artificial intelligence; and
in light of an increasing awareness of the threat from an AI run amok, it has been suggested
that this focus perhaps represents a critical intuition on Turing's
part, i.e., that emotional and aesthetic intelligence will play a key
role in the creation of a "friendly AI".
It is further noted, however, that whatever inspiration Turing might be
able to lend in this direction depends upon the preservation of his
original vision, which is to say, further, that the promulgation of a
"standard interpretation" of the Turing test—i.e., one which focuses on a
discursive intelligence only—must be regarded with some caution.
Weaknesses
Turing did not explicitly state that the Turing test could be used as a measure of "intelligence",
or any other human quality. He wanted to provide a clear and
understandable alternative to the word "think", which he could then use
to reply to criticisms of the possibility of "thinking machines" and to
suggest ways that research might move forward.
Nevertheless, the Turing test has been proposed as a measure of a
machine's "ability to think" or its "intelligence". This proposal has
received criticism from both philosophers and computer scientists. The
interpretation makes the assumption that an interrogator can determine
if a machine is "thinking" by comparing its behaviour with human
behaviour. Every element of this assumption has been questioned: the
reliability of the interrogator's judgement, the value of comparing the
machine with a human, and the value of comparing only behaviour. Because
of these and other considerations, some AI researchers have questioned
the relevance of the test to their field.
Naïveté of interrogators
In
practice, the test's results can easily be dominated not by the
computer's intelligence, but by the attitudes, skill, or naïveté of the
questioner. Numerous experts in the field, including cognitive scientist
Gary Marcus, insist that the Turing test only shows how easy it is to fool humans and is not an indication of machine intelligence.
Turing doesn't specify the precise skills and knowledge required
by the interrogator in his description of the test, but he did use the
term "average interrogator": "[the] average interrogator would not have
more than 70 per cent chance of making the right identification after
five minutes of questioning".
Chatterbot programs such as ELIZA have repeatedly fooled
unsuspecting people into believing that they are communicating with
human beings. In these cases, the "interrogators" are not even aware of
the possibility that they are interacting with computers. To
successfully appear human, there is no need for the machine to have any
intelligence whatsoever and only a superficial resemblance to human
behaviour is required.
Early Loebner Prize competitions used "unsophisticated" interrogators who were easily fooled by the machines.
Since 2004, the Loebner Prize organisers have deployed philosophers,
computer scientists, and journalists among the interrogators.
Nonetheless, some of these experts have been deceived by the machines.
One interesting feature of the Turing test is the frequency of the confederate effect,
when the confederate (tested) humans are misidentified by the
interrogators as machines. It has been suggested that what interrogators
expect as human responses is not necessarily typical of humans. As a
result, some individuals can be categorised as machines. This can
therefore work in favour of a competing machine. The humans are
instructed to "act themselves", but sometimes their answers are more
like what the interrogator expects a machine to say. This raises the question of how to ensure that the humans are motivated to "act human".
Human intelligence vs. intelligence in general
The Turing test does not directly test whether the computer behaves
intelligently. It tests only whether the computer behaves like a human
being. Since human behaviour and intelligent behaviour are not exactly
the same thing, the test can fail to accurately measure intelligence in
two ways:
Some human behaviour is unintelligent
The Turing test requires that the machine be able to execute all
human behaviours, regardless of whether they are intelligent. It even
tests for behaviours that may not be considered intelligent at all, such
as the susceptibility to insults, the temptation to lie or, simply, a high frequency of typing mistakes. If a machine cannot imitate these unintelligent behaviours in detail it fails the test.
This objection was raised by The Economist, in an article entitled "artificial stupidity"
published shortly after the first Loebner Prize competition in 1992.
The article noted that the first Loebner winner's victory was due, at
least in part, to its ability to "imitate human typing errors." Turing himself had suggested that programs add errors into their output, so as to be better "players" of the game.
Some intelligent behaviour is inhuman
The Turing test does not test for highly intelligent behaviours,
such as the ability to solve difficult problems or come up with original
insights. In fact, it specifically requires deception on the part of
the machine: if the machine is more intelligent than a human
being it must deliberately avoid appearing too intelligent. If it were
to solve a computational problem that is practically impossible for a
human to solve, then the interrogator would know the program is not
human, and the machine would fail the test.
Because it cannot measure intelligence that is beyond the
ability of humans, the test cannot be used to build or evaluate systems
that are more intelligent than humans. Because of this, several test
alternatives that would be able to evaluate super-intelligent systems
have been proposed.
The Turing test is concerned strictly with how the subject acts – the external behaviour of the machine. In this regard, it takes a behaviourist or functionalist approach to the study of the mind. The example of ELIZA
suggests that a machine passing the test may be able to simulate human
conversational behaviour by following a simple (but large) list of
mechanical rules, without thinking or having a mind at all.
John Searle
has argued that external behaviour cannot be used to determine if a
machine is "actually" thinking or merely "simulating thinking." His Chinese room
argument is intended to show that, even if the Turing test is a good
operational definition of intelligence, it may not indicate that the
machine has a mind, consciousness, or intentionality. (Intentionality is a philosophical term for the power of thoughts to be "about" something.)
Turing anticipated this line of criticism in his original paper, writing:
I
do not wish to give the impression that I think there is no mystery
about consciousness. There is, for instance, something of a paradox
connected with any attempt to localise it. But I do not think these
mysteries necessarily need to be solved before we can answer the
question with which we are concerned in this paper.
Impracticality and irrelevance: the Turing test and AI research
Mainstream AI researchers argue that trying to pass the Turing test is merely a distraction from more fruitful research. Indeed, the Turing test is not an active focus of much academic or commercial effort—as Stuart Russell and Peter Norvig write: "AI researchers have devoted little attention to passing the Turing test." There are several reasons.
First, there are easier ways to test their programs. Most current
research in AI-related fields is aimed at modest and specific goals,
such as object recognition or logistics.
To test the intelligence of the programs that solve these problems, AI
researchers simply give them the task directly. Stuart Russell and Peter
Norvig suggest an analogy with the history of flight: Planes are tested by how well they fly, not by comparing them to birds. "Aeronautical engineering texts," they write, "do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons.'"
Second, creating lifelike simulations of human beings is a
difficult problem on its own that does not need to be solved to achieve
the basic goals of AI research. Believable human characters may be
interesting in a work of art, a game, or a sophisticated user interface,
but they are not part of the science of creating intelligent machines,
that is, machines that solve problems using intelligence.
Turing did not intend for his idea to be used to test the
intelligence of programs—he wanted to provide a clear and understandable
example to aid in the discussion of the philosophy of artificial intelligence. John McCarthy
argues that we should not be surprised that a philosophical idea turns
out to be useless for practical applications. He observes that the
philosophy of AI is "unlikely to have any more effect on the practice of
AI research than philosophy of science generally has on the practice of
science."
The Language-centric Objection
Another well-known objection raised against the Turing test concerns its
exclusive focus on linguistic behaviour (i.e., it is only a
"language-based" experiment, while all the other cognitive faculties are
not tested). This drawback downplays the role of the other
modality-specific "intelligent abilities" of human beings that
the psychologist Howard Gardner, in his "multiple intelligence theory", proposes to consider (verbal-linguistic abilities being only one of those).
Silence
A
critical aspect of the Turing test is that a machine must give itself
away as being a machine by its utterances. An interrogator must then
make the "right identification" by correctly identifying the machine as
being just that. If however a machine remains silent during a
conversation, then it is not possible for an interrogator to accurately
identify the machine other than by means of a calculated guess.
Even taking into account a parallel/hidden human as part of the test may
not help the situation as humans can often be misidentified as being a
machine.
The Turing Trap
By focusing on imitating
humans, rather than augmenting or extending human capabilities, the
Turing Test risks directing research and implementation toward
technologies that substitute for humans and thereby drive down wages and
income for workers. As they lose economic power, these workers may also
lose political power, making it more difficult for them to change the
allocation of wealth and income. This can trap them in a bad
equilibrium. Erik Brynjolfsson has called this "The Turing Trap" and argued that there are currently excess incentives for creating machines that imitate rather than augment humans.
Variations
Numerous other versions of the Turing test, including those expounded above, have been raised through the years.
A modification of the Turing test wherein the objective of one or
more of the roles has been reversed between machines and humans is
termed a reverse Turing test. An example is implied in the work of
psychoanalyst Wilfred Bion, who was particularly fascinated by the "storm" that resulted from the encounter of one mind by another. In his 2000 book, among several other original points with regard to the Turing test, literary scholar Peter Swirski
discussed in detail the idea of what he termed the Swirski
test—essentially the reverse Turing test. He pointed out that it
overcomes most if not all standard objections levelled at the standard
version.
Carrying this idea forward, R. D. Hinshelwood
described the mind as a "mind recognizing apparatus". The challenge
would be for the computer to be able to determine if it were interacting
with a human or another computer. This is an extension of the original
question that Turing attempted to answer but would, perhaps, offer a
high enough standard to define a machine that could "think" in a way
that we typically define as characteristically human.
CAPTCHA
is a form of reverse Turing test. Before being allowed to perform some
action on a website, the user is presented with alphanumerical
characters in a distorted graphic image and asked to type them out. This
is intended to prevent automated systems from being used to abuse the
site. The rationale is that software sufficiently sophisticated to read
and reproduce the distorted image accurately does not exist (or is not
available to the average user), so any system able to do so is likely to
be a human.
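This mechanism can be sketched in a few lines of code. The following is a minimal illustration only, not the scheme of any real CAPTCHA provider: it assumes the Pillow imaging library, and the particular distortions (per-character jitter, speckle noise, a light blur) are arbitrary choices made for the example.

```python
# Minimal CAPTCHA-style sketch (illustrative only, not a production scheme).
import random
import string

from PIL import Image, ImageDraw, ImageFilter, ImageFont


def make_captcha(length: int = 6) -> tuple[Image.Image, str]:
    """Render a random alphanumeric string with simple distortions."""
    text = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    img = Image.new("RGB", (40 * length, 60), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()

    # Draw each character with a small random vertical offset.
    for i, ch in enumerate(text):
        draw.text((10 + 40 * i, 20 + random.randint(-8, 8)), ch, fill="black", font=font)

    # Add speckle noise and a light blur to hinder naive OCR.
    for _ in range(300):
        draw.point((random.randrange(img.width), random.randrange(img.height)), fill="gray")
    img = img.filter(ImageFilter.GaussianBlur(0.7))
    return img, text


def check_response(expected: str, response: str) -> bool:
    """Accept the response only if it matches the hidden string, ignoring case."""
    return response.strip().upper() == expected


if __name__ == "__main__":
    image, answer = make_captcha()
    image.save("captcha.png")                       # image shown to the user
    print(check_response(answer, answer.lower()))   # True
```

In a real deployment the server would typically keep the hidden string (or a hash of it) in the user's session and compare it against the submitted reply.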
Software that could defeat CAPTCHAs with some accuracy by analysing patterns in the generating engine began to be developed soon after the creation of CAPTCHA.
In 2013, researchers at Vicarious announced that they had developed a system to solve CAPTCHA challenges from Google, Yahoo!, and PayPal up to 90% of the time.
In 2014, Google engineers demonstrated a system that could defeat CAPTCHA challenges with 99.8% accuracy.
In 2015, Shuman Ghosemajumder, former click fraud czar of Google, stated that there were cybercriminal sites that would defeat CAPTCHA challenges for a fee, to enable various forms of fraud.
Distinguishing accurate use of language from actual understanding
A further variation is motivated by the concern that modern natural language processing models have proved highly successful at generating text on the basis of a huge text corpus and could eventually pass the Turing test simply by recombining words and sentences encountered during training. Since the interrogator has no precise knowledge of the training data, the model might simply be returning sentences that exist in a similar form somewhere in that enormous corpus. For this reason, Arthur Schwaninger proposes a variation of the Turing test that can distinguish between systems that are only capable of using language and systems that understand language. He proposes a test in which the machine is confronted with philosophical questions that do not depend on any prior knowledge and yet require self-reflection to be answered appropriately.
Another variation is described as the subject-matter expert
Turing test, where a machine's response cannot be distinguished from an
expert in a given field. This is also known as a "Feigenbaum test" and
was proposed by Edward Feigenbaum in a 2003 paper.
"Low-level" cognition test
Robert French
(1990) makes the case that an interrogator can distinguish human and
non-human interlocutors by posing questions that reveal the low-level
(i.e., unconscious) processes of human cognition, as studied by cognitive science.
Such questions reveal the precise details of the human embodiment of
thought and can unmask a computer unless it experiences the world as
humans do.
Total Turing test
The "Total Turing test" variation of the Turing test, proposed by cognitive scientist Stevan Harnad,
adds two further requirements to the traditional Turing test. The
interrogator can also test the perceptual abilities of the subject
(requiring computer vision) and the subject's ability to manipulate objects (requiring robotics).
Electronic health records
A letter published in Communications of the ACM
describes the concept of generating a synthetic patient population and
proposes a variation of the Turing test to assess the difference between
synthetic and real patients. The letter states: "In the EHR context,
though a human physician can readily distinguish between synthetically
generated and real live human patients, could a machine be given the
intelligence to make such a determination on its own?" The letter further states: "Before synthetic patient identities become a public
health problem, the legitimate EHR market might benefit from applying
Turing Test-like techniques to ensure greater data reliability and
diagnostic value. Any new techniques must thus consider patients'
heterogeneity and are likely to have greater complexity than the Allen
eighth-grade-science-test is able to grade."
Minimum intelligent signal test
The minimum intelligent signal test was proposed by Chris McKinstry as "the maximum abstraction of the Turing test",
in which only binary responses (true/false or yes/no) are permitted, to
focus only on the capacity for thought. It eliminates text chat
problems like anthropomorphism bias, and does not require emulation of unintelligent human behaviour,
allowing for systems that exceed human intelligence. The questions must
each stand on their own, however, making it more like an IQ test
than an interrogation. It is typically used to gather statistical data
against which the performance of artificial intelligence programs may be
measured.
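Because every item takes a yes/no answer, scoring such a test reduces to counting agreements with ground truth. The sketch below is a hypothetical harness for that idea only; the items and the toy "system under test" are placeholders, not McKinstry's actual question bank.

```python
# Hypothetical scoring harness for a MIST-style binary test.
# The items and the toy system are placeholders for illustration.

# Each item is a (proposition, ground-truth) pair answerable with True/False.
ITEMS = [
    ("Is water wet?", True),
    ("Is the Earth larger than the Sun?", False),
    ("Can a week contain two Tuesdays?", False),
]


def toy_system(question: str) -> bool:
    """Stand-in for the program being evaluated; answers True to everything."""
    return True


def mist_score(system, items) -> float:
    """Fraction of binary items answered in agreement with ground truth."""
    correct = sum(1 for question, truth in items if system(question) == truth)
    return correct / len(items)


if __name__ == "__main__":
    print(f"score = {mist_score(toy_system, ITEMS):.2f}")  # 0.33 for the toy system
```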
Hutter Prize
The organisers of the Hutter Prize believe that compressing natural language text is a hard AI problem, equivalent to passing the Turing test.
The data compression test has some advantages over most versions and variations of a Turing test, including:
It gives a single number that can be directly used to compare which of two machines is "more intelligent."
It does not require the computer to lie to the judge.
The main disadvantages of using data compression as a test are:
It is not possible to test humans this way.
It is unknown what particular "score" on this test—if any—is equivalent to passing a human-level Turing test.
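The compression criterion can be made concrete with a crude proxy: measure how many bits per character a lossless compressor needs for a text corpus, on the reasoning that better modelling of the text yields a smaller encoding. The sketch below uses Python's standard lzma codec as a stand-in for a learned model; the corpus file name is a placeholder, and the Hutter Prize itself imposes far stricter rules on the exact data and resource limits.

```python
# Crude compression-as-a-score proxy (illustrative; not the Hutter Prize rules).
import lzma


def compression_score(data: bytes) -> float:
    """Bits per character achieved by lossless compression of the corpus."""
    compressed = lzma.compress(data, preset=9)
    return 8 * len(compressed) / len(data)


if __name__ == "__main__":
    with open("corpus.txt", "rb") as f:   # placeholder corpus file
        text = f.read()
    print(f"{compression_score(text):.3f} bits/character")
```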
Other tests based on compression or Kolmogorov complexity
A related approach to the Hutter Prize, which appeared much earlier, in the late 1990s, is the inclusion of compression problems in an extended Turing test, or of tests derived entirely from Kolmogorov complexity.
Other related tests in this line are presented by Hernandez-Orallo and Dowe.
Algorithmic IQ, or AIQ for short, is an attempt to convert the
theoretical Universal Intelligence Measure from Legg and Hutter (based
on Solomonoff's inductive inference) into a working practical test of machine intelligence.
Two major advantages of some of these tests are their
applicability to nonhuman intelligences and their absence of a
requirement for human testers.
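Kolmogorov complexity is uncomputable in general, so practical tests in this family approximate it with real compressors. One standard compression-based construction, shown here purely as an illustration of that approximation idea rather than as the AIQ procedure of Legg and Hutter, is the normalized compression distance between two strings.

```python
# Normalized compression distance (NCD): a practical stand-in for
# Kolmogorov-complexity-based similarity. Illustrative only.
import zlib


def approx_k(data: bytes) -> int:
    """Approximate K(x) by the length of the zlib-compressed string."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = approx_k(x), approx_k(y), approx_k(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    similar_a = b"the quick brown fox jumps over the lazy dog " * 20
    similar_b = b"the quick brown fox jumps over the lazy cat " * 20
    unrelated = b"q8kz1vm4pw0r7j3t5yx2ne9hguc6dfbs" * 20
    print(ncd(similar_a, similar_b))  # small: the strings share most structure
    print(ncd(similar_a, unrelated))  # larger: the strings share little structure
```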
Ebert test
The Turing test inspired the Ebert test, proposed in 2011 by film critic Roger Ebert, which tests whether a computer-based synthesised voice has sufficient skill in intonation, inflection, timing and so forth to make people laugh.
Social Turing Game
Taking advantage of Large Language Models, in 2023 the research company AI21 Labs created an online social experiment titled "Human or Not?". It was played more than 10 million times by more than 2 million people.
It was the largest Turing-style experiment to that date. The results showed that 32% of people could not distinguish between humans and machines.
Conferences
Turing Colloquium
1990
marked the fortieth anniversary of the first publication of Turing's
"Computing Machinery and Intelligence" paper, and saw renewed interest
in the test. Two significant events occurred in that year: the first was
the Turing Colloquium, which was held at the University of Sussex
in April, and brought together academics and researchers from a wide
variety of disciplines to discuss the Turing test in terms of its past,
present, and future; the second was the formation of the annual Loebner Prize competition.
Blay Whitby
lists four major turning points in the history of the Turing test – the
publication of "Computing Machinery and Intelligence" in 1950, the
announcement of Joseph Weizenbaum's ELIZA in 1966, Kenneth Colby's creation of PARRY, which was first described in 1972, and the Turing Colloquium in 1990.