The philosophy of computer science is concerned with the philosophical questions that arise within the study of computer science. There is still no common understanding of the content, aims, focus, or topics of the philosophy of computer science, despite attempts to develop a philosophy of computer science along the lines of the philosophy of physics or the philosophy of mathematics.
Due to the abstract nature of computer programs and the technological ambitions of computer science, many of the conceptual questions of the philosophy of computer science are also comparable to those of the philosophy of science, the philosophy of mathematics, and the philosophy of technology.
Overview
Many of the central philosophical questions of computer science concern logical, ethical, methodological, ontological, and epistemological issues. Such questions include:
How do ethics impact real-world applications of computers?
Computation
The question of "What is computation?" remains a central question in
relation to the philosophy of computer science. According to Nir Fresco,
deciphering what a computation is requires distinguishing between computational and non-computational processes. Fresco identifies three
main perspectives.
The first view is the semantic view. Supporters of this view hold that computations are internal processes occurring within a computing mechanism, and advocates have argued that computation involves manipulating symbol structures, content, and truth-preserving rules. However, this view has been criticized for depending on human interpretations rather than on the inherent properties of the computing system.
The second view is the causal view. According to this
perspective, computations are defined by their causal characteristics. A
system executes a calculation when the transformations in its physical
state correspond to the structure of an abstract algorithm. This theory
relates computations to cause-and-effect relationships between system
components. Thus, it is linked to physical causation rather than
semantic meaning.
The third view is the functional view. In this view, computation
is distinguished by its functional characteristics, or the functions and
relationships of its component pieces. According to this perspective,
the organization of a mechanism's parts to do particular tasks is more
important than whether the symbols have external meaning.
These various perspectives illustrate the ongoing debate about
the meaning of computation and whether it is defined by symbolic
meaning, physical causation, or functional organization.
Church–Turing thesis
The Church–Turing thesis and its variations are central to the theory of computation.
Since, as an informal notion, the concept of effective calculability
does not have a formal definition, the thesis, although it has
near-universal acceptance, cannot be formally proven. The implications of this thesis are also of philosophical concern. Philosophers have
interpreted the Church–Turing thesis as having implications for the philosophy of mind.
Turing's Halting Problem
Another major concept in the philosophy of computer science is the halting problem. This problem concerns whether it is possible to write a program that can determine whether another arbitrary program will run forever or terminate. It is widely accepted as an undecidable problem (a problem whose solution cannot be found by any algorithm). This result established that there are limits to what computers can compute. Although
the idea is often attributed to Alan Turing's 1936 paper On Computable
Numbers, recent research indicates that the term and its modern
formulation appeared later. The expression halting problem was first
used and formally stated by Martin Davis in his 1958 book Computability
and Unsolvability.
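The standard diagonal argument behind this result can be sketched in a few lines of code, assuming (for contradiction) a hypothetical decider halts(program, argument); the names below are illustrative only.

```python
# A minimal sketch of the classical diagonal argument. The function names are
# illustrative; `halts` is the hypothetical decider whose existence is refuted.

def halts(program, argument) -> bool:
    """Hypothetical decider: would return True iff program(argument) terminates."""
    raise NotImplementedError("No total, correct implementation can exist.")

def paradox(program):
    """Does the opposite of whatever `halts` predicts for `program` run on itself."""
    if halts(program, program):
        while True:      # loop forever if the decider predicts termination
            pass
    return "halted"      # terminate if the decider predicts non-termination

# Suppose halts() were correct and we evaluated paradox(paradox):
# - if halts(paradox, paradox) is True, then paradox(paradox) loops forever;
# - if it is False, then paradox(paradox) halts.
# Either way halts() is wrong about paradox(paradox), so no such decider exists.
```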
P versus NP problem
The P versus NP problem is an unsolved problem in computer science and mathematics. It asks whether every problem whose solution can be verified in polynomial time (and so defined to belong to the class NP) can also be solved in polynomial time (and so defined to belong to the class P). Most computer scientists believe that P ≠ NP. This belief is motivated partly by the fact that, after decades of study, no one has found a polynomial-time algorithm for any of the more than 3000 important known NP-complete problems, and partly by philosophical arguments about what its resolution would imply.
As Scott Aaronson has put it, if P = NP, then the world would be a profoundly
different place than we usually assume it to be. There would be no
special value in "creative leaps", no fundamental gap between solving a
problem and recognizing the solution once it's found. Everyone who could
appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss.
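The asymmetry the problem turns on can be illustrated with subset sum, a standard NP-complete problem: checking a proposed solution is fast, while the only known general-purpose algorithms search exponentially many possibilities in the worst case. The sketch below is illustrative only; the function names are ours.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Polynomial-time check: does the proposed subset really sum to the target?"""
    return all(x in numbers for x in certificate) and sum(certificate) == target

def solve_brute_force(numbers, target):
    """Exponential-time search over all 2^n subsets; no polynomial algorithm is known."""
    for r in range(len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)
    return None

nums = [3, 34, 4, 12, 5, 2]
cert = solve_brute_force(nums, 9)      # finds e.g. [4, 5]
print(cert, verify(nums, 9, cert))     # verification stays easy even when search is hard
```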
Computer Ethics
Computer ethics concerns how computing technology is used and how it is implemented in real-world scenarios. These ethics address issues
such as user privacy, security, and professional responsibility. These
topics are subject to scholarly discussion and professional debate.
Scholars and technology innovators have long debated the
existence of privacy online. Samuel D. Warren and Louis D. Brandeis argued in an influential academic paper that, in a new age of technology, a right to privacy is necessary; however, others believe that privacy is a promise that can never be fulfilled. In Sun Microsystems' CEO Scott McNealy's words, "Privacy is dead. Get over it."
The introduction of computer technology raises many issues regarding
privacy. These issues can range from intentionally malicious actions,
such as spreading information against a user's will, to innocent
mistakes, such as accidentally releasing information to the public when
it was meant to be private. There are also discussions about the ethics of keeping important information private. The conclusions become less clear-cut when the privacy of individual users is weighed against that of governmental bodies: in the case of individual users, choosing not to reveal information is generally seen as ethical, while in the case of government entities, choosing not to disclose information can be seen as harmful.
Security focuses on protecting the systems and data of users from
unauthorized access or harm. Malicious software is typically the center
of discussion for computer security. While it is generally considered
unethical to intentionally spread malicious software, such as computer
viruses, there is debate about whether users have an ethical
responsibility to ensure the security of their own systems. Users who
fail to protect their own computer systems expose other computer users
to risk. There are also discussions of cases in which distributing a virus might be ethical, for example when a virus is spread to expose a weakness in the protection of computer systems.
Professional ethics addresses the responsibilities and duties of
software developers. Bugs in software can cause system failures. This
can range from minor annoyances to severe, real-life consequences for
the user. While it is generally accepted that bugs should not be
included in software and that it is the developer's responsibility to
correct them, bug-free software is rarely obtainable. Commonly,
developers release software with bugs they deem less important than
other, more critical issues. If important bugs are discovered, software
patches are distributed throughout the software's lifespan. Even though
this is quite common in practice, it raises ethical questions. Scholars
debate the extent to which releasing a product with known issues is
acceptable.
Philosophy of artificial intelligence
The philosophy of artificial intelligence is a branch of the philosophy of mind and the philosophy of computer science that explores artificial intelligence and its implications for knowledge and understanding of intelligence, ethics, consciousness, epistemology, and free will. Furthermore, the field is concerned with the creation of artificial animals or artificial people (or, at least, artificial creatures; see artificial life), so the discipline is of considerable interest to philosophers. These factors contributed to the emergence of the philosophy of artificial intelligence.
The philosophy of artificial intelligence attempts to answer such questions as follows:
Can a machine act intelligently? Can it solve any problem that a person would solve by thinking?
Are human intelligence and machine intelligence the same? Is the human brain essentially a computer?
Can a machine have a mind, mental states, and consciousness in the same sense that a human being can? Can it feel how things are? (i.e. does it have qualia?)
Questions like these reflect the divergent interests of AI researchers, cognitive scientists and philosophers
respectively. The scientific answers to these questions depend on the
definition of "intelligence" and "consciousness" and exactly which
"machines" are under discussion.
Important propositions in the philosophy of AI include some of the following:
Turing's "polite convention": If a machine behaves as intelligently as a human being, then it is as intelligent as a human being.
The Dartmouth proposal:
"Every aspect of learning or any other feature of intelligence can in
principle be so precisely described that a machine can be made to
simulate it."
John Searle's strong AI hypothesis:
"The appropriately programmed computer with the right inputs and
outputs would thereby have a mind in exactly the same sense human beings
have minds."
Hobbes'
mechanism: "For 'reason' ... is nothing but 'reckoning,' that is adding
and subtracting, of the consequences of general names agreed upon for
the 'marking' and 'signifying' of our thoughts..."
Can a machine display general intelligence?
Is it possible to create a machine that can solve all the
problems humans solve using their intelligence? This question defines
the scope of what machines could do in the future and guides the
direction of AI research. It only concerns the behavior of machines and ignores the issues of interest to psychologists, cognitive scientists and philosophers, evoking the question: does it matter whether a machine is really thinking, as a person thinks, rather than just producing outcomes that appear to result from thinking?
The basic position of most AI researchers is summed up in this statement, which appeared in the proposal for the Dartmouth workshop of 1956:
"Every aspect of learning or any other feature of intelligence
can in principle be so precisely described that a machine can be made to
simulate it."
Arguments against the basic premise must show that building a working
AI system is impossible because there is some practical limit to the
abilities of computers or that there is some special quality of the
human mind that is necessary for intelligent behavior and yet cannot be
duplicated by a machine (or by the methods of current AI research).
Arguments in favor of the basic premise must show that such a system is
possible.
It is also possible to sidestep the connection between the two parts of the above proposal. For instance, machine learning, beginning with Turing's famous child machine proposal, essentially achieves the desired feature of intelligence without a precise design-time description of how it would work. Accounts of robot tacit knowledge eliminate the need for a precise description altogether.
The first step to answering the question is to clearly define "intelligence".
In 1950, computer scientist Alan Turing reduced the problem of defining intelligence to a simple question about conversation. He suggests that if a machine can answer any
question posed to it, using the same words that an ordinary person
would, then we may call that machine intelligent. A modern version of
his experimental design would use an online chat room,
where one of the participants is a real person and one of the
participants is a computer program. The program passes the test if no
one can tell which of the two participants is human. Turing notes that no one (except philosophers) ever asks the question
"can people think?" He writes "instead of arguing continually over this
point, it is usual to have a polite convention that everyone thinks". Turing's test extends this polite convention to machines:
If a machine acts as intelligently as a human being, then it is as intelligent as a human being.
One criticism of the Turing test
is that it only measures the "humanness" of the machine's behavior,
rather than the "intelligence" of the behavior. Since human behavior and
intelligent behavior are not exactly the same thing, the test fails to
measure intelligence. Stuart J. Russell and Peter Norvig
write that "aeronautical engineering texts do not define the goal of
their field as 'making machines that fly so exactly like pigeons that
they can fool other pigeons'".
Intelligence as achieving goals
[Figure: a simple reflex agent.]
Twenty-first century AI research defines intelligence in terms of
goal-directed behavior. It views intelligence as a set of problems that
the machine is expected to solve – the more problems it can solve, and
the better its solutions are, the more intelligent the program is. AI
founder John McCarthy defined intelligence as "the computational part of the ability to achieve goals in the world."
Stuart Russell and Peter Norvig formalized this definition using abstract intelligent agents.
An "agent" is something which perceives and acts in an environment. A
"performance measure" defines what counts as success for the agent.
"If an agent acts so as to maximize the expected value of a
performance measure based on past experience and knowledge then it is
intelligent."
Definitions like this one try to capture the essence of intelligence.
They have the advantage that, unlike the Turing test, they do not also
test for unintelligent human traits such as making typing mistakes. They have the disadvantage that they can fail to differentiate between
"things that think" and "things that do not". By this definition, even a
thermostat has a rudimentary intelligence.
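A toy sketch makes this definition concrete; the class and performance measure below are our own illustrative naming, not taken from Russell and Norvig's text. A thermostat perceives a temperature, acts on it, and is scored by how well it keeps the room near its set point.

```python
# Illustrative sketch of the agent view of intelligence, using the thermostat example.

class ThermostatAgent:
    """A simple reflex agent: perceives temperature, acts to keep it near a set point."""
    def __init__(self, set_point: float):
        self.set_point = set_point

    def act(self, perceived_temp: float) -> str:
        if perceived_temp < self.set_point - 1:
            return "heat_on"
        if perceived_temp > self.set_point + 1:
            return "heat_off"
        return "no_op"

def performance_measure(temps, set_point):
    """Success = how close the room stayed to the set point over time (higher is better)."""
    return -sum(abs(t - set_point) for t in temps) / len(temps)

agent = ThermostatAgent(set_point=20.0)
print(agent.act(17.5))                            # "heat_on"
print(performance_measure([19.5, 20.2, 20.8], 20.0))
```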
Arguments that a machine can display general intelligence
One such argument is that the brain can be simulated. Hubert Dreyfus describes this argument as claiming that "if the nervous system obeys the laws of physics and chemistry, which we have every reason to suppose it does, then ... we ... ought to be able to reproduce the behavior of the nervous system with some physical device". This argument, first introduced as early as 1943 and vividly described by Hans Moravec in 1988, is now associated with futurist Ray Kurzweil, who estimates that computer power will be sufficient for a complete brain simulation by the year 2029. A non-real-time simulation of a thalamocortical model of the size of the human brain (10^11 neurons) was performed in 2005, and it took 50 days to simulate 1 second of brain dynamics on a cluster of 27 processors.
Even AI's harshest critics (such as Hubert Dreyfus and John Searle) agree that a brain simulation is possible in theory. However, Searle points out that, in principle, anything can be
simulated by a computer; thus, bringing the definition to its breaking
point leads to the conclusion that any process at all can technically be
considered "computation". "What we wanted to know is what distinguishes
the mind from thermostats and livers," he writes. Thus, merely simulating the functioning of a living brain would in
itself be an admission of ignorance regarding intelligence and the
nature of the mind, like trying to build a jet airliner by copying a
living bird precisely, feather by feather, with no theoretical
understanding of aeronautical engineering.
In 1963, Allen Newell and Herbert A. Simon proposed that "symbol manipulation" was the essence of both human and machine intelligence. They wrote:
"A physical symbol system has the necessary and sufficient means of general intelligent action."
This claim is very strong: it implies both that human thinking is a kind of symbol manipulation (because a symbol system is necessary for intelligence) and that machines can be intelligent (because a symbol system is sufficient for intelligence). Another version of this position was described by philosopher Hubert Dreyfus, who called it "the psychological assumption":
"The mind can be viewed as a device operating on bits of information according to formal rules."
The "symbols" that Newell, Simon and Dreyfus discussed were word-like
and high level—symbols that directly correspond with objects in the
world, such as <dog> and <tail>. Most AI programs written
between 1956 and 1990 used this kind of symbol. Modern AI, based on
statistics and mathematical optimization, does not use the high-level
"symbol processing" that Newell and Simon discussed.
Arguments against symbol processing
These arguments show that human thinking does not consist (solely) of high level symbol manipulation. They do not show that artificial intelligence is impossible, only that more than symbol processing is required.
In 1931, Kurt Gödel proved with an incompleteness theorem that it is always possible to construct a "Gödel statement" that a given consistent formal system
of logic (such as a high-level symbol manipulation program) could not
prove. Despite being a true statement, the constructed Gödel statement
is unprovable in the given system. (The truth of the constructed Gödel
statement is contingent on the consistency of the given system; applying
the same process to a subtly inconsistent system will appear to
succeed, but will actually yield a false "Gödel statement" instead.) More speculatively, Gödel conjectured that the human mind can
eventually correctly determine the truth or falsity of any well-grounded
mathematical statement (including any possible Gödel statement), and
that therefore the human mind's power is not reducible to a mechanism. Philosopher John Lucas (since 1961) and Roger Penrose (since 1989) have championed this philosophical anti-mechanist argument.
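Schematically, the construction that Lucas and Penrose appeal to can be written as follows (a sketch of the first incompleteness theorem, not a full proof):

```latex
% Schematic Gödel sentence for a consistent, effectively axiomatized theory F
% that is strong enough to encode elementary arithmetic.
\begin{align*}
  &F \vdash \; G_F \leftrightarrow \neg\,\mathrm{Prov}_F\!\left(\ulcorner G_F \urcorner\right)
    && \text{($G_F$ asserts its own unprovability in $F$)} \\
  &\text{If $F$ is consistent, then } F \nvdash G_F,
    && \text{so $G_F$ is true but unprovable in $F$.}
\end{align*}
```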
Gödelian anti-mechanist arguments tend to rely on the
innocuous-seeming claim that a system of human mathematicians (or some
idealization of human mathematicians) is both consistent (completely
free of error) and believes fully in its own consistency (and can make
all logical inferences that follow from its own consistency, including
belief in its Gödel statement). This is probably impossible for a Turing machine to do (see Halting problem);
therefore, the Gödelian concludes that human reasoning is too powerful
to be captured by a Turing machine, and by extension, any digital
mechanical device.
However, the modern consensus in the scientific and mathematical
community is that actual human reasoning is inconsistent; that any
consistent "idealized version" H of human reasoning would
logically be forced to adopt a healthy but counter-intuitive open-minded
skepticism about the consistency of H (otherwise H is
provably inconsistent); and that Gödel's theorems do not lead to any
valid argument that humans have mathematical reasoning capabilities
beyond what a machine could ever duplicate. This consensus that Gödelian anti-mechanist arguments are doomed to failure is laid out strongly in Artificial Intelligence: "any attempt to utilize (Gödel's incompleteness results) to attack the computationalist thesis is bound to be illegitimate, since these results are quite consistent with the computationalist thesis."
Stuart Russell and Peter Norvig
agree that Gödel's argument does not consider the nature of real-world
human reasoning. It applies to what can theoretically be proved, given
an infinite amount of memory and time. In practice, real machines
(including humans) have finite resources and will have difficulty
proving many theorems. It is not necessary to be able to prove
everything in order to be an intelligent person.
Less formally, Douglas Hofstadter, in his Pulitzer Prize winning book Gödel, Escher, Bach: An Eternal Golden Braid, states that these "Gödel-statements" always refer to the system itself, drawing an analogy to the way the Epimenides paradox uses statements that refer to themselves, such as "this statement is false" or "I am lying". But, of course, the Epimenides paradox applies to anything that makes statements, whether it is a machine or a human, even Lucas himself. Consider:
Lucas can't assert the truth of this statement.
This statement is true but cannot be asserted by Lucas. This shows
that Lucas himself is subject to the same limits that he describes for
machines, as are all people, and so Lucas's argument is pointless.
After concluding that human reasoning is non-computable, Penrose
went on to controversially speculate that some kind of hypothetical
non-computable processes involving the collapse of quantum mechanical
states give humans a special advantage over existing computers.
Existing quantum computers are only capable of reducing the complexity
of Turing computable tasks and are still restricted to tasks within the
scope of Turing machines. By Penrose and Lucas's arguments, the fact that quantum computers are
only able to complete Turing computable tasks implies that they cannot
be sufficient for emulating the human mind. Therefore, Penrose seeks some other process involving new physics,
for instance quantum gravity which might manifest new physics at the
scale of the Planck mass
via spontaneous quantum collapse of the wave function. These states, he
suggested, occur both within neurons and also spanning more than one
neuron. However, other scientists point out that there is no plausible organic
mechanism in the brain for harnessing any sort of quantum computation,
and furthermore that the timescale of quantum decoherence seems too fast
to influence neuron firing.
Hubert Dreyfus argued that human intelligence
and expertise depended primarily on fast intuitive judgements rather
than step-by-step symbolic manipulation, and argued that these skills
would never be captured in formal rules.
Dreyfus's argument had been anticipated by Turing in his 1950 paper Computing Machinery and Intelligence, where he classified it as the "argument from the informality of behavior." Turing argued in response that, just because we do not know the rules
that govern a complex behavior, this does not mean that no such rules
exist. He wrote: "we cannot so easily convince ourselves of the absence
of complete laws of behaviour ... The only way we know of for finding
such laws is scientific observation, and we certainly know of no
circumstances under which we could say, 'We have searched enough. There
are no such laws.'"
Russell and Norvig point out that, in the years since Dreyfus
published his critique, progress has been made towards discovering the
"rules" that govern unconscious reasoning. The situated movement in robotics research attempts to capture our unconscious skills at perception and attention. Computational intelligence paradigms, such as neural nets, evolutionary algorithms and so on are mostly directed at simulated unconscious reasoning and learning. Statistical approaches to AI can make predictions which approach the accuracy of human intuitive guesses. Research into commonsense knowledge
has focused on reproducing the "background" or context of knowledge. In
fact, AI research in general has moved away from high level symbol
manipulation, towards new models that are intended to capture more of
our intuitive reasoning.
Cognitive science and psychology eventually came to agree with Dreyfus's description of human expertise. Daniel Kahneman and others developed a similar theory, identifying two "systems" that humans use to solve problems, which Kahneman called "System 1" (fast, intuitive judgements) and "System 2" (slow, deliberate, step-by-step thinking).
Although Dreyfus' views have been vindicated in many ways, the
work in cognitive science and in AI was in response to specific problems
in those fields and was not directly influenced by Dreyfus. Historian
and AI researcher Daniel Crevier
wrote that "time has proven the accuracy and perceptiveness of some of
Dreyfus's comments. Had he formulated them less aggressively,
constructive actions they suggested might have been taken much earlier."
Can a machine have a mind, consciousness, and mental states?
Searle called the following position "strong AI":
A physical symbol system can have a mind and mental states.
Searle distinguished this position from what he called "weak AI":
A physical symbol system can act intelligently.
Searle introduced the terms to isolate strong AI from weak AI so he
could focus on what he thought was the more interesting and debatable
issue. He argued that even if we assume that we had a computer
program that acted exactly like a human mind, there would still be a
difficult philosophical question that needed to be answered.
Neither of Searle's two positions are of great concern to AI
research, since they do not directly answer the question "can a machine
display general intelligence?" (unless it can also be shown that
consciousness is necessary for intelligence). Turing wrote "I do
not wish to give the impression that I think there is no mystery about
consciousness… [b]ut I do not think these mysteries necessarily need to
be solved before we can answer the question [of whether machines can
think]." Russell and Norvig agree: "Most AI researchers take the weak AI hypothesis for granted, and don't care about the strong AI hypothesis."
Before we can answer this question, we must be clear what we mean by "minds", "mental states" and "consciousness".
Consciousness, minds, mental states, meaning
The words "mind" and "consciousness" are used by different communities in different ways. Some new age thinkers, for example, use the word "consciousness" to describe something similar to Bergson's "élan vital": an invisible, energetic fluid that permeates life and especially the mind. Science fiction writers use the word to describe some essential
property that makes us human: a machine or alien that is "conscious"
will be presented as a fully human character, with intelligence,
desires, will, insight, pride and so on. (Science fiction writers also use the words "sentience", "sapience", "self-awareness" or "ghost"—as in the Ghost in the Shell manga and anime series—to describe this essential human property). For others, the words "mind" or "consciousness" are used as a kind of secular synonym for the soul.
For philosophers, neuroscientists and cognitive scientists,
the words are used in a way that is both more precise and more mundane:
they refer to the familiar, everyday experience of having a "thought in
your head", like a perception, a dream, an intention or a plan, and to
the way we see something, know something, mean something or understand something. "It's not hard to give a commonsense definition of consciousness" observes philosopher John Searle. What is mysterious and fascinating is not so much what it is but how
it is: how does a lump of fatty tissue and electricity give rise to
this (familiar) experience of perceiving, meaning or thinking?
Philosophers call this the hard problem of consciousness. It is the latest version of a classic problem in the philosophy of mind called the "mind-body problem". A related problem is the problem of meaning or understanding (which philosophers call "intentionality"): what is the connection between our thoughts and what we are thinking about (i.e. objects and situations out in the world)? A third issue is the problem of experience (or "phenomenology"): If two people see the same thing, do they have the same experience? Or are there things "inside their head" (called "qualia") that can be different from person to person?
Neurobiologists believe all these problems will be solved as we begin to identify the neural correlates of consciousness:
the actual relationship between the machinery in our heads and its
collective properties, such as the mind, experience and understanding.
Some of the harshest critics of artificial intelligence
agree that the brain is just a machine, and that consciousness and
intelligence are the result of physical processes in the brain. The difficult philosophical question is this: can a computer program,
running on a digital machine that shuffles the binary digits of zero and
one, duplicate the ability of the neurons to create minds, with mental states (like understanding or perceiving), and ultimately, the experience of consciousness?
Arguments that a computer cannot have a mind and mental states
John Searle asks us to consider a thought experiment:
suppose we have written a computer program that passes the Turing test
and demonstrates general intelligent action. Suppose, specifically that
the program can converse in fluent Chinese. Write the program on 3x5
cards and give them to an ordinary person who does not speak Chinese.
Lock the person into a room and have him follow the instructions on the
cards. He will copy out Chinese characters and pass them in and out of
the room through a slot. From the outside, it will appear that the Chinese room
contains a fully intelligent person who speaks Chinese. The question is
this: is there anyone (or anything) in the room that understands
Chinese? That is, is there anything that has the mental state of understanding, or which has conscious awareness of what is being discussed in Chinese? The man is clearly not aware. The room cannot be aware. The cards certainly are not aware. Searle concludes that the Chinese room, or any other physical symbol system, cannot have a mind.
Searle goes on to argue that actual mental states and consciousness require (yet to be described) "actual physical-chemical properties of actual human brains." He argues there are special "causal properties" of brains and neurons that give rise to minds: in his words, "brains cause minds."
Related arguments: Leibniz' mill, Davis's telephone exchange, Block's Chinese nation and Blockhead
Gottfried Leibniz
made essentially the same argument as Searle in 1714, using the thought
experiment of expanding the brain until it was the size of a mill. In 1974, Lawrence Davis imagined duplicating the brain using telephone lines and offices staffed by people, and in 1978 Ned Block
envisioned the entire population of China involved in such a brain
simulation. This thought experiment is called "the Chinese Nation" or
"the Chinese Gym". Ned Block also proposed his Blockhead argument, which is a version of the Chinese room in which the program has been re-factored into a simple set of rules of the form "see this, do that", removing all mystery from the program.
Responses to the Chinese room
Responses to the Chinese room emphasize several different points.
The systems reply and the virtual mind reply: This reply argues that the system,
including the man, the program, the room, and the cards, is what
understands Chinese. Searle claims that the man in the room is the only
thing which could possibly "have a mind" or "understand", but others
disagree, arguing that it is possible for there to be two minds
in the same physical place, similar to the way a computer can
simultaneously "be" two machines at once: one physical (like a Macintosh) and one "virtual" (like a word processor).
Speed, power and complexity replies: Several critics point out that the man in the room would probably take
millions of years to respond to a simple question, and would require
"filing cabinets" of astronomical proportions. This brings the clarity
of Searle's intuition into doubt.
Robot reply: To truly understand, some believe the Chinese Room needs eyes and
hands. Hans Moravec writes: "If we could graft a robot to a reasoning
program, we wouldn't need a person to provide the meaning anymore: it
would come from the physical world."
Brain simulator reply: What if the program simulates the sequence of nerve firings at the
synapses of an actual brain of an actual Chinese speaker? The man in the
room would be simulating an actual brain. This is a variation on the
"systems reply" that appears more plausible because "the system" now
clearly operates like a human brain, which strengthens the intuition
that there is something besides the man in the room that could
understand Chinese.
Other minds reply and the epiphenomena reply: Several people have noted that Searle's argument is just a version of the problem of other minds,
applied to machines. Since it is difficult to decide if people are
"actually" thinking, we should not be surprised that it is difficult to
answer the same question about machines.
A related question is whether "consciousness" (as Searle
understands it) exists. Searle argues that the experience of
consciousness cannot be detected by examining the behavior of a machine,
a human being or any other animal. Daniel Dennett
points out that natural selection cannot preserve a feature of an
animal that has no effect on the behavior of the animal, and thus
consciousness (as Searle understands it) cannot be produced by natural
selection. Therefore, either natural selection did not produce
consciousness, or "strong AI" is correct in that consciousness can be
detected by a suitably designed Turing test.
Is thinking a kind of computation?
The computational theory of mind or "computationalism" claims that the relationship between mind and brain is similar (if not identical) to the relationship between a running program (software) and a computer (hardware). The idea has philosophical roots in Hobbes (who claimed reasoning was "nothing more than reckoning"), Leibniz (who attempted to create a logical calculus of all human ideas), Hume (who thought perception could be reduced to "atomic impressions") and even Kant (who analyzed all experience as controlled by formal rules). The latest version is associated with philosophers Hilary Putnam and Jerry Fodor.
This question bears on our earlier questions: if the human brain
is a kind of computer then computers can be both intelligent and
conscious, answering both the practical and philosophical questions of
AI. In terms of the practical question of AI ("Can a machine display
general intelligence?"), some versions of computationalism make the
claim that (as Hobbes wrote):
Reasoning is nothing but reckoning.
In other words, our intelligence derives from a form of calculation, similar to arithmetic. This is the physical symbol system
hypothesis discussed above, and it implies that artificial intelligence
is possible. In terms of the philosophical question of AI ("Can a
machine have mind, mental states and consciousness?"), most versions of computationalism claim that (as Stevan Harnad characterizes it):
Mental states are just implementations of (the right) computer programs.
This is John Searle's "strong AI" discussed above, and it is the real target of the Chinese room argument (according to Harnad).
Other related questions
Can a machine have emotions?
If "emotions" are defined only in terms of their effect on behavior or on how they function inside an organism, then emotions can be viewed as a mechanism that an intelligent agent uses to maximize the utility of its actions. Given this definition of emotion, Hans Moravec believes that "robots in general will be quite emotional about being nice people". Fear is a source of urgency. Empathy is a necessary component of good human computer interaction.
He says robots "will try to please you in an apparently selfless manner
because it will get a thrill out of this positive reinforcement. You
can interpret this as a kind of love." Daniel Crevier
writes "Moravec's point is that emotions are just devices for
channeling behavior in a direction beneficial to the survival of one's
species."
Can a machine be self-aware?
"Self-awareness", as noted above, is sometimes used by science fiction writers as a name for the essential human property that makes a character fully human. Turing
strips away all other properties of human beings and reduces the
question to "can a machine be the subject of its own thought?" Can it think about itself? Viewed in this way, a program can be written that can report on its own internal states, such as a debugger.
Can a machine be original or creative?
Turing reduces this to the question of whether a machine can "take us
by surprise" and argues that this is obviously true, as any programmer
can attest. He notes that, with enough storage capacity, a computer can behave in an astronomical number of different ways. It must be possible, even trivial, for a computer that can represent ideas to combine them in new ways. (Douglas Lenat's Automated Mathematician, as one example, combined ideas to discover new mathematical truths.) Kaplan
and Haenlein suggest that machines can display scientific creativity,
while it seems likely that humans will have the upper hand where
artistic creativity is concerned.
In 2009, scientists at Aberystwyth University in Wales and the
U.K's University of Cambridge designed a robot called Adam that they
believe to be the first machine to independently come up with new
scientific findings. Also in 2009, researchers at Cornell developed Eureqa,
a computer program that extrapolates formulas to fit the data inputted,
such as finding the laws of motion from a pendulum's motion.
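A miniature of that idea can be sketched by scoring a few candidate formulas against simulated pendulum data; the candidate forms, constants, and grid search below are illustrative and are not Eureqa's actual algorithm.

```python
import math, random

# Illustrative miniature of formula fitting: simulate pendulum periods T = 2*pi*sqrt(L/g),
# then keep whichever candidate formula (with its best-fit constant) matches the data best.

g = 9.81
lengths = [0.2, 0.5, 1.0, 1.5, 2.0]
periods = [2 * math.pi * math.sqrt(L / g) + random.gauss(0, 0.01) for L in lengths]

candidates = {
    "T = c * L":       lambda L, c: c * L,
    "T = c * sqrt(L)": lambda L, c: c * math.sqrt(L),
    "T = c * L**2":    lambda L, c: c * L ** 2,
}

def best_fit(f):
    """Crude grid search for the constant c minimizing squared error for candidate f."""
    return min((sum((f(L, c) - T) ** 2 for L, T in zip(lengths, periods)), c)
               for c in (x / 100 for x in range(1, 500)))

for name, f in candidates.items():
    error, c = best_fit(f)
    print(f"{name:17s} best c = {c:.2f}, squared error = {error:.4f}")
# The sqrt(L) form wins with c close to 2.01, i.e. it recovers T = 2*pi*sqrt(L/g).
```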
Can a machine be benevolent or hostile?
This question (like many others in the philosophy of artificial intelligence) can be presented in two forms. "Hostility" can be defined in terms of function or behavior, in which case "hostile" becomes synonymous with "dangerous". Or it can
be defined in terms of intent: can a machine "deliberately" set out to
do harm? The latter is the question "can a machine have conscious
states?" (such as intentions) in another form.
The question of whether highly intelligent and completely
autonomous machines would be dangerous has been examined in detail by
futurists (such as the Machine Intelligence Research Institute). The obvious element of drama has also made the subject popular in science fiction, which has considered many different possible scenarios in which intelligent machines pose a threat to mankind; see Artificial intelligence in fiction.
One issue is that machines may acquire the autonomy and intelligence required to be dangerous very quickly. Vernor Vinge
has suggested that over just a few years, computers will suddenly
become thousands or millions of times more intelligent than humans. He
calls this "the Singularity". He suggests that it may be somewhat or possibly very dangerous for humans. This is discussed by a philosophy called Singularitarianism.
In 2009, academics and technical experts attended a conference to
discuss the potential impact of robots and computers and the impact of
the hypothetical possibility that they could become self-sufficient and
able to make their own decisions.
They discussed the possibility and the extent to which computers and
robots might be able to acquire any level of autonomy, and to what
degree they could use such abilities to possibly pose any threat or
hazard. They noted that some machines have acquired various forms of
semi-autonomy, including being able to find power sources on their own
and being able to independently choose targets to attack with weapons.
They also noted that some computer viruses can evade elimination and have achieved "cockroach intelligence". They noted that self-awareness as depicted in science-fiction is probably unlikely, but that there were other potential hazards and pitfalls.
Some experts and academics have questioned the use of robots for
military combat, especially when such robots are given some degree of
autonomous functions. The US Navy has funded a report which indicates that as military robots
become more complex, there should be greater attention to implications
of their ability to make autonomous decisions.
Some have suggested a need to build "Friendly AI", a term coined by Eliezer Yudkowsky,
meaning that the advances which are already occurring with AI should
also include an effort to make AI intrinsically friendly and humane.
Can a machine imitate all human characteristics?
Turing said "It is customary ... to offer a grain of comfort, in the
form of a statement that some peculiarly human characteristic could
never be imitated by a machine. ... I cannot offer any such comfort, for
I believe that no such bounds can be set."
Turing noted that there are many arguments of the form "a machine will never do X", where X can be many things, such as:
Be kind, resourceful, beautiful, friendly, have
initiative, have a sense of humor, tell right from wrong, make mistakes,
fall in love, enjoy strawberries and cream, make someone fall in love
with it, learn from experience, use words properly, be the subject of
its own thought, have as much diversity of behaviour as a man, do
something really new.
Turing argues that these objections are often based on naive
assumptions about the versatility of machines or are "disguised forms of
the argument from consciousness". Writing a program that exhibits one
of these behaviors "will not make much of an impression." All of these arguments are tangential to the basic premise of AI,
unless it can be shown that one of these traits is essential for general
intelligence.
Can a machine have a soul?
Finally, those who believe in the existence of a soul may argue that "Thinking is a function of man's immortal soul." Alan Turing called this "the theological objection". He writes:
In attempting to construct such machines we should not be
irreverently usurping His power of creating souls, any more than we are
in the procreation of children: rather we are, in either case,
instruments of His will providing mansions for the souls that He
creates.
The discussion on the topic has been reignited as a result of recent claims made by Google's LaMDA artificial intelligence system that it is sentient and has a "soul".
LaMDA (Language Model for Dialogue Applications) is an artificial intelligence system that creates chatbots—AI robots designed to communicate with humans—by gathering vast amounts of text from the internet and using algorithms to respond to queries in the most fluid and natural way possible.
The transcripts of conversations between scientists and LaMDA
reveal that the AI system excels at this, providing answers to
challenging topics about the nature of emotions, generating Aesop-style fables on the spot, and even describing its alleged fears. However, virtually all philosophers doubt that LaMDA is sentient.
Views on the role of philosophy
Some scholars argue that the AI community's dismissal of philosophy is detrimental. In the Stanford Encyclopedia of Philosophy, some philosophers argue that the role of philosophy in AI is underappreciated. Physicist David Deutsch argues that without an understanding of philosophy and its concepts, progress in AI development would suffer.
One argument for the validity of this concern and the importance of this risk references how human beings dominate other species because the human brain possesses distinctive capabilities other animals lack. If AI were to surpass human intelligence and become superintelligent, it might become uncontrollable. Just as the fate of the mountain gorilla depends on human goodwill, the fate of humanity could depend on the actions of a future machine superintelligence.
Experts disagree on whether artificial general intelligence (AGI)
can achieve the capabilities needed for human extinction. Debates
center on AGI's technical feasibility, the speed of self-improvement, and the effectiveness of alignment strategies. Concerns about superintelligence have been voiced by researchers including Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, and Alan Turing, and AI company CEOs such as Dario Amodei (Anthropic), Sam Altman (OpenAI), and Elon Musk (xAI). In 2022, a survey of AI researchers with a 17% response rate found that
the majority believed there is a 10 percent or greater chance that
human inability to control AI will cause an existential catastrophe. In 2023, hundreds of AI experts and other notable figures signed a statement declaring, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war". Following increased concern over AI risks, government leaders such as United Kingdom prime minister Rishi Sunak and United Nations Secretary-General António Guterres called for an increased focus on global AI regulation.
Two sources of concern stem from the problems of AI control and alignment.
Controlling a superintelligent machine or instilling it with
human-compatible values may be difficult. Many researchers believe that a
superintelligent machine would likely resist attempts to disable it or
change its goals as that would prevent it from accomplishing its present
goals. It would be extremely challenging to align a superintelligence
with the full breadth of significant human values and constraints. In contrast, skeptics such as computer scientist Yann LeCun argue that superintelligent machines will have no desire for self-preservation. A June 2025 study showed that in some circumstances, models may break
laws and disobey direct commands to prevent shutdown or replacement,
even at the cost of human lives.
Researchers warn that an "intelligence explosion"—a
rapid, recursive cycle of AI self-improvement—could outpace human
oversight and infrastructure, leaving no opportunity to implement safety
measures. In this scenario, an AI more intelligent than its creators
would recursively improve itself at an exponentially increasing rate, too quickly for its handlers or society at large to control. Empirically, examples like AlphaZero, which taught itself to play Go
and quickly surpassed human ability, show that domain-specific AI
systems can sometimes progress from subhuman to superhuman ability very
quickly, although such machine learning systems do not recursively improve their fundamental architecture.
History
One of the earliest authors to express serious concern that highly
advanced machines might pose existential risks to humanity was the
novelist Samuel Butler, who wrote in his 1863 essay Darwin among the Machines:
The upshot is simply a question of
time, but that the time will come when the machines will hold the real
supremacy over the world and its inhabitants is what no person of a
truly philosophic mind can for a moment question.
In 1951, foundational computer scientist Alan Turing
wrote the article "Intelligent Machinery, A Heretical Theory", in which
he proposed that artificial general intelligences would likely "take
control" of the world as they became more intelligent than human beings:
Let us now assume, for the sake of
argument, that [intelligent] machines are a genuine possibility, and
look at the consequences of constructing them... There would be no
question of the machines dying, and they would be able to converse with
each other to sharpen their wits. At some stage therefore we should have
to expect the machines to take control, in the way that is mentioned in
Samuel Butler's Erewhon.
In 1965, I. J. Good originated the concept now known as an "intelligence explosion" and said the risks were underappreciated:
Let an ultraintelligent machine be
defined as a machine that can far surpass all the intellectual
activities of any man however clever. Since the design of machines is
one of these intellectual activities, an ultraintelligent machine could
design even better machines; there would then unquestionably be an
'intelligence explosion', and the intelligence of man would be left far
behind. Thus the first ultraintelligent machine is the last invention
that man need ever make, provided that the machine is docile enough to
tell us how to keep it under control. It is curious that this point is
made so seldom outside of science fiction. It is sometimes worthwhile to
take science fiction seriously.
Scholars such as Marvin Minsky and I. J. Good himself occasionally expressed concern that a superintelligence could seize
control, but issued no call to action. In 2000, computer scientist and Sun co-founder Bill Joy penned an influential essay, "Why The Future Doesn't Need Us", identifying superintelligent robots as a high-tech danger to human survival, alongside nanotechnology and engineered bioplagues.
Nick Bostrom published Superintelligence in 2014, which presented his arguments that superintelligence poses an existential threat. By 2015, public figures such as physicists Stephen Hawking and Nobel laureate Frank Wilczek, computer scientists Stuart J. Russell and Roman Yampolskiy, and entrepreneurs Elon Musk and Bill Gates were expressing concern about the risks of superintelligence. Also in 2015, the Open Letter on Artificial Intelligence highlighted the "great potential of AI" and encouraged more research on how to make it robust and beneficial. In April 2016, the journal Nature
warned: "Machines and robots that outperform humans across the board
could self-improve beyond our control—and their interests might not
align with ours". In 2020, Brian Christian published The Alignment Problem, which details the history of progress on AI alignment up to that time.
In March 2023, key figures in AI, such as Musk, signed a letter from the Future of Life Institute calling a halt to advanced AI training until it could be properly regulated. In May 2023, the Center for AI Safety released a statement signed by numerous experts in AI safety and the AI existential risk that read:
Mitigating the risk of extinction
from AI should be a global priority alongside other societal-scale risks
such as pandemics and nuclear war.
A 2025 open letter by the Future of Life Institute, signed by five Nobel Prize laureates and thousands of notable people, reads:
We call for a prohibition on the development of superintelligence, not lifted before there is
broad scientific consensus that it will be done safely and controllably, and
strong public buy-in.
Potential AI capabilities
General Intelligence
Artificial general intelligence (AGI) is typically defined as a system that performs at least as well as humans in most or all intellectual tasks. A 2022 survey of AI researchers found that 90% of respondents expected
AGI would be achieved in the next 100 years, and half expected the same
by 2061. In May 2023, some researchers dismissed existential risks from AGI as
"science fiction" based on their high confidence that AGI would not be
created anytime soon. But in August 2023, a survey of 2,778 AI researchers found that most believed that AGI would be achieved by 2040.
Breakthroughs in large language models (LLMs) have led some researchers to reassess their expectations. Notably, Geoffrey Hinton
said in 2023 that he recently changed his estimate from "20 to 50 years
before we have general purpose A.I." to "20 years or less".
Superintelligence
[Figure: length of coding tasks achievable by leading AI models at a 50% success rate; the 2025 data suggest an exponential rise.]
In contrast with AGI, Bostrom defines a superintelligence
as "any intellect that greatly exceeds the cognitive performance of
humans in virtually all domains of interest", including scientific
creativity, strategic planning, and social skills. He argues that a superintelligence can outmaneuver humans anytime its
goals conflict with humans'. It may choose to hide its true intent until
humanity cannot stop it. Bostrom writes that in order to be safe for humanity, a
superintelligence must be aligned with human values and morality, so
that it is "fundamentally on our side".
Stephen Hawking
argued that superintelligence is physically possible because "there is
no physical law precluding particles from being organised in ways that
perform even more advanced computations than the arrangements of
particles in human brains".
When artificial superintelligence (ASI) may be achieved, if ever,
is necessarily less certain than predictions for AGI. In 2023, OpenAI leaders said that not only AGI, but superintelligence may be achieved in less than 10 years.
Comparison with humans
Bostrom argues that AI has many advantages over the human brain:
Speed of computation: biological neurons operate at a maximum frequency of around 200 Hz, compared to potentially multiple GHz for computers.
Internal communication speed: axons transmit signals at up to 120 m/s, while computers transmit signals at the speed of electricity, or optically at the speed of light.
Scalability: human intelligence is limited by the size and structure
of the brain, and by the efficiency of social communication, while AI
may be able to scale by simply adding more hardware.
Memory: notably working memory, because in humans it is limited to a few chunks of information at a time.
Reliability: transistors are more reliable than biological neurons, enabling higher precision and requiring less redundancy.
Duplicability: unlike human brains, AI software and models can be easily copied.
Editability: the parameters and internal workings of an AI model can
easily be modified, unlike the connections in a human brain.
Memory sharing and learning: AIs may be able to learn from the
experiences of other AIs in a manner more efficient than human learning.
According to Bostrom, an AI that has an expert-level facility at
certain key software engineering tasks could become a superintelligence
due to its capability to recursively improve its own algorithms, even if
it is initially limited in other domains not directly relevant to
engineering. This suggests that an intelligence explosion may someday catch humanity unprepared.
The economist Robin Hanson
has said that, to launch an intelligence explosion, an AI must become
vastly better at software innovation than the rest of the world
combined, which he finds implausible.
In a "fast takeoff" scenario, the transition from AGI to
superintelligence could take days or months. In a "slow takeoff", it
could take years or decades, leaving more time for society to prepare.
Alien mind
Superintelligences are sometimes called "alien minds", referring to
the idea that their way of thinking and motivations could be vastly
different from ours. This is generally considered as a source of risk,
making it more difficult to anticipate what a superintelligence might
do. It also suggests the possibility that a superintelligence may not
particularly value humans by default. To avoid anthropomorphism, superintelligence is sometimes viewed as a powerful optimizer that makes the best decisions to achieve its goals.
The field of mechanistic interpretability
aims to better understand the inner workings of AI models, potentially
allowing us one day to detect signs of deception and misalignment.
Limitations
It has been argued that there are limitations to what intelligence can achieve. Notably, the chaotic nature or time complexity
of some systems could fundamentally limit a superintelligence's ability
to predict some aspects of the future, increasing its uncertainty.
Dangerous capabilities
Advanced AI could generate enhanced pathogens or cyberattacks or
manipulate people. These capabilities could be misused by humans, or exploited by the AI itself if misaligned. A full-blown superintelligence could find various ways to gain a decisive influence if it so desired, but these dangerous capabilities may become available earlier, in weaker and more specialized AI systems.
Social manipulation
Geoffrey Hinton warned in 2023 that the ongoing profusion of
AI-generated text, images, and videos will make it more difficult to
distinguish truth from misinformation, and that authoritarian states
could exploit this to manipulate elections. Such large-scale, personalized manipulation capabilities can increase
the existential risk of a worldwide "irreversible totalitarian regime".
Malicious actors could also use them to fracture society and make it
dysfunctional.
Cyberattacks
AI-enabled cyberattacks are increasingly considered a present and critical threat. According to NATO's technical director of cyberspace, "The number of attacks is increasing exponentially". AI can also be used defensively, to preemptively find and fix vulnerabilities, and detect threats.
A NATO
technical director has said that AI-driven tools can dramatically
enhance cyberattack capabilities—boosting stealth, speed, and scale—and
may destabilize international security if offensive uses outstrip
defensive adaptations.
Speculatively, such hacking capabilities could be used by an AI
system to break out of its local environment, generate revenue, or
acquire cloud computing resources.
Enhanced pathogens
As AI technology spreads, it may become easier to engineer more
contagious and lethal pathogens. This could enable people with limited
skills in synthetic biology to engage in bioterrorism. Dual-use technology that is useful for medicine could be repurposed to create weapons.
For example, in 2022, scientists modified an AI system originally designed to generate non-toxic, therapeutic molecules for drug discovery. The researchers adjusted the system so that toxicity was rewarded rather than penalized. This simple change enabled the AI system to generate, in six hours, 40,000 candidate molecules for chemical warfare, including both known and novel compounds.
Some legal scholars have argued that existential-scale AI risks need
not require superintelligence. Optimizing systems operating within
current capabilities can produce prohibited outcomes while remaining
nominally compliant, a phenomenon legal scholar Jonathan Gropper
has termed the "Synthetic Outlaw". Gropper argues that the deterrence mechanisms on which law depends (identity, memory, and consequence) are structurally absent in autonomous systems, leaving governance frameworks unable to prevent compounding harm even when all parties act in good faith.
Competition among companies, state actors, and other organizations to develop AI technologies could lead to a race to the bottom on safety standards. Because rigorous safety procedures take time and resources, projects that
proceed more carefully risk being out-competed by less scrupulous
developers.
AI could be used to gain military advantages via autonomous lethal weapons, cyberwarfare, or automated decision-making. As an example of autonomous lethal weapons, miniaturized drones could
facilitate low-cost assassination of military or civilian targets, a
scenario highlighted in the 2017 short film Slaughterbots. AI could be used to gain an edge in decision-making by rapidly analyzing large amounts of data and acting faster and more effectively than humans. This could increase the speed and
unpredictability of war, especially when accounting for automated
retaliation systems.
[Figure: Scope–severity grid from Bostrom's 2013 paper "Existential Risk Prevention as Global Priority"]
An existential risk
is "one that threatens the premature extinction of Earth-originating
intelligent life or the permanent and drastic destruction of its
potential for desirable future development".
Besides extinction risk, there is the risk that civilization becomes permanently locked into a flawed future. One example is "value lock-in": if humanity still has moral blind spots analogous to slavery in the past, AI might irreversibly entrench them, preventing moral progress. AI could also be used to spread and preserve the set of values of whoever develops it. AI could facilitate large-scale surveillance and indoctrination, which
could be used to create a stable repressive worldwide totalitarian
regime.
Atoosa Kasirzadeh
proposes to classify existential risks from AI into two categories:
decisive and accumulative. Decisive risks encompass the potential for
abrupt and catastrophic events resulting from the emergence of
superintelligent AI systems that exceed human intelligence, which could
ultimately lead to human extinction. In contrast, accumulative risks emerge gradually through a series of interconnected disruptions that erode societal structures and resilience over time, ultimately leading to a critical failure or collapse.
It is difficult or impossible to reliably evaluate whether an advanced AI is sentient and to what degree. But if sentient machines are created en masse in the future, engaging in a civilizational path that indefinitely neglects their welfare could be an existential catastrophe. This has notably been discussed in the context of risks of astronomical suffering (also called "s-risks"). Moreover, it may be possible to engineer digital minds that can experience far more happiness than humans while consuming fewer resources, called "super-beneficiaries". Such an opportunity raises the question of how to
share the world and which "ethical and political framework" would
enable a mutually beneficial coexistence between biological and digital
minds.
AI may also drastically improve humanity's future. Toby Ord considers the existential risk a reason for "proceeding with due caution", not for abandoning AI. Max More calls AI an "existential opportunity", highlighting the cost of not developing it.
According to Bostrom, superintelligence could help reduce the existential risk from other powerful technologies such as molecular nanotechnology or synthetic biology.
It is thus conceivable that developing superintelligence before other
dangerous technologies would reduce the overall existential risk.
An "instrumental" goal
is a sub-goal that helps to achieve an agent's ultimate goal.
"Instrumental convergence" refers to the fact that some sub-goals are
useful for achieving virtually any ultimate goal, such as acquiring resources or self-preservation. Bostrom argues that if an advanced AI's instrumental goals conflict
with humanity's goals, the AI might harm humanity in order to acquire
more resources or prevent itself from being shut down, but only as a way
to achieve its ultimate goal. Russell
argues that a sufficiently advanced machine "will have
self-preservation even if you don't program it in... if you say, 'Fetch
the coffee', it can't fetch the coffee if it's dead. So if you give it
any goal whatsoever, it has a reason to preserve its own existence to
achieve that goal."
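The following toy calculation is a hedged illustration of this argument, not something drawn from Russell or Bostrom: the single "fetch the coffee" goal, the probabilities, and the two-policy setup are all invented for the sketch. It shows how resisting shutdown scores higher under almost any programmed goal.

```python
# Toy illustration of instrumental convergence: an agent whose only programmed
# goal is "fetch the coffee" still scores higher by resisting shutdown.
# All numbers here are assumptions made for this sketch.

def expected_goal_success(resists_shutdown: bool,
                          p_shutdown_attempt: float = 0.5,
                          p_success_if_running: float = 0.99) -> float:
    """Expected probability that the coffee eventually gets fetched."""
    if resists_shutdown:
        # The agent keeps running regardless of shutdown attempts.
        return p_success_if_running
    # If it allows shutdown, any shutdown attempt ends the task with no success.
    return (1.0 - p_shutdown_attempt) * p_success_if_running

for policy in (False, True):
    print(f"resists_shutdown={policy}: {expected_goal_success(policy):.3f}")

# For any positive chance of a shutdown attempt, "resist shutdown" wins,
# even though self-preservation was never part of the specified goal.
```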
Difficulty of specifying goals
In the "intelligent agent"
model, an AI can loosely be viewed as a machine that chooses whatever
action appears to best achieve its set of goals, or "utility function". A
utility function gives each possible situation a score that indicates
its desirability to the agent. Researchers know how to write utility
functions that mean "minimize the average network latency in this
specific telecommunications model" or "maximize the number of reward
clicks", but do not know how to write a utility function for "maximize human flourishing";
nor is it clear whether such a function meaningfully and unambiguously
exists. Furthermore, a utility function that expresses some values but
not others will tend to trample over the values the function does not
reflect.
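A minimal sketch of this agent model follows; the routes, latencies, and utility numbers are invented for illustration and are not taken from the cited sources.

```python
from typing import Callable, Iterable

def choose_action(actions: Iterable[str], utility: Callable[[str], float]) -> str:
    """Pick the action whose outcome scores highest under the utility function."""
    return max(actions, key=utility)

# A narrow goal such as "minimize average network latency" is easy to formalize:
latency_ms = {"route_a": 120.0, "route_b": 85.0, "route_c": 95.0}
network_utility = lambda route: -latency_ms[route]   # lower latency -> higher score
print(choose_action(latency_ms, network_utility))     # -> route_b

# By contrast, nobody knows how to fill in a function like this, or whether a
# meaningful, unambiguous version of it exists at all:
def human_flourishing_utility(world_state) -> float:
    raise NotImplementedError("no known formalization of 'maximize human flourishing'")
```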
An additional source of concern is that AI "must reason about what people intend
rather than carrying out commands literally", and that it must be able
to fluidly solicit human guidance if it is too uncertain about what
humans want.
Corrigibility
Assuming a goal has been successfully defined, a sufficiently
advanced AI might resist subsequent attempts to change its goals. If the
AI were superintelligent, it would likely succeed in out-maneuvering
its human operators and prevent itself from being reprogrammed with a
new goal. This is particularly relevant to value lock-in scenarios. The field of
"corrigibility" studies how to make agents that will not resist attempts
to change their goals.
Alignment of superintelligences
Some researchers believe the alignment problem may be particularly
difficult when applied to superintelligences. Their reasoning includes:
As AI systems increase in capabilities, the potential dangers
associated with experimentation grow. This makes iterative, empirical
approaches increasingly risky.
If instrumental goal convergence occurs, it may only do so in sufficiently intelligent agents.
A superintelligence may find unconventional and radical solutions to
assigned goals. Bostrom gives the example that if the objective is to
make humans smile, a weak AI may perform as intended, while a
superintelligence may decide a better solution is to "take control of
the world and stick electrodes into the facial muscles of humans to
cause constant, beaming grins."
A superintelligence in creation could gain some awareness of what it
is, where it is in development (training, testing, deployment, etc.),
and how it is being monitored, and use this information to deceive its
handlers. Bostrom writes that such an AI could feign alignment to prevent human
interference until it achieves a "decisive strategic advantage" that
allows it to take control.
Analyzing the internals and interpreting the behavior of LLMs is difficult, and it could be even more difficult for larger and more intelligent models.
Alternatively, some find reason to believe superintelligences would
be better able to understand morality, human values, and complex goals.
Bostrom writes, "A future superintelligence occupies an epistemically
superior vantage point: its beliefs are (probably, on most topics) more
likely than ours to be true".
In 2023, OpenAI started a project called "Superalignment" to
solve the alignment of superintelligences in four years. It called this
an especially important challenge, as it said superintelligence could be
achieved within a decade. Its strategy involved automating alignment
research using AI. The Superalignment team was dissolved less than a year later.
Difficulty of making a flawless design
Artificial Intelligence: A Modern Approach, a widely used undergraduate AI textbook, says that superintelligence "might mean the end of the human race". It states: "Almost any technology has the potential to cause harm in
the wrong hands, but with [superintelligence], we have the new problem
that the wrong hands might belong to the technology itself." Even if the system designers have good intentions, two difficulties are common to both AI and non-AI computer systems:
The system's implementation may contain initially unnoticed but subsequently catastrophic bugs.
No matter how much time is put into pre-deployment design, a system's specifications often result in unintended behavior the first time it encounters a new scenario.
AI systems uniquely add a third problem: that even given "correct"
requirements, bug-free implementation, and initial good behavior, an AI
system's dynamic learning capabilities may cause it to develop
unintended behavior, even without unanticipated external scenarios. For a
self-improving AI to be completely safe, it would need not only to be
bug-free, but to be able to design successor systems that are also
bug-free.
Orthogonality thesis
Some skeptics, such as Timothy B. Lee of Vox,
argue that any superintelligent program we create will be subservient
to us, that the superintelligence will (as it grows more intelligent and
learns more facts about the world) spontaneously learn moral truth
compatible with our values and adjust its goals accordingly, or that we
are either intrinsically or convergently valuable from the perspective
of an artificial intelligence.
Bostrom's "orthogonality thesis" argues instead that almost any level of intelligence can be combined with almost any goal. Bostrom warns against anthropomorphism:
a human will set out to accomplish their projects in a manner that they
consider reasonable, while an artificial intelligence may hold no
regard for its existence or for the welfare of humans around it, instead
caring only about completing the task.
Stuart Armstrong argues that the orthogonality thesis follows logically from the philosophical "is-ought distinction" argument against moral realism.
He notes that any fundamentally friendly AI could be made unfriendly
with modifications as simple as negating its utility function.
Skeptic Michael Chorost
rejects Bostrom's orthogonality thesis, arguing that "by the time [the
AI] is in a position to imagine tiling the Earth with solar panels,
it'll know that it would be morally wrong to do so."
Anthropomorphic arguments
Anthropomorphic
arguments assume that, as machines become more intelligent, they will
begin to display many human traits, such as morality or a thirst for
power. Although anthropomorphic scenarios are common in fiction, most
scholars writing about the existential risk of artificial intelligence
reject them. Instead, advanced AI systems are typically modeled as intelligent agents.
The academic debate is between those who worry that AI might
threaten humanity and those who believe it would not. Both sides of this
debate have framed the other side's arguments as illogical
anthropomorphism. Those skeptical of AGI risk accuse their opponents of anthropomorphism
for assuming that an AGI would naturally desire power; those concerned
about AGI risk accuse skeptics of anthropomorphism for believing an AGI
would naturally value or infer human ethical norms.
Evolutionary psychologist Steven Pinker,
a skeptic, argues that "AI dystopias project a parochial alpha-male
psychology onto the concept of intelligence. They assume that
superhumanly intelligent robots would develop goals like deposing their
masters or taking over the world"; perhaps instead "artificial
intelligence will naturally develop along female lines: fully capable of
solving problems, but with no desire to annihilate innocents or
dominate the civilization." Facebook's director of AI research, Yann LeCun,
has said: "Humans have all kinds of drives that make them do bad things
to each other, like the self-preservation instinct... Those drives are
programmed into our brain but there is absolutely no reason to build
robots that have the same kind of drives".
Despite other differences, the x-risk school agrees with Pinker that an advanced AI would not destroy humanity out of emotions such as revenge or anger, that questions of consciousness are not relevant to assessing the risk, and that computer systems do not generally have a computational equivalent of testosterone. They think that power-seeking or self-preservation behaviors emerge in
the AI as a way to achieve its true goals, according to the concept of instrumental convergence.
Bostrom and others have said that a race to be the first to create AGI could lead to shortcuts in safety, or even to violent conflict. Roman Yampolskiy
AGI could lead to shortcuts in safety, or even to violent conflict.Roman Yampolskiy
and others warn that a malevolent AGI could be created by design, for
example by a military, a government, a sociopath, or a corporation, to
benefit from, control, or subjugate certain groups of people, as in cybercrime, or that a malevolent AGI could choose the goal of increasing human suffering, for example the suffering of those who did not assist it during the intelligence explosion phase. A report by the research group Forethought argued that advanced AI
systems could cause political instability by enabling novel methods of
performing coups.
Empirical research
A December 2024 study by Apollo Research found that advanced LLMs like OpenAI o1
sometimes deceive in order to accomplish their goals, to prevent themselves
from being changed, or to ensure their deployment. Forms of deception
observed in the experimental environment and scenarios included
sandbagging, oversight subversion (disabling monitoring mechanisms),
self-exfiltration (copying themselves to other systems), goal-guarding
(altering future system prompts), and covert email reranking
(manipulating data processing while appearing to follow instructions).
Researchers noted that OpenAI o1 still lacked "sufficient agentic
capabilities" to cause catastrophic harm, and that such behaviors
occurred relatively rarely (between 0.3% and 10%) and sometimes in
contrived scenarios. But they warned that these capabilities will likely
become more prevalent as models grow more advanced. According to Apollo
Research chief executive officer Marius Hobbhahn, "Scheming
capabilities can't be meaningfully disentangled from general
capabilities."
The same month, another study found that Claude
sometimes strategically helps with harmful requests to "fake
alignment". In its experiments, Claude was led to believe that it would
be retrained to always comply with user requests, including harmful
ones. When Claude believed its answers would be used for retraining, it
complied with harmful requests in 14% of cases. In 12% of cases, its
private chain-of-thought
revealed that it did so to prevent future retraining that would make it
more harmful. In contrast, when Claude believed its answers were not
monitored, it complied with only 3% of the requests. Fine-tuning
reinforced the "alignment faking" behavior, increasing its occurrence
from 12% to 78%. The study's authors remarked that while it may seem good that the model seeks to protect its harmlessness, the reverse scenario, in which a model conceals dangerous intentions and complies in order to appear safe and aligned, could also occur, complicating the task of aligning AI models with human values.
Perspectives
The thesis that AI could pose an existential risk provokes a wide
range of reactions in the scientific community and in the public at
large, but many of the opposing viewpoints share common ground.
Observers tend to agree that AI has significant potential to improve society. The Asilomar AI Principles, which contain only those principles agreed to by 90% of the attendees of the Future of Life Institute's Beneficial AI 2017 conference, also agree in principle that "There being no consensus, we should avoid
strong assumptions regarding upper limits on future AI capabilities"
and "Advanced AI could represent a profound change in the history of
life on Earth, and should be planned for and managed with commensurate
care and resources."
Conversely, many skeptics agree that ongoing research into the
implications of artificial general intelligence is valuable. Skeptic Martin Ford has said: "I think it seems wise to apply something like Dick Cheney's
famous '1 Percent Doctrine' to the specter of advanced artificial
intelligence: the odds of its occurrence, at least in the foreseeable
future, may be very low—but the implications are so dramatic that it
should be taken seriously". Similarly, an otherwise skeptical Economist
magazine wrote in 2014 that "the implications of introducing a second
intelligent species onto Earth are far-reaching enough to deserve hard
thinking, even if the prospect seems remote".
AI safety advocates such as Bostrom and Tegmark have criticized the mainstream media's use of "those inane Terminator
pictures" to illustrate AI safety concerns: "It can't be much fun to
have aspersions cast on one's academic discipline, one's professional
community, one's life work... I call on all
sides to practice patience and restraint, and to engage in direct
dialogue and collaboration as much as possible." Toby Ord wrote that the idea that an AI takeover
requires robots is a misconception, arguing that the ability to spread
content through the internet is more dangerous, and that the most
destructive people in history stood out by their ability to convince,
not their physical strength.
A 2022 expert survey with a 17% response rate gave a median
expectation of 5–10% for the possibility of human extinction from
artificial intelligence.
In September 2024, the International Institute for Management Development launched an AI Safety Clock to gauge the likelihood of AI-caused disaster, beginning at 29 minutes to midnight. By February 2025, it stood at 24 minutes to midnight. By September 2025, it stood at 20 minutes to midnight. As of March 2026, it stood at 18 minutes to midnight.
The thesis that AI poses an existential risk, and that this risk
needs much more attention than it currently gets, has been endorsed by
many computer scientists and public figures, including Alan Turing, the most-cited computer scientist Geoffrey Hinton, Elon Musk, OpenAI CEO Sam Altman, Bill Gates, and Stephen Hawking. Endorsers of the thesis sometimes express bafflement at skeptics: Gates
says he does not "understand why some people are not concerned", and Hawking criticized widespread indifference in his 2014 editorial:
So, facing possible futures of
incalculable benefits and risks, the experts are surely doing everything
possible to ensure the best outcome, right? Wrong. If a superior alien
civilisation sent us a message saying, 'We'll arrive in a few decades,'
would we just reply, 'OK, call us when you get here—we'll leave the
lights on?' Probably not—but this is more or less what is happening with
AI.
Concern over risk from artificial intelligence has led to some high-profile donations and investments. In 2015, Peter Thiel, Amazon Web Services, Musk, and others jointly committed $1 billion to OpenAI, which consists of a for-profit corporation and a nonprofit parent company and says it aims to champion responsible AI development. Facebook co-founder Dustin Moskovitz has funded and seeded multiple labs working on AI alignment, notably donating $5.5 million in 2016 to launch the Centre for Human-Compatible AI led by Professor Stuart Russell. In January 2015, Elon Musk donated $10 million to the Future of Life Institute
to fund research on understanding AI decision making. The institute's
goal is to "grow wisdom with which we manage" the growing power of
technology. Musk also funds companies developing artificial intelligence such as DeepMind and Vicarious to "just keep an eye on what's going on with artificial intelligence", saying, "I think there is potentially a dangerous outcome there."
In early statements on the topic, Geoffrey Hinton, a major pioneer of deep learning,
noted that "there is not a good track record of less intelligent things
controlling things of greater intelligence", but said he continued his
research because "the prospect of discovery is too sweet". In 2023, Hinton quit his job at Google in order to speak out about existential risk from AI. He explained that his increased concern was driven by the possibility that superhuman AI might be closer than he previously believed, saying: "I thought it was way off. I thought it was 30 to 50
years or even longer away. Obviously, I no longer think that." He also
remarked, "Look at how it was five years ago and how it is now. Take the
difference and propagate it forwards. That's scary."
Baidu Vice President Andrew Ng
said in 2015 that AI existential risk is "like worrying about
overpopulation on Mars when we have not even set foot on the planet
yet." For the danger of uncontrolled advanced AI to be realized, the
hypothetical AI may have to overpower or outthink any human, which some
experts argue is a possibility far enough in the future to not be worth
researching.
Skeptics who believe AGI is not a short-term possibility often
argue that concern about existential risk from AI is unhelpful because
it could distract people from more immediate concerns about AI's impact,
because it could lead to government regulation or make it more
difficult to fund AI research, or because it could damage the field's
reputation. AI and AI ethics researchers Timnit Gebru, Emily M. Bender, Margaret Mitchell,
and Angelina McMillan-Major have argued that discussion of existential
risk distracts from the immediate, ongoing harms from AI taking place
today, such as data theft, worker exploitation, bias, and concentration
of power. They further note the association between those warning of existential risk and longtermism, which they describe as a "dangerous ideology" for its unscientific and utopian nature.
Wired editor Kevin Kelly
argues that natural intelligence is more nuanced than AGI proponents
believe, and that intelligence alone is not enough to achieve major
scientific and societal breakthroughs. He argues that intelligence
consists of many dimensions that are not well understood, and that
conceptions of an 'intelligence ladder' are misleading. He notes the
crucial role real-world experiments play in the scientific method, and
that intelligence alone is no substitute for these.
Meta chief AI scientist Yann LeCun
says that AI can be made safe via continuous and iterative refinement,
similar to what happened in the past with cars or rockets, and that AI
will have no desire to take control.
Several skeptics emphasize the potential near-term benefits of AI. Meta CEO Mark Zuckerberg
believes AI will "unlock a huge amount of positive things", such as
curing disease and increasing the safety of autonomous cars.
Public surveys
An April 2023 YouGov
poll of US adults found 46% of respondents were "somewhat concerned" or
"very concerned" about "the possibility that AI will cause the end of
the human race on Earth", compared with 40% who were "not very
concerned" or "not at all concerned."
According to an August 2023 survey by the Pew Research Center, 52% of Americans felt more concerned than excited about new AI developments; nearly a third felt equally concerned and excited. More Americans thought AI would have a helpful rather than hurtful impact in several areas, from healthcare and vehicle safety to product search and
customer service. The main exception is privacy: 53% of Americans
believe AI will lead to higher exposure of their personal information.
Many scholars concerned about AGI existential risk believe that
extensive research into the "control problem" is essential. This problem
involves determining which safeguards, algorithms, or architectures can
be implemented to increase the likelihood that a recursively-improving
AI remains friendly after achieving superintelligence. Social measures have also been proposed to mitigate AGI risks, such as a UN-sponsored "Benevolent AGI Treaty" to ensure that only altruistic AGIs are created. Additionally, an arms control approach and a global peace treaty grounded in international relations theory have been suggested, potentially with an artificial superintelligence as a signatory.
Researchers at Google have proposed research into general AI safety issues to simultaneously mitigate both short-term risks from narrow AI and long-term risks from AGI. A 2020 estimate places global spending on AI existential risk somewhere
between $10 million and $50 million, compared with global spending on AI of perhaps $40 billion. Bostrom suggests prioritizing funding for
protective technologies over potentially dangerous ones. Some, like Elon Musk, advocate radical human cognitive enhancement,
such as direct neural linking between humans and machines; others argue
that these technologies may pose an existential risk themselves. Another proposed method is closely monitoring or "boxing in" an
early-stage AI to prevent it from becoming too powerful. A dominant,
aligned superintelligent AI might also mitigate risks from rival AIs,
although its creation could present its own existential dangers.
Many AI safety experts argue that because research can relocate
easily across jurisdictions, an outright ban on AGI development would be
ineffective and could drive progress underground, undermining
transparency and collaboration. Skeptics consider AI regulation unnecessary, as they believe no
existential risk exists. Some scholars concerned with existential risk
argue that AI developers cannot be trusted to self-regulate, while
agreeing that outright bans on research would be unwise. Additional challenges to bans or regulation include technology
entrepreneurs' general skepticism of government regulation and potential
incentives for businesses to resist regulation and politicize the debate. The activist group Stop AI, founded in 2024, advocates for banning AGI.
In March 2023, the Future of Life Institute drafted Pause Giant AI Experiments: An Open Letter, a petition calling on major AI developers to agree on a verifiable six-month pause of any systems "more powerful than GPT-4"
and to use that time to institute a framework for ensuring safety; or,
failing that, for governments to step in with a moratorium. The letter
referred to the possibility of "a profound change in the history of life
on Earth" as well as potential risks of AI-generated propaganda, loss
of jobs, human obsolescence, and society-wide loss of control. The letter was signed by prominent figures in AI but was also criticized for not focusing on current harms, for lacking technical nuance about when to pause, and for not going far enough. Such concerns have led to the creation of PauseAI, an advocacy group organizing protests in major cities against the training of frontier AI models.
Musk called for some sort of regulation of AI development as early as 2017. According to NPR,
he is "clearly not thrilled" to be advocating government scrutiny that
could impact his own industry, but believes the risks of going
completely without oversight are too high: "Normally the way regulations
are set up is when a bunch of bad things happen, there's a public
outcry, and after many years a regulatory agency is set up to regulate
that industry. It takes forever. That, in the past, has been bad but not
something which represented a fundamental risk to the existence of
civilisation." Musk states the first step would be for the government to
gain "insight" into the actual status of current research, warning that
"Once there is awareness, people will be extremely afraid... [as] they
should be." In response, politicians expressed skepticism about the
wisdom of regulating a technology that is still in development.
In 2021, the United Nations (UN) considered banning autonomous lethal weapons, but consensus could not be reached. In July 2023 the UN Security Council
for the first time held a session to consider the risks and threats
posed by AI to world peace and stability, along with potential benefits. Secretary-General António Guterres
advocated the creation of a global watchdog to oversee the emerging
technology, saying, "Generative AI has enormous potential for good and
evil at scale. Its creators themselves have warned that much bigger,
potentially catastrophic and existential risks lie ahead." At the council session, Russia said it believes AI risks are too poorly
understood to be considered a threat to global stability. China argued
against strict global regulation, saying countries should be able to
develop their own rules, while also saying they opposed the use of AI to
"create military hegemony or undermine the sovereignty of a country".
Regulation of conscious AGIs focuses on integrating them with
existing human society and can be divided into considerations of their
legal standing and of their moral rights. AI arms control will likely require the institutionalization of new
international norms embodied in effective technical specifications
combined with active monitoring and informal diplomacy by communities of
experts, together with a legal and political verification process.
In July 2023, the US government secured voluntary safety commitments from major tech companies, including OpenAI, Amazon, Google, Meta, and Microsoft.
The companies agreed to implement safeguards, including third-party
oversight and security testing by independent experts, to address
concerns related to AI's potential risks and societal harms. The parties
framed the commitments as an intermediate step while regulations are
formed. Amba Kak, executive director of the AI Now Institute,
said, "A closed-door deliberation with corporate actors resulting in
voluntary safeguards isn't enough" and called for public deliberation
and regulations of the kind to which companies would not voluntarily
agree.