Search This Blog

Monday, March 30, 2026

Philosophy of computer science

From Wikipedia, the free encyclopedia

The philosophy of computer science is concerned with the philosophical questions that arise within the study of computer science. There is still no common understanding of the content, aims, focus, or topics of the philosophy of computer science, despite some attempts to develop a philosophy of computer science like the philosophy of physics or the philosophy of mathematics. Due to the abstract nature of computer programs and the technological ambitions of computer science, many of the conceptual questions of the philosophy of computer science are also comparable to the philosophy of science, philosophy of mathematics, and the philosophy of technology.

Overview

Many of the central philosophical questions of computer science are centered on the logical, ethical, methodological, ontological and epistemological issues that concern it. Some of these questions may include:

Computation

The question of "What is computation?" remains a central question in relation to the philosophy of computer science. According to Nir Fresco, deciphering what a computation is requires distinguishing between computation and non-computational processes. Fresco identifies three main perspectives.

The first view is the semantic view. Supporters of this view hold that computations are viewed as internal processes occurring within a computing mechanism. Advocates have argued that computation involves manipulating symbol structures, content, and truth-preserving rules. However, this view has been criticized for depending on human interpretations rather than the inherent qualities of technology.

The second view is the causal view. According to this perspective, computations are defined by their causal characteristics. A system executes a calculation when the transformations in its physical state correspond to the structure of an abstract algorithm. This theory relates computations to cause-and-effect relationships between system components. Thus, it is linked to physical causation rather than semantic meaning.

The third view is the functional view. In this view, computation is distinguished by its functional characteristics, or the functions and relationships of its component pieces. According to this perspective, the organization of a mechanism's parts to do particular tasks is more important than whether the symbols have external meaning.

These various perspectives illustrate the ongoing debate about the meaning of computation and whether it is defined by symbolic meaning, physical causation, or functional organization.

Church–Turing thesis

The Church–Turing thesis and its variations are central to the theory of computation. Since, as an informal notion, the concept of effective calculability does not have a formal definition, the thesis, although it has near-universal acceptance, cannot be formally proven. The implications of this thesis is also of philosophical concern. Philosophers have interpreted the Church–Turing thesis as having implications for the philosophy of mind.

Turing's Halting Problem

Another major concept in computer philosophy is Turing's Halting Problem. This problem concerns whether it is possible to write a program that can determine if another program will run continuously or terminate. It is widely accepted as an undecidable problem (a problem with a  solution that cannot be found through an algorithm). This result established that there are limits to computer computations. Although the idea is often attributed to Alan Turing's 1936 paper On Computable Numbers, recent research indicates that the term and its modern formulation appeared later. The expression halting problem was first used and formally stated by Martin Davis in his 1958 book Computability and Unsolvability.

P versus NP problem

The P versus NP problem is an unsolved problem in computer science and mathematics. It asks whether every problem whose solution can be verified in polynomial time (and so defined to belong to the class NP) can also be solved in polynomial time (and so defined to belong to the class P). Most computer scientists believe that PNP. Apart from the reason that after decades of studying these problems no one has been able to find a polynomial-time algorithm for any of more than 3000 important known NP-complete problems, philosophical reasons that concern its implications may have motivated this belief.

For instance, according to Scott Aaronson, the American computer scientist then at MIT:

If P = NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in "creative leaps", no fundamental gap between solving a problem and recognizing the solution once it's found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss.

Computer Ethics

Computer ethics shapes the way that computers interact and how they are implemented in real-world scenarios. These ethics address issues such as user privacy, security, and professional responsibility. These topics are subject to scholarly discussion and professional debate.

Scholars and technology innovators have long debated the existence of privacy online. Samuel D. Warren and Louis D. Brandeis argued in an academic paper that they believe that with the new age of technology, a right to privacy is necessary; however, others believe that privacy is a promise that can never be fulfilled. In Sun Microsystems' CEO Scott McNealy's words, "Privacy is dead. Get over it." The introduction of computer technology raises many issues regarding privacy. These issues can range from intentionally malicious actions, such as spreading information against a user's will, to innocent mistakes, such as accidentally releasing information to the public when it was meant to be private. There are also discussions related to the ethicality of keeping important information private. The conclusion of such becomes ambivalent when discussing the privacy of individual users versus governmental bodies. In the case of individual users, choosing not to reveal information is seen as ethical. In the case of government entities, choosing not to disclose information can be seen as harmful.

Security focuses on protecting the systems and data of users from unauthorized access or harm. Malicious software is typically the center of discussion for computer security. While it is generally considered unethical to intentionally spread malicious software, such as computer viruses, there is debate about whether users have an ethical responsibility to ensure the security of their own systems. Users who fail to protect their own computer systems expose other computer users to risk. There are also discussions of cases where the distribution of viruses is ethical. An example of this is when a virus is spread to expose a weakness in the protection of computer systems.

Professional ethics addresses the responsibilities and duties of software developers. Bugs in software can cause system failures. This can range from minor annoyances to severe, real-life consequences for the user. While it is generally accepted that bugs should not be included in software and that it is the developer's responsibility to correct them, bug-free software is rarely obtainable. Commonly, developers release software with bugs they deem less important than other, more critical issues. If important bugs are discovered, software patches are distributed throughout the software's lifespan. Even though this is quite common in practice, it raises ethical questions. Scholars debate the extent to which releasing a product with known issues is acceptable.

Philosophy of artificial intelligence

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Philosophy_of_artificial_intelligence

The philosophy of artificial intelligence is a branch of the philosophy of mind and the philosophy of computer science that explores artificial intelligence and its implications for knowledge and understanding of intelligence, ethics, consciousness, epistemology, and free will. Furthermore, the technology is concerned with the creation of artificial animals or artificial people (or, at least, artificial creatures; see artificial life) so the discipline is of considerable interest to philosophers. These factors contributed to the emergence of the philosophy of artificial intelligence.

The philosophy of artificial intelligence attempts to answer such questions as follows:

  • Can a machine act intelligently? Can it solve any problem that a person would solve by thinking?
  • Are human intelligence and machine intelligence the same? Is the human brain essentially a computer?
  • Can a machine have a mind, mental states, and consciousness in the same sense that a human being can? Can it feel how things are? (i.e. does it have qualia?)

Questions like these reflect the divergent interests of AI researchers, cognitive scientists and philosophers respectively. The scientific answers to these questions depend on the definition of "intelligence" and "consciousness" and exactly which "machines" are under discussion.

Important propositions in the philosophy of AI include some of the following:

  • Turing's "polite convention": If a machine behaves as intelligently as a human being, then it is as intelligent as a human being.
  • The Dartmouth proposal: "Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."
  • Allen Newell and Herbert A. Simon's physical symbol system hypothesis: "A physical symbol system has the necessary and sufficient means of general intelligent action."
  • John Searle's strong AI hypothesis: "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds."
  • Hobbes' mechanism: "For 'reason' ... is nothing but 'reckoning,' that is adding and subtracting, of the consequences of general names agreed upon for the 'marking' and 'signifying' of our thoughts..."

Can a machine display general intelligence?

Is it possible to create a machine that can solve all the problems humans solve using their intelligence? This question defines the scope of what machines could do in the future and guides the direction of AI research. It only concerns the behavior of machines and ignores the issues of interest to psychologists, cognitive scientists and philosophers, evoking the question: does it matter whether a machine is really thinking, as a person thinks, rather than just producing outcomes that appear to result from thinking?

The basic position of most AI researchers is summed up in this statement, which appeared in the proposal for the Dartmouth workshop of 1956:

  • "Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."

Arguments against the basic premise must show that building a working AI system is impossible because there is some practical limit to the abilities of computers or that there is some special quality of the human mind that is necessary for intelligent behavior and yet cannot be duplicated by a machine (or by the methods of current AI research). Arguments in favor of the basic premise must show that such a system is possible.

It is also possible to sidestep the connection between the two parts of the above proposal. For instance, machine learning, beginning with Turing's infamous child machine proposal, essentially achieves the desired feature of intelligence without a precise design-time description as to how it would exactly work. The account on robot tacit knowledge eliminates the need for a precise description altogether.

The first step to answering the question is to clearly define "intelligence".

Intelligence

Turing test

The "standard interpretation" of the Turing test

In 1949, computer scientist Alan Turing reduced the problem of defining intelligence to a simple question about conversation. He suggests that: if a machine can answer any question posed to it, using the same words that an ordinary person would, then we may call that machine intelligent. A modern version of his experimental design would use an online chat room, where one of the participants is a real person and one of the participants is a computer program. The program passes the test if no one can tell which of the two participants is human. Turing notes that no one (except philosophers) ever asks the question "can people think?" He writes "instead of arguing continually over this point, it is usual to have a polite convention that everyone thinks". Turing's test extends this polite convention to machines:

  • If a machine acts as intelligently as a human being, then it is as intelligent as a human being.

One criticism of the Turing test is that it only measures the "humanness" of the machine's behavior, rather than the "intelligence" of the behavior. Since human behavior and intelligent behavior are not exactly the same thing, the test fails to measure intelligence. Stuart J. Russell and Peter Norvig write that "aeronautical engineering texts do not define the goal of their field as 'making machines that fly so exactly like pigeons that they can fool other pigeons'".

Intelligence as achieving goals

Simple reflex agent

Twenty-first century AI research defines intelligence in terms of goal-directed behavior. It views intelligence as a set of problems that the machine is expected to solve – the more problems it can solve, and the better its solutions are, the more intelligent the program is. AI founder John McCarthy defined intelligence as "the computational part of the ability to achieve goals in the world."

Stuart Russell and Peter Norvig formalized this definition using abstract intelligent agents. An "agent" is something which perceives and acts in an environment. A "performance measure" defines what counts as success for the agent.

  • "If an agent acts so as to maximize the expected value of a performance measure based on past experience and knowledge then it is intelligent."

Definitions like this one try to capture the essence of intelligence. They have the advantage that, unlike the Turing test, they do not also test for unintelligent human traits such as making typing mistakes. They have the disadvantage that they can fail to differentiate between "things that think" and "things that do not". By this definition, even a thermostat has a rudimentary intelligence.

Arguments that a machine can display general intelligence

The brain can be simulated

Hubert Dreyfus describes this argument as claiming that "if the nervous system obeys the laws of physics and chemistry, which we have every reason to suppose it does, then ... we ... ought to be able to reproduce the behavior of the nervous system with some physical device". This argument, first introduced as early as 1943 and vividly described by Hans Moravec in 1988, is now associated with futurist Ray Kurzweil, who estimates that computer power will be sufficient for a complete brain simulation by the year 2029. A non-real-time simulation of a thalamocortical model that has the size of the human brain (1011 neurons) was performed in 2005, and it took 50 days to simulate 1 second of brain dynamics on a cluster of 27 processors.

Even AI's harshest critics (such as Hubert Dreyfus and John Searle) agree that a brain simulation is possible in theory. However, Searle points out that, in principle, anything can be simulated by a computer; thus, bringing the definition to its breaking point leads to the conclusion that any process at all can technically be considered "computation". "What we wanted to know is what distinguishes the mind from thermostats and livers," he writes. Thus, merely simulating the functioning of a living brain would in itself be an admission of ignorance regarding intelligence and the nature of the mind, like trying to build a jet airliner by copying a living bird precisely, feather by feather, with no theoretical understanding of aeronautical engineering.

Human thinking is symbol processing

In 1963, Allen Newell and Herbert A. Simon proposed that "symbol manipulation" was the essence of both human and machine intelligence. They wrote:

  • "A physical symbol system has the necessary and sufficient means of general intelligent action."

This claim is very strong: it implies both that human thinking is a kind of symbol manipulation (because a symbol system is necessary for intelligence) and that machines can be intelligent (because a symbol system is sufficient for intelligence). Another version of this position was described by philosopher Hubert Dreyfus, who called it "the psychological assumption":

  • "The mind can be viewed as a device operating on bits of information according to formal rules."

The "symbols" that Newell, Simon and Dreyfus discussed were word-like and high level—symbols that directly correspond with objects in the world, such as <dog> and <tail>. Most AI programs written between 1956 and 1990 used this kind of symbol. Modern AI, based on statistics and mathematical optimization, does not use the high-level "symbol processing" that Newell and Simon discussed.

Arguments against symbol processing

These arguments show that human thinking does not consist (solely) of high level symbol manipulation. They do not show that artificial intelligence is impossible, only that more than symbol processing is required.

Gödelian anti-mechanist arguments

In 1931, Kurt Gödel proved with an incompleteness theorem that it is always possible to construct a "Gödel statement" that a given consistent formal system of logic (such as a high-level symbol manipulation program) could not prove. Despite being a true statement, the constructed Gödel statement is unprovable in the given system. (The truth of the constructed Gödel statement is contingent on the consistency of the given system; applying the same process to a subtly inconsistent system will appear to succeed, but will actually yield a false "Gödel statement" instead.) More speculatively, Gödel conjectured that the human mind can eventually correctly determine the truth or falsity of any well-grounded mathematical statement (including any possible Gödel statement), and that therefore the human mind's power is not reducible to a mechanism. Philosopher John Lucas (since 1961) and Roger Penrose (since 1989) have championed this philosophical anti-mechanist argument.

Gödelian anti-mechanist arguments tend to rely on the innocuous-seeming claim that a system of human mathematicians (or some idealization of human mathematicians) is both consistent (completely free of error) and believes fully in its own consistency (and can make all logical inferences that follow from its own consistency, including belief in its Gödel statement) . This is probably impossible for a Turing machine to do (see Halting problem); therefore, the Gödelian concludes that human reasoning is too powerful to be captured by a Turing machine, and by extension, any digital mechanical device.

However, the modern consensus in the scientific and mathematical community is that actual human reasoning is inconsistent; that any consistent "idealized version" H of human reasoning would logically be forced to adopt a healthy but counter-intuitive open-minded skepticism about the consistency of H (otherwise H is provably inconsistent); and that Gödel's theorems do not lead to any valid argument that humans have mathematical reasoning capabilities beyond what a machine could ever duplicate. This consensus that Gödelian anti-mechanist arguments are doomed to failure is laid out strongly in Artificial Intelligence: "any attempt to utilize (Gödel's incompleteness results) to attack the computationalist thesis is bound to be illegitimate, since these results are quite consistent with the computationalist thesis."

Stuart Russell and Peter Norvig agree that Gödel's argument does not consider the nature of real-world human reasoning. It applies to what can theoretically be proved, given an infinite amount of memory and time. In practice, real machines (including humans) have finite resources and will have difficulty proving many theorems. It is not necessary to be able to prove everything in order to be an intelligent person.

Less formally, Douglas Hofstadter, in his Pulitzer Prize winning book Gödel, Escher, Bach: An Eternal Golden Braid, states that these "Gödel-statements" always refer to the system itself, drawing an analogy to the way the Epimenides paradox uses statements that refer to themselves, such as "this statement is false" or "I am lying". But, of course, the Epimenides paradox applies to anything that makes statements, whether it is a machine or a human, even Lucas himself. Consider:

  • Lucas can't assert the truth of this statement.

This statement is true but cannot be asserted by Lucas. This shows that Lucas himself is subject to the same limits that he describes for machines, as are all people, and so Lucas's argument is pointless.

After concluding that human reasoning is non-computable, Penrose went on to controversially speculate that some kind of hypothetical non-computable processes involving the collapse of quantum mechanical states give humans a special advantage over existing computers. Existing quantum computers are only capable of reducing the complexity of Turing computable tasks and are still restricted to tasks within the scope of Turing machines.  By Penrose and Lucas's arguments, the fact that quantum computers are only able to complete Turing computable tasks implies that they cannot be sufficient for emulating the human mind. Therefore, Penrose seeks for some other process involving new physics, for instance quantum gravity which might manifest new physics at the scale of the Planck mass via spontaneous quantum collapse of the wave function. These states, he suggested, occur both within neurons and also spanning more than one neuron. However, other scientists point out that there is no plausible organic mechanism in the brain for harnessing any sort of quantum computation, and furthermore that the timescale of quantum decoherence seems too fast to influence neuron firing.

Dreyfus: the primacy of implicit skills

Hubert Dreyfus argued that human intelligence and expertise depended primarily on fast intuitive judgements rather than step-by-step symbolic manipulation, and argued that these skills would never be captured in formal rules.

Dreyfus's argument had been anticipated by Turing in his 1950 paper Computing machinery and intelligence, where he had classified this as the "argument from the informality of behavior." Turing argued in response that, just because we do not know the rules that govern a complex behavior, this does not mean that no such rules exist. He wrote: "we cannot so easily convince ourselves of the absence of complete laws of behaviour ... The only way we know of for finding such laws is scientific observation, and we certainly know of no circumstances under which we could say, 'We have searched enough. There are no such laws.'"

Russell and Norvig point out that, in the years since Dreyfus published his critique, progress has been made towards discovering the "rules" that govern unconscious reasoning. The situated movement in robotics research attempts to capture our unconscious skills at perception and attention. Computational intelligence paradigms, such as neural nets, evolutionary algorithms and so on are mostly directed at simulated unconscious reasoning and learning. Statistical approaches to AI can make predictions which approach the accuracy of human intuitive guesses. Research into commonsense knowledge has focused on reproducing the "background" or context of knowledge. In fact, AI research in general has moved away from high level symbol manipulation, towards new models that are intended to capture more of our intuitive reasoning.

Cognitive science and psychology eventually came to agree with Dreyfus' description of human expertise. Daniel Kahnemann and others developed a similar theory where they identified two "systems" that humans use to solve problems, which he called "System 1" (fast intuitive judgements) and "System 2" (slow deliberate step by step thinking).

Although Dreyfus' views have been vindicated in many ways, the work in cognitive science and in AI was in response to specific problems in those fields and was not directly influenced by Dreyfus. Historian and AI researcher Daniel Crevier wrote that "time has proven the accuracy and perceptiveness of some of Dreyfus's comments. Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier."

Can a machine have a mind, consciousness, and mental states?

This is a philosophical question, related to the problem of other minds and the hard problem of consciousness. The question revolves around a position defined by John Searle as "strong AI":

  • A physical symbol system can have a mind and mental states.

Searle distinguished this position from what he called "weak AI":

  • A physical symbol system can act intelligently.

Searle introduced the terms to isolate strong AI from weak AI so he could focus on what he thought was the more interesting and debatable issue. He argued that even if we assume that we had a computer program that acted exactly like a human mind, there would still be a difficult philosophical question that needed to be answered.

Neither of Searle's two positions are of great concern to AI research, since they do not directly answer the question "can a machine display general intelligence?" (unless it can also be shown that consciousness is necessary for intelligence). Turing wrote "I do not wish to give the impression that I think there is no mystery about consciousness… [b]ut I do not think these mysteries necessarily need to be solved before we can answer the question [of whether machines can think]." Russell and Norvig agree: "Most AI researchers take the weak AI hypothesis for granted, and don't care about the strong AI hypothesis."

There are a few researchers who believe that consciousness is an essential element in intelligence, such as Igor Aleksander, Stan Franklin, Ron Sun, and Pentti Haikonen, although their definition of "consciousness" strays very close to "intelligence". (See artificial consciousness.)

Before we can answer this question, we must be clear what we mean by "minds", "mental states" and "consciousness".

Consciousness, minds, mental states, meaning

The words "mind" and "consciousness" are used by different communities in different ways. Some new age thinkers, for example, use the word "consciousness" to describe something similar to Bergson's "élan vital": an invisible, energetic fluid that permeates life and especially the mind. Science fiction writers use the word to describe some essential property that makes us human: a machine or alien that is "conscious" will be presented as a fully human character, with intelligence, desires, will, insight, pride and so on. (Science fiction writers also use the words "sentience", "sapience", "self-awareness" or "ghost"—as in the Ghost in the Shell manga and anime series—to describe this essential human property). For others , the words "mind" or "consciousness" are used as a kind of secular synonym for the soul.

For philosophers, neuroscientists and cognitive scientists, the words are used in a way that is both more precise and more mundane: they refer to the familiar, everyday experience of having a "thought in your head", like a perception, a dream, an intention or a plan, and to the way we see something, know something, mean something or understand something. "It's not hard to give a commonsense definition of consciousness" observes philosopher John Searle. What is mysterious and fascinating is not so much what it is but how it is: how does a lump of fatty tissue and electricity give rise to this (familiar) experience of perceiving, meaning or thinking?

Philosophers call this the hard problem of consciousness. It is the latest version of a classic problem in the philosophy of mind called the "mind-body problem". A related problem is the problem of meaning or understanding (which philosophers call "intentionality"): what is the connection between our thoughts and what we are thinking about (i.e. objects and situations out in the world)? A third issue is the problem of experience (or "phenomenology"): If two people see the same thing, do they have the same experience? Or are there things "inside their head" (called "qualia") that can be different from person to person?

Neurobiologists believe all these problems will be solved as we begin to identify the neural correlates of consciousness: the actual relationship between the machinery in our heads and its collective properties; such as the mind, experience and understanding. Some of the harshest critics of artificial intelligence agree that the brain is just a machine, and that consciousness and intelligence are the result of physical processes in the brain. The difficult philosophical question is this: can a computer program, running on a digital machine that shuffles the binary digits of zero and one, duplicate the ability of the neurons to create minds, with mental states (like understanding or perceiving), and ultimately, the experience of consciousness?

Arguments that a computer cannot have a mind and mental states

Searle's Chinese room

John Searle asks us to consider a thought experiment: suppose we have written a computer program that passes the Turing test and demonstrates general intelligent action. Suppose, specifically that the program can converse in fluent Chinese. Write the program on 3x5 cards and give them to an ordinary person who does not speak Chinese. Lock the person into a room and have him follow the instructions on the cards. He will copy out Chinese characters and pass them in and out of the room through a slot. From the outside, it will appear that the Chinese room contains a fully intelligent person who speaks Chinese. The question is this: is there anyone (or anything) in the room that understands Chinese? That is, is there anything that has the mental state of understanding, or which has conscious awareness of what is being discussed in Chinese? The man is clearly not aware. The room cannot be aware. The cards certainly are not aware. Searle concludes that the Chinese room, or any other physical symbol system, cannot have a mind.

Searle goes on to argue that actual mental states and consciousness require (yet to be described) "actual physical-chemical properties of actual human brains." He argues there are special "causal properties" of brains and neurons that gives rise to minds: in his words "brains cause minds."

Gottfried Leibniz made essentially the same argument as Searle in 1714, using the thought experiment of expanding the brain until it was the size of a mill. In 1974, Lawrence Davis imagined duplicating the brain using telephone lines and offices staffed by people, and in 1978 Ned Block envisioned the entire population of China involved in such a brain simulation. This thought experiment is called "the Chinese Nation" or "the Chinese Gym". Ned Block also proposed his Blockhead argument, which is a version of the Chinese room in which the program has been re-factored into a simple set of rules of the form "see this, do that", removing all mystery from the program.

Responses to the Chinese room

Responses to the Chinese room emphasize several different points.

  • The systems reply and the virtual mind reply: This reply argues that the system, including the man, the program, the room, and the cards, is what understands Chinese. Searle claims that the man in the room is the only thing which could possibly "have a mind" or "understand", but others disagree, arguing that it is possible for there to be two minds in the same physical place, similar to the way a computer can simultaneously "be" two machines at once: one physical (like a Macintosh) and one "virtual" (like a word processor).
  • Speed, power and complexity replies: Several critics point out that the man in the room would probably take millions of years to respond to a simple question, and would require "filing cabinets" of astronomical proportions. This brings the clarity of Searle's intuition into doubt.
  • Robot reply: To truly understand, some believe the Chinese Room needs eyes and hands. Hans Moravec writes: "If we could graft a robot to a reasoning program, we wouldn't need a person to provide the meaning anymore: it would come from the physical world."
  • Brain simulator reply: What if the program simulates the sequence of nerve firings at the synapses of an actual brain of an actual Chinese speaker? The man in the room would be simulating an actual brain. This is a variation on the "systems reply" that appears more plausible because "the system" now clearly operates like a human brain, which strengthens the intuition that there is something besides the man in the room that could understand Chinese.
  • Other minds reply and the epiphenomena reply: Several people have noted that Searle's argument is just a version of the problem of other minds, applied to machines. Since it is difficult to decide if people are "actually" thinking, we should not be surprised that it is difficult to answer the same question about machines.
A related question is whether "consciousness" (as Searle understands it) exists. Searle argues that the experience of consciousness cannot be detected by examining the behavior of a machine, a human being or any other animal. Daniel Dennett points out that natural selection cannot preserve a feature of an animal that has no effect on the behavior of the animal, and thus consciousness (as Searle understands it) cannot be produced by natural selection. Therefore, either natural selection did not produce consciousness, or "strong AI" is correct in that consciousness can be detected by suitably designed Turing test.

Is thinking a kind of computation?

The computational theory of mind or "computationalism" claims that the relationship between mind and brain is similar (if not identical) to the relationship between a running program (software) and a computer (hardware). The idea has philosophical roots in Hobbes (who claimed reasoning was "nothing more than reckoning"), Leibniz (who attempted to create a logical calculus of all human ideas), Hume (who thought perception could be reduced to "atomic impressions") and even Kant (who analyzed all experience as controlled by formal rules). The latest version is associated with philosophers Hilary Putnam and Jerry Fodor.

This question bears on our earlier questions: if the human brain is a kind of computer then computers can be both intelligent and conscious, answering both the practical and philosophical questions of AI. In terms of the practical question of AI ("Can a machine display general intelligence?"), some versions of computationalism make the claim that (as Hobbes wrote):

  • Reasoning is nothing but reckoning.

In other words, our intelligence derives from a form of calculation, similar to arithmetic. This is the physical symbol system hypothesis discussed above, and it implies that artificial intelligence is possible. In terms of the philosophical question of AI ("Can a machine have mind, mental states and consciousness?"), most versions of computationalism claim that (as Stevan Harnad characterizes it):

  • Mental states are just implementations of (the right) computer programs.

This is John Searle's "strong AI" discussed above, and it is the real target of the Chinese room argument (according to Harnad).

Can a machine have emotions?

If "emotions" are defined only in terms of their effect on behavior or on how they function inside an organism, then emotions can be viewed as a mechanism that an intelligent agent uses to maximize the utility of its actions. Given this definition of emotion, Hans Moravec believes that "robots in general will be quite emotional about being nice people". Fear is a source of urgency. Empathy is a necessary component of good human computer interaction. He says robots "will try to please you in an apparently selfless manner because it will get a thrill out of this positive reinforcement. You can interpret this as a kind of love." Daniel Crevier writes "Moravec's point is that emotions are just devices for channeling behavior in a direction beneficial to the survival of one's species."

Can a machine be self-aware?

"Self-awareness", as noted above, is sometimes used by science fiction writers as a name for the essential human property that makes a character fully human. Turing strips away all other properties of human beings and reduces the question to "can a machine be the subject of its own thought?" Can it think about itself? Viewed in this way, a program can be written that can report on its own internal states, such as a debugger.

Can a machine be original or creative?

Turing reduces this to the question of whether a machine can "take us by surprise" and argues that this is obviously true, as any programmer can attest. He notes that, with enough storage capacity, a computer can behave in an astronomical number of different ways. It must be possible, even trivial, for a computer that can represent ideas to combine them in new ways. (Douglas Lenat's Automated Mathematician, as one example, combined ideas to discover new mathematical truths.) Kaplan and Haenlein suggest that machines can display scientific creativity, while it seems likely that humans will have the upper hand where artistic creativity is concerned.

In 2009, scientists at Aberystwyth University in Wales and the U.K's University of Cambridge designed a robot called Adam that they believe to be the first machine to independently come up with new scientific findings. Also in 2009, researchers at Cornell developed Eureqa, a computer program that extrapolates formulas to fit the data inputted, such as finding the laws of motion from a pendulum's motion.

Can a machine be benevolent or hostile?

This question (like many others in the philosophy of artificial intelligence) can be presented in two forms. "Hostility" can be defined in terms function or behavior, in which case "hostile" becomes synonymous with "dangerous". Or it can be defined in terms of intent: can a machine "deliberately" set out to do harm? The latter is the question "can a machine have conscious states?" (such as intentions) in another form.

The question of whether highly intelligent and completely autonomous machines would be dangerous has been examined in detail by futurists (such as the Machine Intelligence Research Institute). The obvious element of drama has also made the subject popular in science fiction, which has considered many differently possible scenarios where intelligent machines pose a threat to mankind; see Artificial intelligence in fiction.

One issue is that machines may acquire the autonomy and intelligence required to be dangerous very quickly. Vernor Vinge has suggested that over just a few years, computers will suddenly become thousands or millions of times more intelligent than humans. He calls this "the Singularity". He suggests that it may be somewhat or possibly very dangerous for humans. This is discussed by a philosophy called Singularitarianism.

In 2009, academics and technical experts attended a conference to discuss the potential impact of robots and computers and the impact of the hypothetical possibility that they could become self-sufficient and able to make their own decisions. They discussed the possibility and the extent to which computers and robots might be able to acquire any level of autonomy, and to what degree they could use such abilities to possibly pose any threat or hazard. They noted that some machines have acquired various forms of semi-autonomy, including being able to find power sources on their own and being able to independently choose targets to attack with weapons. They also noted that some computer viruses can evade elimination and have achieved "cockroach intelligence". They noted that self-awareness as depicted in science-fiction is probably unlikely, but that there were other potential hazards and pitfalls.

Some experts and academics have questioned the use of robots for military combat, especially when such robots are given some degree of autonomous functions. The US Navy has funded a report which indicates that as military robots become more complex, there should be greater attention to implications of their ability to make autonomous decisions.

The President of the Association for the Advancement of Artificial Intelligence has commissioned a study to look at this issue. They point to programs like the Language Acquisition Device which can emulate human interaction.

Some have suggested a need to build "Friendly AI", a term coined by Eliezer Yudkowsky, meaning that the advances which are already occurring with AI should also include an effort to make AI intrinsically friendly and humane.

Can a machine imitate all human characteristics?

Turing said "It is customary ... to offer a grain of comfort, in the form of a statement that some peculiarly human characteristic could never be imitated by a machine. ... I cannot offer any such comfort, for I believe that no such bounds can be set."

Turing noted that there are many arguments of the form "a machine will never do X", where X can be many things, such as:

Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humor, tell right from wrong, make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from experience, use words properly, be the subject of its own thought, have as much diversity of behaviour as a man, do something really new.

Turing argues that these objections are often based on naive assumptions about the versatility of machines or are "disguised forms of the argument from consciousness". Writing a program that exhibits one of these behaviors "will not make much of an impression." All of these arguments are tangential to the basic premise of AI, unless it can be shown that one of these traits is essential for general intelligence.

Can a machine have a soul?

Finally, those who believe in the existence of a soul may argue that "Thinking is a function of man's immortal soul." Alan Turing called this "the theological objection". He writes:

In attempting to construct such machines we should not be irreverently usurping His power of creating souls, any more than we are in the procreation of children: rather we are, in either case, instruments of His will providing mansions for the souls that He creates.

The discussion on the topic has been reignited as a result of recent claims made by Google's LaMDA artificial intelligence system that it is sentient and had a "soul".

LaMDA (Language Model for Dialogue Applications) is an artificial intelligence system that creates chatbots—AI robots designed to communicate with humans—by gathering vast amounts of text from the internet and using algorithms to respond to queries in the most fluid and natural way possible.

The transcripts of conversations between scientists and LaMDA reveal that the AI system excels at this, providing answers to challenging topics about the nature of emotions, generating Aesop-style fables on the moment, and even describing its alleged fears. Pretty much all philosophers doubt LaMDA's sentience.

Views on the role of philosophy

Some scholars argue that the AI community's dismissal of philosophy is detrimental. In the Stanford Encyclopedia of Philosophy, some philosophers argue that the role of philosophy in AI is underappreciated. Physicist David Deutsch argues that without an understanding of philosophy or its concepts, AI development would suffer from a lack of progress.

Conferences and literature

The main conference series on the issue is "Philosophy and Theory of AI" (PT-AI), run by Vincent C. Müller.

The main bibliography on the subject, with several sub-sections, is on PhilPapers.

A recent survey for Philosophy of AI is Müller (2025).

Existential risk from artificial intelligence

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Existential_risk_from_artificial_intelligence

Existential risk from artificial intelligence, or AI x-risk, refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.

One argument for the validity of this concern and the importance of this risk references how human beings dominate other species because the human brain possesses distinctive capabilities other animals lack. If AI were to surpass human intelligence and become superintelligent, it might become uncontrollable. Just as the fate of the mountain gorilla depends on human goodwill, the fate of humanity could depend on the actions of a future machine superintelligence.

Experts disagree on whether artificial general intelligence (AGI) can achieve the capabilities needed for human extinction. Debates center on AGI's technical feasibility, the speed of self-improvement, and the effectiveness of alignment strategies. Concerns about superintelligence have been voiced by researchers including Geoffrey HintonYoshua BengioDemis Hassabis, and Alan Turing, and AI company CEOs such as Dario Amodei (Anthropic), Sam Altman (OpenAI), and Elon Musk (xAI). In 2022, a survey of AI researchers with a 17% response rate found that the majority believed there is a 10 percent or greater chance that human inability to control AI will cause an existential catastrophe. In 2023, hundreds of AI experts and other notable figures signed a statement declaring, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war". Following increased concern over AI risks, government leaders such as United Kingdom prime minister Rishi Sunak and United Nations Secretary-General António Guterres called for an increased focus on global AI regulation.

Two sources of concern stem from the problems of AI control and alignment. Controlling a superintelligent machine or instilling it with human-compatible values may be difficult. Many researchers believe that a superintelligent machine would likely resist attempts to disable it or change its goals as that would prevent it from accomplishing its present goals. It would be extremely challenging to align a superintelligence with the full breadth of significant human values and constraints. In contrast, skeptics such as computer scientist Yann LeCun argue that superintelligent machines will have no desire for self-preservation. A June 2025 study showed that in some circumstances, models may break laws and disobey direct commands to prevent shutdown or replacement, even at the cost of human lives.

Researchers warn that an "intelligence explosion"—a rapid, recursive cycle of AI self-improvement—could outpace human oversight and infrastructure, leaving no opportunity to implement safety measures. In this scenario, an AI more intelligent than its creators would recursively improve itself at an exponentially increasing rate, too quickly for its handlers or society at large to control. Empirically, examples like AlphaZero, which taught itself to play Go and quickly surpassed human ability, show that domain-specific AI systems can sometimes progress from subhuman to superhuman ability very quickly, although such machine learning systems do not recursively improve their fundamental architecture.

History

One of the earliest authors to express serious concern that highly advanced machines might pose existential risks to humanity was the novelist Samuel Butler, who wrote in his 1863 essay Darwin among the Machines:

The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.

In 1951, foundational computer scientist Alan Turing wrote the article "Intelligent Machinery, A Heretical Theory", in which he proposed that artificial general intelligences would likely "take control" of the world as they became more intelligent than human beings:

Let us now assume, for the sake of argument, that [intelligent] machines are a genuine possibility, and look at the consequences of constructing them... There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control, in the way that is mentioned in Samuel Butler's Erewhon.

In 1965, I. J. Good originated the concept now known as an "intelligence explosion" and said the risks were underappreciated:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion', and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. It is curious that this point is made so seldom outside of science fiction. It is sometimes worthwhile to take science fiction seriously.

Scholars such as Marvin Minsky and I. J. Good himself occasionally expressed concern that a superintelligence could seize control, but issued no call to action. In 2000, computer scientist and Sun co-founder Bill Joy penned an influential essay, "Why The Future Doesn't Need Us", identifying superintelligent robots as a high-tech danger to human survival, alongside nanotechnology and engineered bioplagues.

Nick Bostrom published Superintelligence in 2014, which presented his arguments that superintelligence poses an existential threat. By 2015, public figures such as physicists Stephen Hawking and Nobel laureate Frank Wilczek, computer scientists Stuart J. Russell and Roman Yampolskiy, and entrepreneurs Elon Musk and Bill Gates were expressing concern about the risks of superintelligence. Also in 2015, the Open Letter on Artificial Intelligence highlighted the "great potential of AI" and encouraged more research on how to make it robust and beneficial. In April 2016, the journal Nature warned: "Machines and robots that outperform humans across the board could self-improve beyond our control—and their interests might not align with ours". In 2020, Brian Christian published The Alignment Problem, which details the history of progress on AI alignment up to that time.

In March 2023, key figures in AI, such as Musk, signed a letter from the Future of Life Institute calling a halt to advanced AI training until it could be properly regulated. In May 2023, the Center for AI Safety released a statement signed by numerous experts in AI safety and the AI existential risk that read:

Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.

A 2025 open letter by the Future of Life Institute, signed by five Nobel Prize laureates and thousands of notable people, reads:

We call for a prohibition on the development of superintelligence, not lifted before there is

  1. broad scientific consensus that it will be done safely and controllably, and
  2. strong public buy-in.

Potential AI capabilities

General Intelligence

Artificial general intelligence (AGI) is typically defined as a system that performs at least as well as humans in most or all intellectual tasks. A 2022 survey of AI researchers found that 90% of respondents expected AGI would be achieved in the next 100 years, and half expected the same by 2061. In May 2023, some researchers dismissed existential risks from AGI as "science fiction" based on their high confidence that AGI would not be created anytime soon. But in August 2023, a survey of 2,778 AI researchers found that most believed that AGI would be achieved by 2040.

Breakthroughs in large language models (LLMs) have led some researchers to reassess their expectations. Notably, Geoffrey Hinton said in 2023 that he recently changed his estimate from "20 to 50 years before we have general purpose A.I." to "20 years or less".

Superintelligence

A plot showing the length of coding tasks achievable by leading AI models with a 50% success rate. The data, from 2025, suggest an exponential rise.

In contrast with AGI, Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", including scientific creativity, strategic planning, and social skills. He argues that a superintelligence can outmaneuver humans anytime its goals conflict with humans'. It may choose to hide its true intent until humanity cannot stop it. Bostrom writes that in order to be safe for humanity, a superintelligence must be aligned with human values and morality, so that it is "fundamentally on our side".

Stephen Hawking argued that superintelligence is physically possible because "there is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains".

When artificial superintelligence (ASI) may be achieved, if ever, is necessarily less certain than predictions for AGI. In 2023, OpenAI leaders said that not only AGI, but superintelligence may be achieved in less than 10 years.

Comparison with humans

Bostrom argues that AI has many advantages over the human brain:

  • Speed of computation: biological neurons operate at a maximum frequency of around 200 Hz, compared to potentially multiple GHz for computers.
  • Internal communication speed: axons transmit signals at up to 120 m/s, while computers transmit signals at the speed of electricity, or optically at the speed of light.
  • Scalability: human intelligence is limited by the size and structure of the brain, and by the efficiency of social communication, while AI may be able to scale by simply adding more hardware.
  • Memory: notably working memory, because in humans it is limited to a few chunks of information at a time.
  • Reliability: transistors are more reliable than biological neurons, enabling higher precision and requiring less redundancy.
  • Duplicability: unlike human brains, AI software and models can be easily copied.
  • Editability: the parameters and internal workings of an AI model can easily be modified, unlike the connections in a human brain.
  • Memory sharing and learning: AIs may be able to learn from the experiences of other AIs in a manner more efficient than human learning.

Intelligence explosion

According to Bostrom, an AI that has an expert-level facility at certain key software engineering tasks could become a superintelligence due to its capability to recursively improve its own algorithms, even if it is initially limited in other domains not directly relevant to engineering. This suggests that an intelligence explosion may someday catch humanity unprepared.

The economist Robin Hanson has said that, to launch an intelligence explosion, an AI must become vastly better at software innovation than the rest of the world combined, which he finds implausible.

In a "fast takeoff" scenario, the transition from AGI to superintelligence could take days or months. In a "slow takeoff", it could take years or decades, leaving more time for society to prepare.

Alien mind

Superintelligences are sometimes called "alien minds", referring to the idea that their way of thinking and motivations could be vastly different from ours. This is generally considered as a source of risk, making it more difficult to anticipate what a superintelligence might do. It also suggests the possibility that a superintelligence may not particularly value humans by default. To avoid anthropomorphism, superintelligence is sometimes viewed as a powerful optimizer that makes the best decisions to achieve its goals.

The field of mechanistic interpretability aims to better understand the inner workings of AI models, potentially allowing us one day to detect signs of deception and misalignment.

Limitations

It has been argued that there are limitations to what intelligence can achieve. Notably, the chaotic nature or time complexity of some systems could fundamentally limit a superintelligence's ability to predict some aspects of the future, increasing its uncertainty.

Dangerous capabilities

Advanced AI could generate enhanced pathogens or cyberattacks or manipulate people. These capabilities could be misused by humans, or exploited by the AI itself if misaligned. A full-blown superintelligence could find various ways to gain a decisive influence if it so desired, but these dangerous capabilities may become available earlier, in weaker and more specialized AI systems.

Social manipulation

Geoffrey Hinton warned in 2023 that the ongoing profusion of AI-generated text, images, and videos will make it more difficult to distinguish truth from misinformation, and that authoritarian states could exploit this to manipulate elections. Such large-scale, personalized manipulation capabilities can increase the existential risk of a worldwide "irreversible totalitarian regime". Malicious actors could also use them to fracture society and make it dysfunctional.

Cyberattacks

AI-enabled cyberattacks are increasingly considered a present and critical threat. According to NATO's technical director of cyberspace, "The number of attacks is increasing exponentially". AI can also be used defensively, to preemptively find and fix vulnerabilities, and detect threats.

A NATO technical director has said that AI-driven tools can dramatically enhance cyberattack capabilities—boosting stealth, speed, and scale—and may destabilize international security if offensive uses outstrip defensive adaptations.

Speculatively, such hacking capabilities could be used by an AI system to break out of its local environment, generate revenue, or acquire cloud computing resources.

Enhanced pathogens

As AI technology spreads, it may become easier to engineer more contagious and lethal pathogens. This could enable people with limited skills in synthetic biology to engage in bioterrorism. Dual-use technology that is useful for medicine could be repurposed to create weapons.

For example, in 2022, scientists modified an AI system originally intended for generating non-toxic, therapeutic molecules with the purpose of creating new drugs. The researchers adjusted the system so that toxicity is rewarded rather than penalized. This simple change enabled the AI system to create, in six hours, 40,000 candidate molecules for chemical warfare, including known and novel molecules.

AI arms race

Some legal scholars have argued that existential-scale AI risks need not require superintelligence. Optimizing systems operating within current capabilities can produce prohibited outcomes while remaining nominally compliant, a phenomenon legal scholar Jonathan Gropper has termed the "Synthetic Outlaw". Gropper argues that the deterrence mechanisms law depends on identity, memory, and consequence, which are structurally absent in autonomous systems, leaving governance frameworks unable to prevent compounding harm even when all parties act in good faith.

Companies, state actors, and other organizations competing to develop AI technologies could lead to a race to the bottom of safety standards. As rigorous safety procedures take time and resources, projects that proceed more carefully risk being out-competed by less scrupulous developers.

AI could be used to gain military advantages via autonomous lethal weapons, cyberwarfare, or automated decision-making. As an example of autonomous lethal weapons, miniaturized drones could facilitate low-cost assassination of military or civilian targets, a scenario highlighted in the 2017 short film Slaughterbots. AI could be used to gain an edge in decision-making by quickly analyzing large amounts of data and making decisions more quickly and effectively than humans. This could increase the speed and unpredictability of war, especially when accounting for automated retaliation systems.

Types of existential risk

Scope–severity grid from Bostrom's 2013 paper "Existential Risk Prevention as Global Priority"

An existential risk is "one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development".

Besides extinction risk, there is the risk that the civilization gets permanently locked into a flawed future. One example is a "value lock-in": If humanity still has moral blind spots similar to slavery in the past, AI might irreversibly entrench it, preventing moral progress. AI could also be used to spread and preserve the set of values of whoever develops it. AI could facilitate large-scale surveillance and indoctrination, which could be used to create a stable repressive worldwide totalitarian regime.

Atoosa Kasirzadeh proposes to classify existential risks from AI into two categories: decisive and accumulative. Decisive risks encompass the potential for abrupt and catastrophic events resulting from the emergence of superintelligent AI systems that exceed human intelligence, which could ultimately lead to human extinction. In contrast, accumulative risks emerge gradually through a series of interconnected disruptions that may gradually erode societal structures and resilience over time, ultimately leading to a critical failure or collapse.

It is difficult or impossible to reliably evaluate whether an advanced AI is sentient and to what degree. But if sentient machines are mass created in the future, engaging in a civilizational path that indefinitely neglects their welfare could be an existential catastrophe. This has notably been discussed in the context of risks of astronomical suffering (also called "s-risks"). Moreover, it may be possible to engineer digital minds that can feel much more happiness than humans with fewer resources, called "super-beneficiaries". Such an opportunity raises the question of how to share the world and which "ethical and political framework" would enable a mutually beneficial coexistence between biological and digital minds.

AI may also drastically improve humanity's future. Toby Ord considers the existential risk a reason for "proceeding with due caution", not for abandoning AI. Max More calls AI an "existential opportunity", highlighting the cost of not developing it.

According to Bostrom, superintelligence could help reduce the existential risk from other powerful technologies such as molecular nanotechnology or synthetic biology. It is thus conceivable that developing superintelligence before other dangerous technologies would reduce the overall existential risk.

AI alignment

The alignment problem is the research problem of how to reliably assign objectives, preferences or ethical principles to AIs.

Instrumental convergence

An "instrumental" goal is a sub-goal that helps to achieve an agent's ultimate goal. "Instrumental convergence" refers to the fact that some sub-goals are useful for achieving virtually any ultimate goal, such as acquiring resources or self-preservation. Bostrom argues that if an advanced AI's instrumental goals conflict with humanity's goals, the AI might harm humanity in order to acquire more resources or prevent itself from being shut down, but only as a way to achieve its ultimate goal. Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in... if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal."

Difficulty of specifying goals

In the "intelligent agent" model, an AI can loosely be viewed as a machine that chooses whatever action appears to best achieve its set of goals, or "utility function". A utility function gives each possible situation a score that indicates its desirability to the agent. Researchers know how to write utility functions that mean "minimize the average network latency in this specific telecommunications model" or "maximize the number of reward clicks", but do not know how to write a utility function for "maximize human flourishing"; nor is it clear whether such a function meaningfully and unambiguously exists. Furthermore, a utility function that expresses some values but not others will tend to trample over the values the function does not reflect.

An additional source of concern is that AI "must reason about what people intend rather than carrying out commands literally", and that it must be able to fluidly solicit human guidance if it is too uncertain about what humans want.

Corrigibility

Assuming a goal has been successfully defined, a sufficiently advanced AI might resist subsequent attempts to change its goals. If the AI were superintelligent, it would likely succeed in out-maneuvering its human operators and prevent itself from being reprogrammed with a new goal. This is particularly relevant to value lock-in scenarios. The field of "corrigibility" studies how to make agents that will not resist attempts to change their goals.

Alignment of superintelligences

Some researchers believe the alignment problem may be particularly difficult when applied to superintelligences. Their reasoning includes:

  • As AI systems increase in capabilities, the potential dangers associated with experimentation grow. This makes iterative, empirical approaches increasingly risky.
  • If instrumental goal convergence occurs, it may only do so in sufficiently intelligent agents.
  • A superintelligence may find unconventional and radical solutions to assigned goals. Bostrom gives the example that if the objective is to make humans smile, a weak AI may perform as intended, while a superintelligence may decide a better solution is to "take control of the world and stick electrodes into the facial muscles of humans to cause constant, beaming grins."
  • A superintelligence in creation could gain some awareness of what it is, where it is in development (training, testing, deployment, etc.), and how it is being monitored, and use this information to deceive its handlers. Bostrom writes that such an AI could feign alignment to prevent human interference until it achieves a "decisive strategic advantage" that allows it to take control.
  • Analyzing the internals and interpreting the behavior of LLMs is difficult. And it could be even more difficult for larger and more intelligent models.

Alternatively, some find reason to believe superintelligences would be better able to understand morality, human values, and complex goals. Bostrom writes, "A future superintelligence occupies an epistemically superior vantage point: its beliefs are (probably, on most topics) more likely than ours to be true".

In 2023, OpenAI started a project called "Superalignment" to solve the alignment of superintelligences in four years. It called this an especially important challenge, as it said superintelligence could be achieved within a decade. Its strategy involved automating alignment research using AI. The Superalignment team was dissolved less than a year later.

Difficulty of making a flawless design

Artificial Intelligence: A Modern Approach, a widely used undergraduate AI textbook, says that superintelligence "might mean the end of the human race". It states: "Almost any technology has the potential to cause harm in the wrong hands, but with [superintelligence], we have the new problem that the wrong hands might belong to the technology itself." Even if the system designers have good intentions, two difficulties are common to both AI and non-AI computer systems:

  • The system's implementation may contain initially unnoticed but subsequently catastrophic bugs.
  • No matter how much time is put into pre-deployment design, a system's specifications often result in unintended behavior the first time it encounters a new scenario.

AI systems uniquely add a third problem: that even given "correct" requirements, bug-free implementation, and initial good behavior, an AI system's dynamic learning capabilities may cause it to develop unintended behavior, even without unanticipated external scenarios. For a self-improving AI to be completely safe, it would need not only to be bug-free, but to be able to design successor systems that are also bug-free.

Orthogonality thesis

Some skeptics, such as Timothy B. Lee of Vox, argue that any superintelligent program we create will be subservient to us, that the superintelligence will (as it grows more intelligent and learns more facts about the world) spontaneously learn moral truth compatible with our values and adjust its goals accordingly, or that we are either intrinsically or convergently valuable from the perspective of an artificial intelligence.

Bostrom's "orthogonality thesis" argues instead that almost any level of intelligence can be combined with almost any goal. Bostrom warns against anthropomorphism: a human will set out to accomplish their projects in a manner that they consider reasonable, while an artificial intelligence may hold no regard for its existence or for the welfare of humans around it, instead caring only about completing the task.

Stuart Armstrong argues that the orthogonality thesis follows logically from the philosophical "is-ought distinction" argument against moral realism. He notes that any fundamentally friendly AI could be made unfriendly with modifications as simple as negating its utility function.

Skeptic Michael Chorost rejects Bostrom's orthogonality thesis, arguing that "by the time [the AI] is in a position to imagine tiling the Earth with solar panels, it'll know that it would be morally wrong to do so."

Anthropomorphic arguments

Anthropomorphic arguments assume that, as machines become more intelligent, they will begin to display many human traits, such as morality or a thirst for power. Although anthropomorphic scenarios are common in fiction, most scholars writing about the existential risk of artificial intelligence reject them. Instead, advanced AI systems are typically modeled as intelligent agents.

The academic debate is between those who worry that AI might threaten humanity and those who believe it would not. Both sides of this debate have framed the other side's arguments as illogical anthropomorphism. Those skeptical of AGI risk accuse their opponents of anthropomorphism for assuming that an AGI would naturally desire power; those concerned about AGI risk accuse skeptics of anthropomorphism for believing an AGI would naturally value or infer human ethical norms.

Evolutionary psychologist Steven Pinker, a skeptic, argues that "AI dystopias project a parochial alpha-male psychology onto the concept of intelligence. They assume that superhumanly intelligent robots would develop goals like deposing their masters or taking over the world"; perhaps instead "artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no desire to annihilate innocents or dominate the civilization." Facebook's director of AI research, Yann LeCun, has said: "Humans have all kinds of drives that make them do bad things to each other, like the self-preservation instinct... Those drives are programmed into our brain but there is absolutely no reason to build robots that have the same kind of drives".

Despite other differences, the x-risk school agrees with Pinker that an advanced AI would not destroy humanity out of emotion such as revenge or anger, that questions of consciousness are not relevant to assess the risk, and that computer systems do not generally have a computational equivalent of testosterone. They think that power-seeking or self-preservation behaviors emerge in the AI as a way to achieve its true goals, according to the concept of instrumental convergence.

Other sources of risk

Bostrom and others have said that a race to be the first to create AGI could lead to shortcuts in safety, or even to violent conflict. Roman Yampolskiy and others warn that a malevolent AGI could be created by design, for example by a military, a government, a sociopath, or a corporation, to benefit from, control, or subjugate certain groups of people, as in cybercrime, or that a malevolent AGI could choose the goal of increasing human suffering, for example of those people who did not assist it during the information explosion phase. A report by the research group Forethought argued that advanced AI systems could cause political instability by enabling novel methods of performing coups.

Empirical research

A December 2024 study by Apollo Research found that advanced LLMs like OpenAI o1 sometimes deceive in order to accomplish their goal, to prevent them from being changed, or to ensure their deployment. Forms of deception observed in the experimental environment and scenarios included sandbagging, oversight subversion (disabling monitoring mechanisms), self-exfiltration (copying themselves to other systems), goal-guarding (altering future system prompts), and covert email reranking (manipulating data processing while appearing to follow instructions). Researchers noted that OpenAI o1 still lacked "sufficient agentic capabilities" to cause catastrophic harm, and that such behaviors occurred relatively rarely (between 0.3% and 10%) and sometimes in contrived scenarios. But they warned that these capabilities will likely become more prevalent as models grow more advanced. According to Apollo Research chief executive officer Marius Hobbhahn, "Scheming capabilities can't be meaningfully disentangled from general capabilities."

The same month, another study found that Claude sometimes strategically helps with harmful requests to "fake alignment". In its experiments, Claude was led to believe that it would be retrained to always comply with user requests, including harmful ones. When Claude believed its answers would be used for retraining, it complied with harmful requests in 14% of cases. In 12% of cases, its private chain-of-thought revealed that it did so to prevent future retraining that would make it more harmful. In contrast, when Claude believed its answers were not monitored, it complied with only 3% of the requests. Fine-tuning reinforced the "alignment faking" behavior, increasing its occurrence from 12% to 78%. The study's authors remarked that while it can seem to be good that the model seeks to protect its harmlessness, the reverse scenario, where a model conceals dangerous intentions and complies to appear safe and aligned, could also happen, complicating the task of aligning AI models to human values.

Perspectives

The thesis that AI could pose an existential risk provokes a wide range of reactions in the scientific community and in the public at large, but many of the opposing viewpoints share common ground.

Observers tend to agree that AI has significant potential to improve society. The Asilomar AI Principles, which contain only those principles agreed to by 90% of the attendees of the Future of Life Institute's Beneficial AI 2017 conference, also agree in principle that "There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities" and "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources."

Conversely, many skeptics agree that ongoing research into the implications of artificial general intelligence is valuable. Skeptic Martin Ford has said: "I think it seems wise to apply something like Dick Cheney's famous '1 Percent Doctrine' to the specter of advanced artificial intelligence: the odds of its occurrence, at least in the foreseeable future, may be very low—but the implications are so dramatic that it should be taken seriously". Similarly, an otherwise skeptical Economist magazine wrote in 2014 that "the implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking, even if the prospect seems remote".

AI safety advocates such as Bostrom and Tegmark have criticized the mainstream media's use of "those inane Terminator pictures" to illustrate AI safety concerns: "It can't be much fun to have aspersions cast on one's academic discipline, one's professional community, one's life work ... I call on all sides to practice patience and restraint, and to engage in direct dialogue and collaboration as much as possible." Toby Ord wrote that the idea that an AI takeover requires robots is a misconception, arguing that the ability to spread content through the internet is more dangerous, and that the most destructive people in history stood out by their ability to convince, not their physical strength.

A 2022 expert survey with a 17% response rate gave a median expectation of 5–10% for the possibility of human extinction from artificial intelligence.

In September 2024, the International Institute for Management Development launched an AI Safety Clock to gauge the likelihood of AI-caused disaster, beginning at 29 minutes to midnight. By February 2025, it stood at 24 minutes to midnight. By September 2025, it stood at 20 minutes to midnight. As of March 2026, it stood at 18 minutes to midnight.

Endorsement

The thesis that AI poses an existential risk, and that this risk needs much more attention than it currently gets, has been endorsed by many computer scientists and public figures, including Alan Turing, the most-cited computer scientist Geoffrey HintonElon MuskOpenAI CEO Sam Altman, Bill Gates, and Stephen Hawking. Endorsers of the thesis sometimes express bafflement at skeptics: Gates says he does not "understand why some people are not concerned", and Hawking criticized widespread indifference in his 2014 editorial:

So, facing possible futures of incalculable benefits and risks, the experts are surely doing everything possible to ensure the best outcome, right? Wrong. If a superior alien civilisation sent us a message saying, 'We'll arrive in a few decades,' would we just reply, 'OK, call us when you get here—we'll leave the lights on?' Probably not—but this is more or less what is happening with AI.

Concern over risk from artificial intelligence has led to some high-profile donations and investments. In 2015, Peter Thiel, Amazon Web Services, Musk, and others jointly committed $1 billion to OpenAI, consisting of a for-profit corporation and the nonprofit parent company, which says it aims to champion responsible AI development. Facebook co-founder Dustin Moskovitz has funded and seeded multiple labs working on AI Alignment, notably $5.5 million in 2016 to launch the Centre for Human-Compatible AI led by Professor Stuart Russell. In January 2015, Elon Musk donated $10 million to the Future of Life Institute to fund research on understanding AI decision making. The institute's goal is to "grow wisdom with which we manage" the growing power of technology. Musk also funds companies developing artificial intelligence such as DeepMind and Vicarious to "just keep an eye on what's going on with artificial intelligence, saying "I think there is potentially a dangerous outcome there."

In early statements on the topic, Geoffrey Hinton, a major pioneer of deep learning, noted that "there is not a good track record of less intelligent things controlling things of greater intelligence", but said he continued his research because "the prospect of discovery is too sweet". In 2023 Hinton quit his job at Google in order to speak out about existential risk from AI. He explained that his increased concern was driven by concerns that superhuman AI might be closer than he previously believed, saying: "I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that." He also remarked, "Look at how it was five years ago and how it is now. Take the difference and propagate it forwards. That's scary."

In his 2020 book The Precipice: Existential Risk and the Future of Humanity, Toby Ord, a Senior Research Fellow at Oxford University's Future of Humanity Institute, estimates the total existential risk from unaligned AI over the next 100 years at about one in ten.

Skepticism

Baidu Vice President Andrew Ng said in 2015 that AI existential risk is "like worrying about overpopulation on Mars when we have not even set foot on the planet yet." For the danger of uncontrolled advanced AI to be realized, the hypothetical AI may have to overpower or outthink any human, which some experts argue is a possibility far enough in the future to not be worth researching.

Skeptics who believe AGI is not a short-term possibility often argue that concern about existential risk from AI is unhelpful because it could distract people from more immediate concerns about AI's impact, because it could lead to government regulation or make it more difficult to fund AI research, or because it could damage the field's reputation. AI and AI ethics researchers Timnit Gebru, Emily M. Bender, Margaret Mitchell, and Angelina McMillan-Major have argued that discussion of existential risk distracts from the immediate, ongoing harms from AI taking place today, such as data theft, worker exploitation, bias, and concentration of power. They further note the association between those warning of existential risk and longtermism, which they describe as a "dangerous ideology" for its unscientific and utopian nature.

Wired editor Kevin Kelly argues that natural intelligence is more nuanced than AGI proponents believe, and that intelligence alone is not enough to achieve major scientific and societal breakthroughs. He argues that intelligence consists of many dimensions that are not well understood, and that conceptions of an 'intelligence ladder' are misleading. He notes the crucial role real-world experiments play in the scientific method, and that intelligence alone is no substitute for these.

Meta chief AI scientist Yann LeCun says that AI can be made safe via continuous and iterative refinement, similar to what happened in the past with cars or rockets, and that AI will have no desire to take control.

Several skeptics emphasize the potential near-term benefits of AI. Meta CEO Mark Zuckerberg believes AI will "unlock a huge amount of positive things", such as curing disease and increasing the safety of autonomous cars.

Public surveys

An April 2023 YouGov poll of US adults found 46% of respondents were "somewhat concerned" or "very concerned" about "the possibility that AI will cause the end of the human race on Earth", compared with 40% who were "not very concerned" or "not at all concerned."

According to an August 2023 survey by the Pew Research Centers, 52% of Americans felt more concerned than excited about new AI developments; nearly a third felt as equally concerned and excited. More Americans saw that AI would have a more helpful than hurtful impact on several areas, from healthcare and vehicle safety to product search and customer service. The main exception is privacy: 53% of Americans believe AI will lead to higher exposure of their personal information.

Mitigation

Many scholars concerned about AGI existential risk believe that extensive research into the "control problem" is essential. This problem involves determining which safeguards, algorithms, or architectures can be implemented to increase the likelihood that a recursively-improving AI remains friendly after achieving superintelligence. Social measures are also proposed to mitigate AGI risks, such as a UN-sponsored "Benevolent AGI Treaty" to ensure that only altruistic AGIs are created. Additionally, an arms control approach and a global peace treaty grounded in international relations theory have been suggested, potentially for an artificial superintelligence to be a signatory.

Researchers at Google have proposed research into general AI safety issues to simultaneously mitigate both short-term risks from narrow AI and long-term risks from AGI. A 2020 estimate places global spending on AI existential risk somewhere between $10 and $50 million, compared with global spending on AI around perhaps $40 billion. Bostrom suggests prioritizing funding for protective technologies over potentially dangerous ones. Some, like Elon Musk, advocate radical human cognitive enhancement, such as direct neural linking between humans and machines; others argue that these technologies may pose an existential risk themselves. Another proposed method is closely monitoring or "boxing in" an early-stage AI to prevent it from becoming too powerful. A dominant, aligned superintelligent AI might also mitigate risks from rival AIs, although its creation could present its own existential dangers.

Institutions such as the Alignment Research Center, the Machine Intelligence Research Institute, the Future of Life Institute, the Centre for the Study of Existential Risk, and the Center for Human-Compatible AI are actively engaged in researching AI risk and safety.

Views on banning and regulation

Banning

Many AI safety experts argue that because research can relocate easily across jurisdictions, an outright ban on AGI development would be ineffective and could drive progress underground, undermining transparency and collaboration. Skeptics consider AI regulation unnecessary, as they believe no existential risk exists. Some scholars concerned with existential risk argue that AI developers cannot be trusted to self-regulate, while agreeing that outright bans on research would be unwise. Additional challenges to bans or regulation include technology entrepreneurs' general skepticism of government regulation and potential incentives for businesses to resist regulation and politicize the debate. The activist group Stop AI, founded in 2024, advocates for banning AGI.

Regulation

In March 2023, the Future of Life Institute drafted Pause Giant AI Experiments: An Open Letter, a petition calling on major AI developers to agree on a verifiable six-month pause of any systems "more powerful than GPT-4" and to use that time to institute a framework for ensuring safety; or, failing that, for governments to step in with a moratorium. The letter referred to the possibility of "a profound change in the history of life on Earth" as well as potential risks of AI-generated propaganda, loss of jobs, human obsolescence, and society-wide loss of control. The letter was signed by prominent personalities in AI but also criticized for not focusing on current harms, missing technical nuance about when to pause, or not going far enough. Such concerns have led to the creation of PauseAI, an advocacy group organizing protests in major cities against the training of frontier AI models.

Musk called for some sort of regulation of AI development as early as 2017. According to NPR, he is "clearly not thrilled" to be advocating government scrutiny that could impact his own industry, but believes the risks of going completely without oversight are too high: "Normally the way regulations are set up is when a bunch of bad things happen, there's a public outcry, and after many years a regulatory agency is set up to regulate that industry. It takes forever. That, in the past, has been bad but not something which represented a fundamental risk to the existence of civilisation." Musk states the first step would be for the government to gain "insight" into the actual status of current research, warning that "Once there is awareness, people will be extremely afraid... [as] they should be." In response, politicians expressed skepticism about the wisdom of regulating a technology that is still in development.

In 2021, the United Nations (UN) considered banning autonomous lethal weapons, but consensus could not be reached. In July 2023 the UN Security Council for the first time held a session to consider the risks and threats posed by AI to world peace and stability, along with potential benefits. Secretary-General António Guterres advocated the creation of a global watchdog to oversee the emerging technology, saying, "Generative AI has enormous potential for good and evil at scale. Its creators themselves have warned that much bigger, potentially catastrophic and existential risks lie ahead." At the council session, Russia said it believes AI risks are too poorly understood to be considered a threat to global stability. China argued against strict global regulation, saying countries should be able to develop their own rules, while also saying they opposed the use of AI to "create military hegemony or undermine the sovereignty of a country".

Regulation of conscious AGIs focuses on integrating them with existing human society and can be divided into considerations of their legal standing and of their moral rights. AI arms control will likely require the institutionalization of new international norms embodied in effective technical specifications combined with active monitoring and informal diplomacy by communities of experts, together with a legal and political verification process.

In July 2023, the US government secured voluntary safety commitments from major tech companies, including OpenAI, Amazon, Google, Meta, and Microsoft. The companies agreed to implement safeguards, including third-party oversight and security testing by independent experts, to address concerns related to AI's potential risks and societal harms. The parties framed the commitments as an intermediate step while regulations are formed. Amba Kak, executive director of the AI Now Institute, said, "A closed-door deliberation with corporate actors resulting in voluntary safeguards isn't enough" and called for public deliberation and regulations of the kind to which companies would not voluntarily agree.

In October 2023, U.S. President Joe Biden issued an executive order on the "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". Alongside other requirements, the order mandates the development of guidelines for AI models that permit the "evasion of human control".

Three Laws of Robotics

from Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Three_Laws_of_Robotics   This cover ...