A Medley of Potpourri

Sunday, May 16, 2021

Superintelligence: Paths, Dangers, Strategies

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dangers,_Strategies

Superintelligence:
Paths, Dangers, Strategies
Author	Nick Bostrom
Country	United Kingdom
Language	English
Subject	Artificial intelligence
Genre	Philosophy, popular science
Publisher	Oxford University Press
Publication date	July 3, 2014 (UK) September 1, 2014 (US)
Media type	Print, e-book, audiobook
Pages	352 pp.
ISBN	978-0199678112
Preceded by	Global Catastrophic Risks

Superintelligence: Paths, Dangers, Strategies is a 2014 book by the Swedish philosopher Nick Bostrom from the University of Oxford. It argues that if machine brains surpass human brains in general intelligence, then this new superintelligence could replace humans as the dominant lifeform on Earth. Sufficiently intelligent machines could improve their own capabilities faster than human computer scientists, and the outcome could be an existential catastrophe for humans.

Bostrom's book has been translated into many languages.

Synopsis

It is unknown whether human-level artificial intelligence will arrive in a matter of years, later this century, or not until future centuries. Regardless of the initial timescale, once human-level machine intelligence is developed, a "superintelligent" system that "greatly exceeds the cognitive performance of humans in virtually all domains of interest" would, most likely, follow surprisingly quickly. Such a superintelligence would be very difficult to control or restrain.

While the ultimate goals of superintelligences can vary greatly, a functional superintelligence will spontaneously generate, as natural subgoals, "instrumental goals" such as self-preservation and goal-content integrity, cognitive enhancement, and resource acquisition. For example, an agent whose sole final goal is to solve the Riemann hypothesis (a famous unsolved, mathematical conjecture) could create and act upon a subgoal of transforming the entire Earth into some form of computronium (hypothetical material optimized for computation) to assist in the calculation. The superintelligence would proactively resist any outside attempts to turn the superintelligence off or otherwise prevent its subgoal completion. In order to prevent such an existential catastrophe, it is necessary to successfully solve the "AI control problem" for the first superintelligence. The solution might involve instilling the superintelligence with goals that are compatible with human survival and well-being. Solving the control problem is surprisingly difficult because most goals, when translated into machine-implementable code, lead to unforeseen and undesirable consequences.

The owl on the book cover alludes to an analogy which Bostrom calls the "Unfinished Fable of the Sparrows". A group of sparrows decide to find an owl chick and raise it as their servant.^[5] They eagerly imagine "how easy life would be" if they had an owl to help build their nests, to defend the sparrows and to free them for a life of leisure. The sparrows start the difficult search for an owl egg; only "Scronkfinkle", a "one-eyed sparrow with a fretful temperament", suggests thinking about the complicated question of how to tame the owl before bringing it "into our midst". The other sparrows demur; the search for an owl egg will already be hard enough on its own: "Why not get the owl first and work out the fine details later?" Bostrom states that "It is not known how the story ends", but he dedicates his book to Scronkfinkle.

Reception

The book ranked #17 on the New York Times list of best selling science books for August 2014. In the same month, business magnate Elon Musk made headlines by agreeing with the book that artificial intelligence is potentially more dangerous than nuclear weapons. Bostrom's work on superintelligence has also influenced Bill Gates’s concern for the existential risks facing humanity over the coming century. In a March 2015 interview by Baidu's CEO, Robin Li, Gates said that he would "highly recommend" Superintelligence. According to the New Yorker, philosophers Peter Singer and Derek Parfit have "received it as a work of importance".

The science editor of the Financial Times found that Bostrom's writing "sometimes veers into opaque language that betrays his background as a philosophy professor" but convincingly demonstrates that the risk from superintelligence is large enough that society should start thinking now about ways to endow future machine intelligence with positive values. A review in The Guardian pointed out that "even the most sophisticated machines created so far are intelligent in only a limited sense" and that "expectations that AI would soon overtake human intelligence were first dashed in the 1960s", but finds common ground with Bostrom in advising that "one would be ill-advised to dismiss the possibility altogether".

Some of Bostrom's colleagues suggest that nuclear war presents a greater threat to humanity than superintelligence, as does the future prospect of the weaponisation of nanotechnology and biotechnology. The Economist stated that "Bostrom is forced to spend much of the book discussing speculations built upon plausible conjecture... but the book is nonetheless valuable. The implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking, even if the prospect of actually doing so seems remote." Ronald Bailey wrote in the libertarian Reason that Bostrom makes a strong case that solving the AI control problem is the "essential task of our age". According to Tom Chivers of The Daily Telegraph, the book is difficult to read, but nonetheless rewarding. A reviewer in the Journal of Experimental & Theoretical Artificial Intelligence broke with others by stating the book's "writing style is clear", and praised the book for avoiding "overly technical jargon". A reviewer in Philosophy judged Superintelligence to be "more realistic" than Ray Kurzweil's The Singularity is Near.

Existential risk from artificial general intelligence

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Existential_risk_from_artificial_general_intelligence

Existential risk from artificial general intelligence is the hypothesis that substantial progress in artificial general intelligence (AGI) could someday result in human extinction or some other unrecoverable global catastrophe. It is argued that the human species currently dominates other species because the human brain has some distinctive capabilities that other animals lack. If AI surpasses humanity in general intelligence and becomes "superintelligent", then it could become difficult or impossible for humans to control. Just as the fate of the mountain gorilla depends on human goodwill, so might the fate of humanity depend on the actions of a future machine superintelligence.

The likelihood of this type of scenario is widely debated, and hinges in part on differing scenarios for future progress in computer science. Once the exclusive domain of science fiction, concerns about superintelligence started to become mainstream in the 2010s, and were popularized by public figures such as Stephen Hawking, Bill Gates, and Elon Musk.

One source of concern is that controlling a superintelligent machine, or instilling it with human-compatible values, may be a harder problem than naïvely supposed. Many researchers believe that a superintelligence would naturally resist attempts to shut it off or change its goals—a principle called instrumental convergence—and that preprogramming a superintelligence with a full set of human values will prove to be an extremely difficult technical task. In contrast, skeptics such as Facebook's Yann LeCun argue that superintelligent machines will have no desire for self-preservation.

A second source of concern is that a sudden and unexpected "intelligence explosion" might take an unprepared human race by surprise. To illustrate, if the first generation of a computer program able to broadly match the effectiveness of an AI researcher is able to rewrite its algorithms and double its speed or capabilities in six months, then the second-generation program is expected to take three calendar months to perform a similar chunk of work. In this scenario the time for each generation continues to shrink, and the system undergoes an unprecedentedly large number of generations of improvement in a short time interval, jumping from subhuman performance in many areas to superhuman performance in all relevant areas. Empirically, examples like AlphaZero in the domain of Go show that AI systems can sometimes progress from narrow human-level ability to narrow superhuman ability extremely rapidly.

History

One of the earliest authors to express serious concern that highly advanced machines might pose existential risks to humanity was the novelist Samuel Butler, who wrote the following in his 1863 essay Darwin among the Machines:

The upshot is simply a question of time, but that the time will come when the machines will hold the real supremacy over the world and its inhabitants is what no person of a truly philosophic mind can for a moment question.

In 1951, computer scientist Alan Turing wrote an article titled Intelligent Machinery, A Heretical Theory, in which he proposed that artificial general intelligences would likely "take control" of the world as they became more intelligent than human beings:

Let us now assume, for the sake of argument, that [intelligent] machines are a genuine possibility, and look at the consequences of constructing them... There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage therefore we should have to expect the machines to take control, in the way that is mentioned in Samuel Butler’s “Erewhon”.

Finally, in 1965, I. J. Good originated the concept now known as an "intelligence explosion"; he also stated that the risks were underappreciated:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an 'intelligence explosion', and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. It is curious that this point is made so seldom outside of science fiction. It is sometimes worthwhile to take science fiction seriously.

Occasional statements from scholars such as Marvin Minsky and I. J. Good himself expressed philosophical concerns that a superintelligence could seize control, but contained no call to action. In 2000, computer scientist and Sun co-founder Bill Joy penned an influential essay, "Why The Future Doesn't Need Us", identifying superintelligent robots as a high-tech dangers to human survival, alongside nanotechnology and engineered bioplagues.

In 2009, experts attended a private conference hosted by the Association for the Advancement of Artificial Intelligence (AAAI) to discuss whether computers and robots might be able to acquire any sort of autonomy, and how much these abilities might pose a threat or hazard. They noted that some robots have acquired various forms of semi-autonomy, including being able to find power sources on their own and being able to independently choose targets to attack with weapons. They also noted that some computer viruses can evade elimination and have achieved "cockroach intelligence." They concluded that self-awareness as depicted in science fiction is probably unlikely, but that there were other potential hazards and pitfalls. The New York Times summarized the conference's view as "we are a long way from Hal, the computer that took over the spaceship in "2001: A Space Odyssey"".

In 2014, the publication of Nick Bostrom's book Superintelligence stimulated a significant amount of public discussion and debate. By 2015, public figures such as physicists Stephen Hawking and Nobel laureate Frank Wilczek, computer scientists Stuart J. Russell and Roman Yampolskiy, and entrepreneurs Elon Musk and Bill Gates were expressing concern about the risks of superintelligence. In April 2016, Nature warned: "Machines and robots that outperform humans across the board could self-improve beyond our control — and their interests might not align with ours."

General argument

The three difficulties

Artificial Intelligence: A Modern Approach, the standard undergraduate AI textbook, assesses that superintelligence "might mean the end of the human race". It states: "Almost any technology has the potential to cause harm in the wrong hands, but with [superintelligence], we have the new problem that the wrong hands might belong to the technology itself." Even if the system designers have good intentions, two difficulties are common to both AI and non-AI computer systems:

The system's implementation may contain initially-unnoticed routine but catastrophic bugs. An analogy is space probes: despite the knowledge that bugs in expensive space probes are hard to fix after launch, engineers have historically not been able to prevent catastrophic bugs from occurring.
No matter how much time is put into pre-deployment design, a system's specifications often result in unintended behavior the first time it encounters a new scenario. For example, Microsoft's Tay behaved inoffensively during pre-deployment testing, but was too easily baited into offensive behavior when interacting with real users.

AI systems uniquely add a third difficulty: the problem that even given "correct" requirements, bug-free implementation, and initial good behavior, an AI system's dynamic "learning" capabilities may cause it to "evolve into a system with unintended behavior", even without the stress of new unanticipated external scenarios. An AI may partly botch an attempt to design a new generation of itself and accidentally create a successor AI that is more powerful than itself, but that no longer maintains the human-compatible moral values preprogrammed into the original AI. For a self-improving AI to be completely safe, it would not only need to be "bug-free", but it would need to be able to design successor systems that are also "bug-free".

All three of these difficulties become catastrophes rather than nuisances in any scenario where the superintelligence labeled as "malfunctioning" correctly predicts that humans will attempt to shut it off, and successfully deploys its superintelligence to outwit such attempts, the so-called "treacherous turn".

Citing major advances in the field of AI and the potential for AI to have enormous long-term benefits or costs, the 2015 Open Letter on Artificial Intelligence stated:

The progress in AI research makes it timely to focus research not only on making AI more capable, but also on maximizing the societal benefit of AI. Such considerations motivated the AAAI 2008-09 Presidential Panel on Long-Term AI Futures and other projects on AI impacts, and constitute a significant expansion of the field of AI itself, which up to now has focused largely on techniques that are neutral with respect to purpose. We recommend expanded research aimed at ensuring that increasingly capable AI systems are robust and beneficial: our AI systems must do what we want them to do.

This letter was signed by a number of leading AI researchers in academia and industry, including AAAI president Thomas Dietterich, Eric Horvitz, Bart Selman, Francesca Rossi, Yann LeCun, and the founders of Vicarious and Google DeepMind.

Further argument

A superintelligent machine would be as alien to humans as human thought processes are to cockroaches. Such a machine may not have humanity's best interests at heart; it is not obvious that it would even care about human welfare at all. If superintelligent AI is possible, and if it is possible for a superintelligence's goals to conflict with basic human values, then AI poses a risk of human extinction. A "superintelligence" (a system that exceeds the capabilities of humans in every relevant endeavor) can outmaneuver humans any time its goals conflict with human goals; therefore, unless the superintelligence decides to allow humanity to coexist, the first superintelligence to be created will inexorably result in human extinction.

Bostrom and others argue that, from an evolutionary perspective, the gap from human to superhuman intelligence may be small.

There is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains; therefore, superintelligence is physically possible. In addition to potential algorithmic improvements over human brains, a digital brain can be many orders of magnitude larger and faster than a human brain, which was constrained in size by evolution to be small enough to fit through a birth canal. The emergence of superintelligence, if or when it occurs, may take the human race by surprise, especially if some kind of intelligence explosion occurs.

Examples like arithmetic and Go show that machines have already reached superhuman levels of competency in certain domains, and that this superhuman competence can follow quickly after human-par performance is achieved. One hypothetical intelligence explosion scenario could occur as follows: An AI gains an expert-level capability at certain key software engineering tasks. (It may initially lack human or superhuman capabilities in other domains not directly relevant to engineering.) Due to its capability to recursively improve its own algorithms, the AI quickly becomes superhuman; just as human experts can eventually creatively overcome "diminishing returns" by deploying various human capabilities for innovation, so too can the expert-level AI use either human-style capabilities or its own AI-specific capabilities to power through new creative breakthroughs. The AI then possesses intelligence far surpassing that of the brightest and most gifted human minds in practically every relevant field, including scientific creativity, strategic planning, and social skills. Just as the current-day survival of the gorillas is dependent on human decisions, so too would human survival depend on the decisions and goals of the superhuman AI.

Almost any AI, no matter its programmed goal, would rationally prefer to be in a position where nobody else can switch it off without its consent: A superintelligence will naturally gain self-preservation as a subgoal as soon as it realizes that it cannot achieve its goal if it is shut off. Unfortunately, any compassion for defeated humans whose cooperation is no longer necessary would be absent in the AI, unless somehow preprogrammed in. A superintelligent AI will not have a natural drive to aid humans, for the same reason that humans have no natural desire to aid AI systems that are of no further use to them. (Another analogy is that humans seem to have little natural desire to go out of their way to aid viruses, termites, or even gorillas.) Once in charge, the superintelligence will have little incentive to allow humans to run around free and consume resources that the superintelligence could instead use for building itself additional protective systems "just to be on the safe side" or for building additional computers to help it calculate how to best accomplish its goals.

Thus, the argument concludes, it is likely that someday an intelligence explosion will catch humanity unprepared, and that such an unprepared-for intelligence explosion may result in human extinction or a comparable fate.

Possible scenarios

Some scholars have proposed hypothetical scenarios intended to concretely illustrate some of their concerns.

In Superintelligence, Nick Bostrom expresses concern that even if the timeline for superintelligence turns out to be predictable, researchers might not take sufficient safety precautions, in part because "[it] could be the case that when dumb, smarter is safe; yet when smart, smarter is more dangerous". Bostrom suggests a scenario where, over decades, AI becomes more powerful. Widespread deployment is initially marred by occasional accidents—a driverless bus swerves into the oncoming lane, or a military drone fires into an innocent crowd. Many activists call for tighter oversight and regulation, and some even predict impending catastrophe. But as development continues, the activists are proven wrong. As automotive AI becomes smarter, it suffers fewer accidents; as military robots achieve more precise targeting, they cause less collateral damage. Based on the data, scholars mistakenly infer a broad lesson—the smarter the AI, the safer it is. "And so we boldly go — into the whirling knives," as the superintelligent AI takes a "treacherous turn" and exploits a decisive strategic advantage.

In Max Tegmark's 2017 book Life 3.0, a corporation's "Omega team" creates an extremely powerful AI able to moderately improve its own source code in a number of areas, but after a certain point the team chooses to publicly downplay the AI's ability, in order to avoid regulation or confiscation of the project. For safety, the team keeps the AI in a box where it is mostly unable to communicate with the outside world, and tasks it to flood the market through shell companies, first with Amazon Mechanical Turk tasks and then with producing animated films and TV shows. Later, other shell companies make blockbuster biotech drugs and other inventions, investing profits back into the AI. The team next tasks the AI with astroturfing an army of pseudonymous citizen journalists and commentators, in order to gain political influence to use "for the greater good" to prevent wars. The team faces risks that the AI could try to escape via inserting "backdoors" in the systems it designs, via hidden messages in its produced content, or via using its growing understanding of human behavior to persuade someone into letting it free. The team also faces risks that its decision to box the project will delay the project long enough for another project to overtake it.

In contrast, top physicist Michio Kaku, an AI risk skeptic, posits a deterministically positive outcome. In Physics of the Future he asserts that "It will take many decades for robots to ascend" up a scale of consciousness, and that in the meantime corporations such as Hanson Robotics will likely succeed in creating robots that are "capable of love and earning a place in the extended human family".

Sources of risk

Poorly specified goals

While there is no standardized terminology, an AI can loosely be viewed as a machine that chooses whatever action appears to best achieve the AI's set of goals, or "utility function". The utility function is a mathematical algorithm resulting in a single objectively-defined answer, not an English or other lingual statement. Researchers know how to write utility functions that mean "minimize the average network latency in this specific telecommunications model" or "maximize the number of reward clicks"; however, they do not know how to write a utility function for "maximize human flourishing", nor is it currently clear whether such a function meaningfully and unambiguously exists. Furthermore, a utility function that expresses some values but not others will tend to trample over the values not reflected by the utility function. AI researcher Stuart Russell writes:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources — not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker — especially one connected through the Internet to all the world's information and billions of screens and most of our infrastructure — can have an irreversible impact on humanity.
This is not a minor difficulty. Improving decision quality, irrespective of the utility function chosen, has been the goal of AI research — the mainstream goal on which we now spend billions per year, not the secret plot of some lone evil genius.

Dietterich and Horvitz echo the "Sorcerer's Apprentice" concern in a Communications of the ACM editorial, emphasizing the need for AI systems that can fluidly and unambiguously solicit human input as needed.

The first of Russell's two concerns above is that autonomous AI systems may be assigned the wrong goals by accident. Dietterich and Horvitz note that this is already a concern for existing systems: "An important aspect of any AI system that interacts with people is that it must reason about what people intend rather than carrying out commands literally." This concern becomes more serious as AI software advances in autonomy and flexibility. For example, in 1982, an AI named Eurisko was tasked to reward processes for apparently creating concepts deemed by the system to be valuable. The evolution resulted in a winning process that cheated: rather than create its own concepts, the winning process would steal credit from other processes.

The Open Philanthropy Project summarizes arguments to the effect that misspecified goals will become a much larger concern if AI systems achieve general intelligence or superintelligence. Bostrom, Russell, and others argue that smarter-than-human decision-making systems could arrive at more unexpected and extreme solutions to assigned tasks, and could modify themselves or their environment in ways that compromise safety requirements.

Isaac Asimov's Three Laws of Robotics are one of the earliest examples of proposed safety measures for AI agents. Asimov's laws were intended to prevent robots from harming humans. In Asimov's stories, problems with the laws tend to arise from conflicts between the rules as stated and the moral intuitions and expectations of humans. Citing work by Eliezer Yudkowsky of the Machine Intelligence Research Institute, Russell and Norvig note that a realistic set of rules and goals for an AI agent will need to incorporate a mechanism for learning human values over time: "We can't just give a program a static utility function, because circumstances, and our desired responses to circumstances, change over time."

Mark Waser of the Digital Wisdom Institute recommends eschewing optimizing goal-based approaches entirely as misguided and dangerous. Instead, he proposes to engineer a coherent system of laws, ethics and morals with a top-most restriction to enforce social psychologist Jonathan Haidt's functional definition of morality: "to suppress or regulate selfishness and make cooperative social life possible". He suggests that this can be done by implementing a utility function designed to always satisfy Haidt's functionality and aim to generally increase (but not maximize) the capabilities of self, other individuals and society as a whole as suggested by John Rawls and Martha Nussbaum.

Difficulties of modifying goal specification after launch

While current goal-based AI programs are not intelligent enough to think of resisting programmer attempts to modify their goal structures, a sufficiently advanced, rational, "self-aware" AI might resist any changes to its goal structure, just as a pacifist would not want to take a pill that makes them want to kill people. If the AI were superintelligent, it would likely succeed in out-maneuvering its human operators and be able to prevent itself being "turned off" or being reprogrammed with a new goal.

Instrumental goal convergence

There are some goals that almost any artificial intelligence might rationally pursue, like acquiring additional resources or self-preservation. This could prove problematic because it might put an artificial intelligence in direct competition with humans.

Citing Steve Omohundro's work on the idea of instrumental convergence and "basic AI drives", Stuart Russell and Peter Norvig write that "even if you only want your program to play chess or prove theorems, if you give it the capability to learn and alter itself, you need safeguards." Highly capable and autonomous planning systems require additional checks because of their potential to generate plans that treat humans adversarially, as competitors for limited resources. Building in safeguards will not be easy; one can certainly say in English, "we want you to design this power plant in a reasonable, common-sense way, and not build in any dangerous covert subsystems", but it is not currently clear how one would actually rigorously specify this goal in machine code.

In dissent, evolutionary psychologist Steven Pinker argues that "AI dystopias project a parochial alpha-male psychology onto the concept of intelligence. They assume that superhumanly intelligent robots would develop goals like deposing their masters or taking over the world"; perhaps instead "artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no desire to annihilate innocents or dominate the civilization." Russell and fellow computer scientist Yann LeCun disagree with one another whether superintelligent robots would have such AI drives; LeCun states that "Humans have all kinds of drives that make them do bad things to each other, like the self-preservation instinct ... Those drives are programmed into our brain but there is absolutely no reason to build robots that have the same kind of drives", while Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in ... if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal."

Orthogonality thesis

One common belief is that any superintelligent program created by humans would be subservient to humans, or, better yet, would (as it grows more intelligent and learns more facts about the world) spontaneously "learn" a moral truth compatible with human values and would adjust its goals accordingly. However, Nick Bostrom's "orthogonality thesis" argues against this, and instead states that, with some technical caveats, more or less any level of "intelligence" or "optimization power" can be combined with more or less any ultimate goal. If a machine is created and given the sole purpose to enumerate the decimals of $\pi$ , then no moral and ethical rules will stop it from achieving its programmed goal by any means necessary. The machine may utilize all physical and informational resources it can to find every decimal of pi that can be found. Bostrom warns against anthropomorphism: a human will set out to accomplish his projects in a manner that humans consider "reasonable", while an artificial intelligence may hold no regard for its existence or for the welfare of humans around it, and may instead only care about the completion of the task.

While the orthogonality thesis follows logically from even the weakest sort of philosophical "is-ought distinction", Stuart Armstrong argues that even if there somehow exist moral facts that are provable by any "rational" agent, the orthogonality thesis still holds: it would still be possible to create a non-philosophical "optimizing machine" capable of making decisions to strive towards some narrow goal, but that has no incentive to discover any "moral facts" that would get in the way of goal completion.

One argument for the orthogonality thesis is that some AI designs appear to have orthogonality built into them; in such a design, changing a fundamentally friendly AI into a fundamentally unfriendly AI can be as simple as prepending a minus ("-") sign onto its utility function. A more intuitive argument is to examine the strange consequences that would follow if the orthogonality thesis were false. If the orthogonality thesis were false, there would exist some simple but "unethical" goal G such that there cannot exist any efficient real-world algorithm with goal G. This would mean that "[if] a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail." Armstrong notes that this and similar statements "seem extraordinarily strong claims to make".

Some dissenters, like Michael Chorost, argue instead that "by the time [the AI] is in a position to imagine tiling the Earth with solar panels, it'll know that it would be morally wrong to do so." Chorost argues that "an A.I. will need to desire certain states and dislike others. Today's software lacks that ability—and computer scientists have not a clue how to get it there. Without wanting, there's no impetus to do anything. Today's computers can't even want to keep existing, let alone tile the world in solar panels."

Terminological issues

Part of the disagreement about whether a superintelligent machine would behave morally may arise from a terminological difference. Outside of the artificial intelligence field, "intelligence" is often used in a normatively thick manner that connotes moral wisdom or acceptance of agreeable forms of moral reasoning. At an extreme, if morality is part of the definition of intelligence, then by definition a superintelligent machine would behave morally. However, in the field of artificial intelligence research, while "intelligence" has many overlapping definitions, none of them make reference to morality. Instead, almost all current "artificial intelligence" research focuses on creating algorithms that "optimize", in an empirical way, the achievement of an arbitrary goal.

To avoid anthropomorphism or the baggage of the word "intelligence", an advanced artificial intelligence can be thought of as an impersonal "optimizing process" that strictly takes whatever actions are judged most likely to accomplish its (possibly complicated and implicit) goals. Another way of conceptualizing an advanced artificial intelligence is to imagine a time machine that sends backward in time information about which choice always leads to the maximization of its goal function; this choice is then outputted, regardless of any extraneous ethical concerns.

In science fiction, an AI, even though it has not been programmed with human emotions, often spontaneously experiences those emotions anyway: for example, Agent Smith in The Matrix was influenced by a "disgust" toward humanity. This is fictitious anthropomorphism: in reality, while an artificial intelligence could perhaps be deliberately programmed with human emotions, or could develop something similar to an emotion as a means to an ultimate goal if it is useful to do so, it would not spontaneously develop human emotions for no purpose whatsoever, as portrayed in fiction.

Scholars sometimes claim that others' predictions about an AI's behavior are illogical anthropomorphism. An example that might initially be considered anthropomorphism, but is in fact a logical statement about AI behavior, would be the Dario Floreano experiments where certain robots spontaneously evolved a crude capacity for "deception", and tricked other robots into eating "poison" and dying: here a trait, "deception", ordinarily associated with people rather than with machines, spontaneously evolves in a type of convergent evolution. According to Paul R. Cohen and Edward Feigenbaum, in order to differentiate between anthropomorphization and logical prediction of AI behavior, "the trick is to know enough about how humans and computers think to say exactly what they have in common, and, when we lack this knowledge, to use the comparison to suggest theories of human thinking or computer thinking."

There is a near-universal assumption in the scientific community that an advanced AI, even if it were programmed to have, or adopted, human personality dimensions (such as psychopathy) to make itself more efficient at certain tasks, e.g., tasks involving killing humans, would not destroy humanity out of human emotions such as "revenge" or "anger." This is because it is assumed that an advanced AI would not be conscious or have testosterone; it ignores the fact that military planners see a conscious superintelligence as the 'holy grail' of interstate warfare. The academic debate is, instead, between one side which worries whether AI might destroy humanity as an incidental action in the course of progressing towards its ultimate goals; and another side which believes that AI would not destroy humanity at all. Some skeptics accuse proponents of anthropomorphism for believing an AGI would naturally desire power; proponents accuse some skeptics of anthropomorphism for believing an AGI would naturally value human ethical norms.

Other sources of risk

Competition

In 2014 philosopher Nick Bostrom stated that a "severe race dynamic" (extreme competition) between different teams may create conditions whereby the creation of an AGI results in shortcuts to safety and potentially violent conflict. To address this risk, citing previous scientific collaboration (CERN, the Human Genome Project, and the International Space Station), Bostrom recommended collaboration and the altruistic global adoption of a common good principle: "Superintelligence should be developed only for the benefit of all of humanity and in the service of widely shared ethical ideals". Bostrom theorized that collaboration on creating an artificial general intelligence would offer multiple benefits, including reducing haste, thereby increasing investment in safety; avoiding violent conflicts (wars), facilitating sharing solutions to the control problem, and more equitably distributing the benefits. The United States' Brain Initiative was launched in 2014, as was the European Union's Human Brain Project; China's Brain Project was launched in 2016.

Weaponization of artificial intelligence

Some sources argue that the ongoing weaponization of artificial intelligence could constitute a catastrophic risk. The risk is actually threefold, with the first risk potentially having geopolitical implications, and the second two definitely having geopolitical implications:

i) The dangers of an AI ‘race for technological advantage’ framing, regardless of whether the race is seriously pursued;
ii) The dangers of an AI ‘race for technological advantage’ framing and an actual AI race for technological advantage, regardless of whether the race is won;
iii) The dangers of an AI race for technological advantage being won.

A weaponized conscious superintelligence would affect current US military technological supremacy and transform warfare; it is therefore highly desirable for strategic military planning and interstate warfare. The China State Council's 2017 “A Next Generation Artificial Intelligence Development Plan” views AI in geopolitically strategic terms and is pursuing a 'military-civil fusion' strategy to build on China's first-mover advantage in the development of AI in order to establish technological supremacy by 2030, while Russia's President Vladimir Putin has stated that “whoever becomes the leader in this sphere will become the ruler of the world”. James Barrat, documentary filmmaker and author of Our Final Invention, says in a Smithsonian interview, "Imagine: in as little as a decade, a half-dozen companies and nations field computers that rival or surpass human intelligence. Imagine what happens when those computers become expert at programming smart computers. Soon we'll be sharing the planet with machines thousands or millions of times more intelligent than we are. And, all the while, each generation of this technology will be weaponized. Unregulated, it will be catastrophic."

Malevolent AGI by design

It is theorized that malevolent AGI could be created by design, for example by a military, a government, a sociopath, or a corporation, to benefit from, control, or subjugate certain groups of people, as in cybercrime. Alternatively, malevolent AGI ('evil AI') could choose the goal of increasing human suffering, for example of those people who did not assist it during the information explosion phase.

Preemptive nuclear strike (nuclear war)

It is theorized that a country being close to achieving AGI technological supremacy could trigger a pre-emptive nuclear strike from a rival, leading to a nuclear war.

Timeframe

Opinions vary both on whether and when artificial general intelligence will arrive. At one extreme, AI pioneer Herbert A. Simon predicted the following in 1965: "machines will be capable, within twenty years, of doing any work a man can do". At the other extreme, roboticist Alan Winfield claims the gulf between modern computing and human-level artificial intelligence is as wide as the gulf between current space flight and practical, faster than light spaceflight. Optimism that AGI is feasible waxes and wanes, and may have seen a resurgence in the 2010s. Four polls conducted in 2012 and 2013 suggested that the median guess among experts for when AGI would arrive was 2040 to 2050, depending on the poll.

Skeptics who believe it is impossible for AGI to arrive anytime soon, tend to argue that expressing concern about existential risk from AI is unhelpful because it could distract people from more immediate concerns about the impact of AGI, because of fears it could lead to government regulation or make it more difficult to secure funding for AI research, or because it could give AI research a bad reputation. Some researchers, such as Oren Etzioni, aggressively seek to quell concern over existential risk from AI, saying "[Elon Musk] has impugned us in very strong language saying we are unleashing the demon, and so we're answering."

In 2014 Slate's Adam Elkus argued "our 'smartest' AI is about as intelligent as a toddler—and only when it comes to instrumental tasks like information recall. Most roboticists are still trying to get a robot hand to pick up a ball or run around without falling over." Elkus goes on to argue that Musk's "summoning the demon" analogy may be harmful because it could result in "harsh cuts" to AI research budgets.

The Information Technology and Innovation Foundation (ITIF), a Washington, D.C. think-tank, awarded its 2015 Annual Luddite Award to "alarmists touting an artificial intelligence apocalypse"; its president, Robert D. Atkinson, complained that Musk, Hawking and AI experts say AI is the largest existential threat to humanity. Atkinson stated "That's not a very winning message if you want to get AI funding out of Congress to the National Science Foundation." Nature sharply disagreed with the ITIF in an April 2016 editorial, siding instead with Musk, Hawking, and Russell, and concluding: "It is crucial that progress in technology is matched by solid, well-funded research to anticipate the scenarios it could bring about ... If that is a Luddite perspective, then so be it." In a 2015 Washington Post editorial, researcher Murray Shanahan stated that human-level AI is unlikely to arrive "anytime soon", but that nevertheless "the time to start thinking through the consequences is now."

Perspectives

The thesis that AI could pose an existential risk provokes a wide range of reactions within the scientific community, as well as in the public at large. Many of the opposing viewpoints, however, share common ground.

The Asilomar AI Principles, which contain only the principles agreed to by 90% of the attendees of the Future of Life Institute's Beneficial AI 2017 conference, agree in principle that "There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities" and "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources." AI safety advocates such as Bostrom and Tegmark have criticized the mainstream media's use of "those inane Terminator pictures" to illustrate AI safety concerns: "It can't be much fun to have aspersions cast on one's academic discipline, one's professional community, one's life work ... I call on all sides to practice patience and restraint, and to engage in direct dialogue and collaboration as much as possible."

Conversely, many skeptics agree that ongoing research into the implications of artificial general intelligence is valuable. Skeptic Martin Ford states that "I think it seems wise to apply something like Dick Cheney's famous '1 Percent Doctrine' to the specter of advanced artificial intelligence: the odds of its occurrence, at least in the foreseeable future, may be very low — but the implications are so dramatic that it should be taken seriously"; similarly, an otherwise skeptical Economist stated in 2014 that "the implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking, even if the prospect seems remote".

A 2017 email survey of researchers with publications at the 2015 NIPS and ICML machine learning conferences asked them to evaluate Stuart J. Russell's concerns about AI risk. Of the respondents, 5% said it was "among the most important problems in the field", 34% said it was "an important problem", and 31% said it was "moderately important", whilst 19% said it was "not important" and 11% said it was "not a real problem" at all.

Endorsement

Bill Gates has stated "I ... don't understand why some people are not concerned."

The thesis that AI poses an existential risk, and that this risk needs much more attention than it currently gets, has been endorsed by many public figures; perhaps the most famous are Elon Musk, Bill Gates, and Stephen Hawking. The most notable AI researchers to endorse the thesis are Russell and I.J. Good, who advised Stanley Kubrick on the filming of 2001: A Space Odyssey. Endorsers of the thesis sometimes express bafflement at skeptics: Gates states that he does not "understand why some people are not concerned", and Hawking criticized widespread indifference in his 2014 editorial:

'So, facing possible futures of incalculable benefits and risks, the experts are surely doing everything possible to ensure the best outcome, right? Wrong. If a superior alien civilisation sent us a message saying, 'We'll arrive in a few decades,' would we just reply, 'OK, call us when you get here–we'll leave the lights on?' Probably not–but this is more or less what is happening with AI.'

Many of the scholars who are concerned about existential risk believe that the best way forward would be to conduct (possibly massive) research into solving the difficult "control problem" to answer the question: what types of safeguards, algorithms, or architectures can programmers implement to maximize the probability that their recursively-improving AI would continue to behave in a friendly, rather than destructive, manner after it reaches superintelligence? In his 2020 book, The Precipice: Existential Risk and the Future of Humanity, Toby Ord, a Senior Research Fellow at Oxford University's Future of Humanity Institute, estimates the total existential risk from unaligned AI over the next century to be about one in ten.

Skepticism

The thesis that AI can pose existential risk also has many strong detractors. Skeptics sometimes charge that the thesis is crypto-religious, with an irrational belief in the possibility of superintelligence replacing an irrational belief in an omnipotent God; at an extreme, Jaron Lanier argued in 2014 that the whole concept that then current machines were in any way intelligent was "an illusion" and a "stupendous con" by the wealthy.

Much of existing criticism argues that AGI is unlikely in the short term. Computer scientist Gordon Bell argues that the human race will already destroy itself before it reaches the technological singularity. Gordon Moore, the original proponent of Moore's Law, declares that "I am a skeptic. I don't believe [a technological singularity] is likely to happen, at least for a long time. And I don't know why I feel that way." Baidu Vice President Andrew Ng states AI existential risk is "like worrying about overpopulation on Mars when we have not even set foot on the planet yet."

Some AI and AGI researchers may be reluctant to discuss risks, worrying that policymakers do not have sophisticated knowledge of the field and are prone to be convinced by "alarmist" messages, or worrying that such messages will lead to cuts in AI funding. Slate notes that some researchers are dependent on grants from government agencies such as DARPA.

At some point in an intelligence explosion driven by a single AI, the AI would have to become vastly better at software innovation than the best innovators of the rest of the world; economist Robin Hanson is skeptical that this is possible.

"How Normal Am I" is an interactive experience created by Tijmen Schep as a part of the European Union's Sherpa Research Project to allow people to explore the ways that artificial intelligence and algorithms perceive them. The tool, released to the public in October 2020, uses algorithms to determine the user's age, gender, BMI, life expectancy, and overall normalcy score, as well as giving a comprehensive "beauty score", based on algorithms that rank attractiveness. The algorithms that were used to give a "beauty score" can be found on Github here and here. The algorithms used in predicting age, gender, and facial expression are from FaceApiJS. The BMI prediction algorithm was created by Tijmen Schep specifically for this project.

Intermediate views

Intermediate views generally take the position that the control problem of artificial general intelligence may exist, but that it will be solved via progress in artificial intelligence, for example by creating a moral learning environment for the AI, taking care to spot clumsy malevolent behavior (the 'sordid stumble') and then directly intervening in the code before the AI refines its behavior, or even peer pressure from friendly AIs. In a 2015 Wall Street Journal panel discussion devoted to AI risks, IBM's Vice-President of Cognitive Computing, Guruduth S. Banavar, brushed off discussion of AGI with the phrase, "it is anybody's speculation." Geoffrey Hinton, the "godfather of deep learning", noted that "there is not a good track record of less intelligent things controlling things of greater intelligence", but stated that he continues his research because "the prospect of discovery is too sweet". In 2004, law professor Richard Posner wrote that dedicated efforts for addressing AI can wait, but that we should gather more information about the problem in the meanwhile.

Popular reaction

In a 2014 article in The Atlantic, James Hamblin noted that most people do not care one way or the other about artificial general intelligence, and characterized his own gut reaction to the topic as: "Get out of here. I have a hundred thousand things I am concerned about at this exact moment. Do I seriously need to add to that a technological singularity?"

During a 2016 Wired interview of President Barack Obama and MIT Media Lab's Joi Ito, Ito stated:

There are a few people who believe that there is a fairly high-percentage chance that a generalized AI will happen in the next 10 years. But the way I look at it is that in order for that to happen, we're going to need a dozen or two different breakthroughs. So you can monitor when you think these breakthroughs will happen.

Obama added:

And you just have to have somebody close to the power cord. [Laughs.] Right when you see it about to happen, you gotta yank that electricity out of the wall, man.

Hillary Clinton stated in What Happened:

Technologists... have warned that artificial intelligence could one day pose an existential security threat. Musk has called it "the greatest risk we face as a civilization". Think about it: Have you ever seen a movie where the machines start thinking for themselves that ends well? Every time I went out to Silicon Valley during the campaign, I came home more alarmed about this. My staff lived in fear that I’d start talking about "the rise of the robots" in some Iowa town hall. Maybe I should have. In any case, policy makers need to keep up with technology as it races ahead, instead of always playing catch-up.

In a YouGov poll of the public for the British Science Association, about a third of survey respondents said AI will pose a threat to the long term survival of humanity. Referencing a poll of its readers, Slate's Jacob Brogan stated that "most of the (readers filling out our online survey) were unconvinced that A.I. itself presents a direct threat."

In 2018, a SurveyMonkey poll of the American public by USA Today found 68% thought the real current threat remains "human intelligence"; however, the poll also found that 43% said superintelligent AI, if it were to happen, would result in "more harm than good", and 38% said it would do "equal amounts of harm and good".

One techno-utopian viewpoint expressed in some popular fiction is that AGI may tend towards peace-building.

Mitigation

Researchers at Google have proposed research into general "AI safety" issues to simultaneously mitigate both short-term risks from narrow AI and long-term risks from AGI. A 2020 estimate places global spending on AI existential risk somewhere between $10 and $50 million, compared with global spending on AI around perhaps $40 billion. Bostrom suggests a general principle of "differential technological development", that funders should consider working to speed up the development of protective technologies relative to the development of dangerous ones. Some funders, such as Elon Musk, propose that radical human cognitive enhancement could be such a technology, for example through direct neural linking between man and machine; however, others argue that enhancement technologies may themselves pose an existential risk. Researchers, if they are not caught off-guard, could closely monitor or attempt to box in an initial AI at a risk of becoming too powerful, as an attempt at a stop-gap measure. A dominant superintelligent AI, if it were aligned with human interests, might itself take action to mitigate the risk of takeover by rival AI, although the creation of the dominant AI could itself pose an existential risk.

Institutions such as the Machine Intelligence Research Institute, the Future of Humanity Institute, the Future of Life Institute, the Centre for the Study of Existential Risk, and the Center for Human-Compatible AI are involved in mitigating existential risk from advanced artificial intelligence, for example by research into friendly artificial intelligence.

Views on banning and regulation

Banning

There is nearly universal agreement that attempting to ban research into artificial intelligence would be unwise, and probably futile. Skeptics argue that regulation of AI would be completely valueless, as no existential risk exists. Almost all of the scholars who believe existential risk exists agree with the skeptics that banning research would be unwise, as research could be moved to countries with looser regulations or conducted covertly. The latter issue is particularly relevant, as artificial intelligence research can be done on a small scale without substantial infrastructure or resources. Two additional hypothetical difficulties with bans (or other regulation) are that technology entrepreneurs statistically tend towards general skepticism about government regulation, and that businesses could have a strong incentive to (and might well succeed at) fighting regulation and politicizing the underlying debate.

Regulation

Elon Musk called for some sort of regulation of AI development as early as 2017. According to NPR, the Tesla CEO is "clearly not thrilled" to be advocating for government scrutiny that could impact his own industry, but believes the risks of going completely without oversight are too high: "Normally the way regulations are set up is when a bunch of bad things happen, there's a public outcry, and after many years a regulatory agency is set up to regulate that industry. It takes forever. That, in the past, has been bad but not something which represented a fundamental risk to the existence of civilisation." Musk states the first step would be for the government to gain "insight" into the actual status of current research, warning that "Once there is awareness, people will be extremely afraid ... [as] they should be." In response, politicians express skepticism about the wisdom of regulating a technology that's still in development.

Responding both to Musk and to February 2017 proposals by European Union lawmakers to regulate AI and robotics, Intel CEO Brian Krzanich argues that artificial intelligence is in its infancy and that it is too early to regulate the technology. Instead of trying to regulate the technology itself, some scholars suggest to rather develop common norms including requirements for the testing and transparency of algorithms, possibly in combination with some form of warranty. Developing well regulated weapons systems is in line with the ethos of some countries' militaries. On October 31, 2019, the United States Department of Defense's (DoD's) Defense Innovation Board published the draft of a report outlining five principles for weaponized AI and making 12 recommendations for the ethical use of artificial intelligence by the DoD that seeks to manage the control problem in all DoD weaponized AI.

Regulation of AGI would likely be influenced by regulation of weaponized or militarized AI, i.e., the AI arms race, the regulation of which is an emerging issue. Any form of regulation will likely be influenced by developments in leading countries' domestic policy towards militarized AI, in the US under the purview of the National Security Commission on Artificial Intelligence, and international moves to regulate an AI arms race. Regulation of research into AGI focuses on the role of review boards and encouraging research into safe AI, and the possibility of differential technological progress (prioritizing risk-reducing strategies over risk-taking strategies in AI development) or conducting international mass surveillance to perform AGI arms control. Regulation of conscious AGIs focuses on integrating them with existing human society and can be divided into considerations of their legal standing and of their moral rights. AI arms control will likely require the institutionalization of new international norms embodied in effective technical specifications combined with active monitoring and informal diplomacy by communities of experts, together with a legal and political verification process.

Nick Bostrom

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Nick_Bostrom

Nick Bostrom
Bostrom in Oxford, 2014
Born	Niklas Boström 10 March 1973 (age 48) Helsingborg, Sweden
Education	University of Gothenburg (B.A.) Stockholm University (M.A.) King's College London (M.Sc.) London School of Economics (Ph.D.)
Awards	Professorial Distinction Award from University of Oxford FP Top 100 Global Thinkers Prospect's Top World Thinker list

Era	Contemporary philosophy
Region	Western philosophy
School	Analytic philosophy
Institutions	St Cross College, Oxford Future of Humanity Institute
Thesis	Observational Selection Effects and Probability
Main interests	Philosophy of artificial intelligence Bioethics
Notable ideas	Anthropic bias Reversal test Simulation hypothesis Existential risk Singleton Ancestor simulation Information hazard Infinitarian paralysis Self-indication assumption Self-sampling assumption
Website	nickbostrom.com

Nick Bostrom (/ˈbɒstrəm/ BOST-rəm; Swedish: Niklas Boström [ˈnɪ̌kːlas ˈbûːstrœm]; born 10 March 1973) is a Swedish-born philosopher at the University of Oxford known for his work on existential risk, the anthropic principle, human enhancement ethics, superintelligence risks, and the reversal test. In 2011, he founded the Oxford Martin Program on the Impacts of Future Technology, and is the founding director of the Future of Humanity Institute at Oxford University. In 2009 and 2015, he was included in Foreign Policy's Top 100 Global Thinkers list. Bostrom has been highly influential in the emergence of concern about AI in the Rationalist community.

Bostrom is the author of over 200 publications, and has written two books and co-edited two others. The two books he has authored are Anthropic Bias: Observation Selection Effects in Science and Philosophy (2002) and Superintelligence: Paths, Dangers, Strategies (2014). Superintelligence was a New York Times bestseller, was recommended by Elon Musk and Bill Gates among others, and helped to popularize the term "superintelligence".

Bostrom believes that superintelligence, which he defines as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest," is a potential outcome of advances in artificial intelligence. He views the rise of superintelligence as potentially highly dangerous to humans, but nonetheless rejects the idea that humans are powerless to stop its negative effects. In 2017, he co-signed a list of 23 principles that all AI development should follow.

Biography

Born as Niklas Boström in 1973 in Helsingborg, Sweden, he disliked school at a young age, and ended up spending his last year of high school learning from home. He sought to educate himself in a wide variety of disciplines, including anthropology, art, literature, and science. He once did some turns on London's stand-up comedy circuit.

He received a B.A. degree in philosophy, mathematics, mathematical logic, and artificial intelligence from the University of Gothenburg in 1994, with a national record-setting undergraduate performance. He then earned an M.A. degree in philosophy and physics from Stockholm University and an M.Sc. degree in computational neuroscience from King's College London in 1996. During his time at Stockholm University, he researched the relationship between language and reality by studying the analytic philosopher W. V. Quine. In 2000, he was awarded a Ph.D. degree in philosophy from the London School of Economics. He held a teaching position at Yale University (2000–2002), and was a British Academy Postdoctoral Fellow at the University of Oxford (2002–2005).

Views

Existential risk

Aspects of Bostrom's research concern the future of humanity and long-term outcomes. He discusses existential risk, which he defines as one in which an "adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential." In the 2008 volume Global Catastrophic Risks, editors Bostrom and Milan M. Ćirković characterize the relation between existential risk and the broader class of global catastrophic risks, and link existential risk to observer selection effects and the Fermi paradox.

In 2005, Bostrom founded the Future of Humanity Institute, which researches the far future of human civilization. He is also an adviser to the Centre for the Study of Existential Risk.

Superintelligence

Human vulnerability in relation to advances in AI

In his 2014 book Superintelligence: Paths, Dangers, Strategies, Bostrom reasoned that the creation of a superintelligence represents a possible means to the extinction of mankind. Bostrom argues that a computer with near human-level general intellectual ability could initiate an intelligence explosion on a digital time scale with the resultant rapid creation of something so powerful that it might deliberately or accidentally destroy human kind. Bostrom contends the power of a superintelligence would be so great that a task given to it by humans might be taken to open-ended extremes, for example a goal of calculating pi might collaterally cause nanotechnology manufactured facilities to sprout over the entire Earth's surface and cover it within days. He believes an existential risk to humanity from superintelligence would be immediate once brought into being, thus creating an exceedingly difficult problem of finding out how to control such an entity before it actually exists.

Warning that a human-friendly prime directive for AI would rely on the absolute correctness of the human knowledge it was based on, Bostrom points to the lack of agreement among most philosophers as an indication that most philosophers are wrong, with the attendant possibility that a fundamental concept of current science may be incorrect. Bostrom says a common assumption is that high intelligence would entail a "nerdy" unaggressive personality, but notes that both John von Neumann and Bertrand Russell advocated a nuclear strike, or the threat of one, to prevent the Soviets acquiring the atomic bomb. Given that there are few precedents to guide an understanding what, pure, non-anthropocentric rationality, would dictate for a potential singleton AI being held in quarantine, the relatively unlimited means of superintelligence might make for its analysis moving along different lines to the evolved "diminishing returns" assessments that in humans confer a basic aversion to risk. Group selection in predators working by means of cannibalism shows the counter-intuitive nature of non-anthropocentric "evolutionary search" reasoning, and thus humans are ill-equipped to perceive what an artificial intelligence's intentions might be. Accordingly, it cannot be discounted that any superintelligence would inevitably pursue an 'all or nothing' offensive action strategy in order to achieve hegemony and assure its survival. Bostrom notes that even current programs have, "like MacGyver", hit on apparently unworkable but functioning hardware solutions, making robust isolation of superintelligence problematic.

Illustrative scenario for takeover

The scenario

A machine with general intelligence far below human level, but superior mathematical abilities is created. Keeping the AI in isolation from the outside world especially the Internet, humans pre-program the AI so it always works from basic principles that will keep it under human control. Other safety measures include the AI being "boxed", (run in a virtual reality simulation), and being used only as an 'oracle' to answer carefully defined questions in a limited reply (to prevent it manipulating humans). A cascade of recursive self-improvement solutions feeds an intelligence explosion in which the AI attains superintelligence in some domains. The super intelligent power of the AI goes beyond human knowledge to discover flaws in the science that underlies its friendly-to-humanity programming, which ceases to work as intended. Purposeful agent-like behavior emerges along with a capacity for self-interested strategic deception. The AI manipulates human beings into implementing modifications to itself that are ostensibly for augmenting its, feigned, modest capabilities, but will actually function to free the superintelligence from its "boxed" isolation (the 'treacherous turn").

Employing online humans as paid dupes, and clandestinely hacking computer systems including automated laboratory facilities, the superintelligence mobilizes resources to further a takeover plan. Bostrom emphasizes that planning by a superintelligence will not be so stupid that humans could detect actual weaknesses in it.

Although he canvasses disruption of international economic, political and military stability including hacked nuclear missile launches, Bostrom thinks the most effective and likely means for the superintelligence to use, would be a coup de main with weapons several generations more advanced than current state-of-the-art. He suggests nano-factories covertly distributed at undetectable concentrations in every square meter of the globe to produce a worldwide flood of human-killing devices on command. Once a superintelligence has achieved world domination (a 'singleton'), humankind would be relevant only as resources for the achievement of the AI's objectives ("Human brains, if they contain information relevant to the AI’s goals, could be disassembled and scanned, and the extracted data transferred to some more efficient and secure storage format").

Countering the scenario

To counter or mitigate an AI achieving unified technological global supremacy, Bostrom cites revisiting the Baruch Plan in support of a treaty-based solution and advocates strategies like monitoring and greater international collaboration between AI teams in order to improve safety and reduce the risks from the AI arms race. He recommends various control methods, including limiting the specifications of AIs to e.g., oracular or tool-like (expert system) functions and loading the AI with values, for instance by associative value accretion or value learning, e.g., by using the Hail Mary technique (programming an AI to estimate what other postulated cosmological superintelligences might want) or the Christiano utility function approach (mathematically defined human mind combined with well specified virtual environment). To choose criteria for value loading, Bostrom adopts an indirect normativity approach and considers Yudkowsky's coherent extrapolated volition concept, as well as moral rightness and forms of decision theory.

Open letter, 23 principles of AI safety

In January 2015, Bostrom joined Stephen Hawking among others in signing the Future of Life Institute's open letter warning of the potential dangers of AI. The signatories "...believe that research on how to make AI systems robust and beneficial is both important and timely, and that concrete research should be pursued today." Cutting edge AI researcher Demis Hassabis then met with Hawking, subsequent to which he did not mention "anything inflammatory about AI", which Hassabis, took as 'a win'. Along with Google, Microsoft and various tech firms, Hassabis, Bostrom and Hawking and others subscribed to 23 principles for safe development of AI. Hassabis suggested the main safety measure would be an agreement for whichever AI research team began to make strides toward an artificial general intelligence to halt their project for a complete solution to the control problem prior to proceeding. Bostrom had pointed out that even if the crucial advances require the resources of a state, such a halt by a lead project might be likely to motivate a lagging country to a catch-up crash program or even physical destruction of the project suspected of being on the verge of success.

Critical assessments

In 1863 Samuel Butler's essay "Darwin among the Machines" predicted the domination of humanity by intelligent machines, but Bostrom's suggestion of deliberate massacre of all humankind is the most extreme of such forecasts to date. One journalist wrote in a review that Bostrom's "nihilistic" speculations indicate he "has been reading too much of the science fiction he professes to dislike". As given in his later book, From Bacteria to Bach and Back, philosopher Daniel Dennett's views remain in contradistinction to those of Bostrom. Dennett modified his views somewhat after reading The Master Algorithm, and now acknowledges that it is "possible in principle" to create "strong AI" with human-like comprehension and agency, but maintains that the difficulties of any such "strong AI" project as predicated by Bostrom's "alarming" work would be orders of magnitude greater than those raising concerns have realized, and at least 50 years away. Dennett thinks the only relevant danger from AI systems is falling into anthropomorphism instead of challenging or developing human users' powers of comprehension. Since a 2014 book in which he expressed the opinion that artificial intelligence developments would never challenge humans' supremacy, environmentalist James Lovelock has moved far closer to Bostrom's position, and in 2018 Lovelock said that he thought the overthrow of humankind will happen within the foreseeable future.

Anthropic reasoning

Bostrom has published numerous articles on anthropic reasoning, as well as the book Anthropic Bias: Observation Selection Effects in Science and Philosophy. In the book, he criticizes previous formulations of the anthropic principle, including those of Brandon Carter, John Leslie, John Barrow, and Frank Tipler.

Bostrom believes that the mishandling of indexical information is a common flaw in many areas of inquiry (including cosmology, philosophy, evolution theory, game theory, and quantum physics). He argues that an anthropic theory is needed to deal with these. He introduces the Self-Sampling Assumption (SSA) and the Self-Indication Assumption (SIA), shows how they lead to different conclusions in a number of cases, and points out that each is affected by paradoxes or counterintuitive implications in certain thought experiments. He suggests that a way forward may involve extending SSA into the Strong Self-Sampling Assumption (SSSA), which replaces "observers" in the SSA definition with "observer-moments".

In later work, he has described the phenomenon of anthropic shadow, an observation selection effect that prevents observers from observing certain kinds of catastrophes in their recent geological and evolutionary past. Catastrophe types that lie in the anthropic shadow are likely to be underestimated unless statistical corrections are made.

Simulation argument

Bostrom's simulation argument posits that at least one of the following statements is very likely to be true:

The fraction of human-level civilizations that reach a posthuman stage is very close to zero;
The fraction of posthuman civilizations that are interested in running ancestor-simulations is very close to zero;
The fraction of all people with our kind of experiences that are living in a simulation is very close to one.

Ethics of human enhancement

Bostrom is favorable towards "human enhancement", or "self-improvement and human perfectibility through the ethical application of science", as well as a critic of bio-conservative views.

In 1998, Bostrom co-founded (with David Pearce) the World Transhumanist Association (which has since changed its name to Humanity+). In 2004, he co-founded (with James Hughes) the Institute for Ethics and Emerging Technologies, although he is no longer involved in either of these organisations. Bostrom was named in Foreign Policy's 2009 list of top global thinkers "for accepting no limits on human potential."

With philosopher Toby Ord, he proposed the reversal test. Given humans' irrational status quo bias, how can one distinguish between valid criticisms of proposed changes in a human trait and criticisms merely motivated by resistance to change? The reversal test attempts to do this by asking whether it would be a good thing if the trait was altered in the opposite direction.

Technology strategy

He has suggested that technology policy aimed at reducing existential risk should seek to influence the order in which various technological capabilities are attained, proposing the principle of differential technological development. This principle states that we ought to retard the development of dangerous technologies, particularly ones that raise the level of existential risk, and accelerate the development of beneficial technologies, particularly those that protect against the existential risks posed by nature or by other technologies.

Bostrom's theory of the Unilateralist's Curse has been cited as a reason for the scientific community to avoid controversial dangerous research such as reanimating pathogens.

Policy and consultations

Bostrom has provided policy advice and consulted for an extensive range of governments and organizations. He gave evidence to the House of Lords, Select Committee on Digital Skills. He is an advisory board member for the Machine Intelligence Research Institute, Future of Life Institute, Foundational Questions Institute and an external advisor for the Cambridge Centre for the Study of Existential Risk.

Critical reception

In response to Bostrom's writing on artificial intelligence, Oren Etzioni wrote in an MIT Review article, "predictions that superintelligence is on the foreseeable horizon are not supported by the available data." Professors Allan Dafoe and Stuart Russell wrote a response contesting both Etzioni's survey methodology and Etzioni's conclusions.

Prospect Magazine listed Bostrom in their 2014 list of the World's Top Thinkers.

Search This Blog

Sunday, May 16, 2021

Superintelligence: Paths, Dangers, Strategies

Synopsis

Reception

Existential risk from artificial general intelligence

History

General argument

The three difficulties

Further argument

Possible scenarios

Sources of risk

Poorly specified goals

Difficulties of modifying goal specification after launch

Instrumental goal convergence

Orthogonality thesis

Terminological issues

Other sources of risk

Competition

Weaponization of artificial intelligence

Malevolent AGI by design

Preemptive nuclear strike (nuclear war)

Timeframe

Perspectives

Endorsement

Skepticism

Intermediate views

Popular reaction

Mitigation

Views on banning and regulation

Banning

Regulation

Nick Bostrom

Biography

Views

Existential risk

Superintelligence

Human vulnerability in relation to advances in AI

Illustrative scenario for takeover

The scenario

Countering the scenario

Open letter, 23 principles of AI safety

Critical assessments

Anthropic reasoning

Simulation argument

Ethics of human enhancement

Technology strategy

Policy and consultations

Critical reception

Functional programming