
Saturday, February 28, 2026

Ginsberg's theorem

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Ginsberg%27s_theorem

Ginsberg's theorem is an epigrammatic paraphrase and parody "theorem" that restates or analogizes the consequences of the four laws of thermodynamics in terms of a person playing a game. It has various formulations, but can be more or less expressed as:

Ginsberg's theorem:
  1. There is a game, which you are already playing. (consequence of zeroth law of thermodynamics)
  2. You cannot win in the game. (consequence of first law of thermodynamics)
  3. You cannot break even in the game. (consequence of second law of thermodynamics)
  4. You cannot even quit the game. (consequence of third law of thermodynamics)

The theorem is named after the poet Allen Ginsberg, though there does not appear to be any concrete evidence that Ginsberg himself coined the theorem. The phrase is sometimes stated as a general adage without specific reference to the laws of thermodynamics.

History

A comprehensive history and etymology of the epigrammatic phrase has been compiled by the etymologist Barry Popik.

The phrase is often attributed to the British scientist C. P. Snow, whose students reportedly credited him with using it in the 1950s to help them learn the laws of thermodynamics. However, this claim appears to be unsourced.

An early form of the phrase appears to have first seen print in a 1953 issue of the science fiction magazine Astounding Science Fiction, whose editor, John Wood Campbell Jr., referenced acoustic engineer and professor Dwight Wayne Batteau of Harvard University:

"I suggest that there are some laws of ethics that are not human, but Universal. Wayne Batteau and his Speculative Society group at Harvard sent me one little pair of statements that are decidedly revealing in that respect.
“You can't win.” (The Law of Conservation of Energy.)
“You can't even break even.” (Second Law of Thermodynamics.)
When you stop to think about it, that “You can't win,” bears a strong resemblance to the old moral adage “You can't get something for nothing.”"

— John W. Campbell, Jr., Intelligence Test, Astounding SCIENCE FICTION, 1953 November, Vol. LII (52), no. 3 (November), pg. 8

In a 1956 issue of the same magazine, Batteau himself expanded it further in what appears to have been the first complete mention of the epigrammatic phrase in print:[6]

"The Three Laws of Thermodynamics, translated from Mathematics into English, come out:
1. You can't win.
2. You can't even break even.
3. Furthermore, you can't get out of the game!"

— Wayne Batteau, English Translation, Astounding SCIENCE FICTION, 1956 December, Vol. LVIII (58), no. 4 (December), pg. 43

It was later presented in the literary magazine The Kenyon Review in a 1960 short story titled "Entropy" by the widely regarded novelist Thomas Pynchon, who was then still an engineering physics undergraduate at Cornell University:

"Callisto had learned a mnemonic device for remembering the Laws of Thermodynamics: you can't win, things are going to get worse before they get better, who says they're going to get better."

— Thomas Pynchon, Entropy, The Kenyon Review, 1960 April, Vol. XXII (22), no. 2 (Spring), pg. 282

Physicist William R. Corliss also touched on the phrase in a 1964 educational booklet freely distributed by the United States Atomic Energy Commission to disseminate knowledge about atomic energy to the American public:

"The Law of Conservation of Energy and Mass is also called the First Law of Thermodynamics. It is related to the Second Law of Thermodynamics, which also governs energy transformations. The Second Law says, in effect, that some energy will unavoidably be lost in all heat engines. The first two laws of thermodynamics have been paraphrased as (1) You can't win; (2) You can't even break even."

— William R. Corliss, Direct Conversion of Energy, United States Atomic Energy Commission, 1974 January, pg. 8

Science writer Isaac Asimov stated at least the first two laws in this form in a 1970 article, and by the end of the decade he was being credited with the paraphrased version.

The phrase then appeared in a non-scientific setting in the opening lines of the popular song "You Can't Win" originally written by songwriter Charlie Smalls for the stage musical The Wiz:

"You can't win, you can't break even
And you can't get out of the game"

— Charlie Smalls, You Can't Win, The Wiz, 1974 October

Smalls wrote and copyrighted the song in 1974, and it was performed during the musical's 1974 Baltimore run. Though not formally released until 1979, as part of a soundtrack album, the song later reached number 81 on the Billboard Hot 100.

Remarkably, Allen Ginsberg appears to have only ever written about the laws of thermodynamics once, in his 1973 poem "Yes and It's Hopeless", though not in any connection to the original epigrammatic phrase:

"All hopeless, the entire solar system running
Thermodynamics' Second Law
down the whole galaxy, all universes brain illusion or solid electric hopeless"

— Allen Ginsberg, Yes and It's Hopeless, 1973 March

Thus Ginsberg was, at the very least, seemingly cognizant of the laws of thermodynamics by 1973. It is claimed that he mentioned the epigrammatic phrase as a fun fact during a poetry session in or around 1974.[10][verification needed] In 1975, someone (possibly Ginsberg's partner, the poet Peter Orlovsky, or an associate such as William Burroughs or Philip Whalen) compiled a collection of quirky laws that included a "Ginsberg's Theorem" based on Ginsberg's prior musings.

In 1975, Ginsberg's theorem formally appeared by name, with no association to thermodynamics, in a listing of parody-like proverb laws by Conrad Schneiker in the counterculture magazine The CoEvolution Quarterly:

"Ginsberg's Theorem
1) You can't win.
2) You can't break even.
3) You can't even quit the game."

— Conrad Schneiker, An Abridged Collection of Interdisciplinary Laws, The CoEvolution Quarterly, 1975 December, No. 8 (Winter)

This appearance may have originated as a slight misstatement of the lines from the earlier 1974 song by Charlie Smalls.

Writer Arthur Bloch, in his popular 1977 book "Murphy's Law and Other Reasons Why Things Go Wrong!", which popularized Murphy's law, conflated Ginsberg's theorem with the science of thermodynamics:

"The official party line of technology, of science itself, is despair. If you doubt this, witness the laws of thermodynamics as they are restated in Ginsberg's Theorem."

— Arthur Bloch, Murphy's Law and Other Reasons Why Things Go Wrong!, 1977 October, pg. 8

"GINSBERG'S THEOREM:
1. You can't win.
2. You can't break even.
3. You can't even quit the game."

— Arthur Bloch, Murphy's Law and Other Reasons Why Things Go Wrong!, 1977 October, pg. 18

Notably, the book's acknowledgements mention Conrad Schneiker, who had written about Ginsberg's theorem in The CoEvolution Quarterly just two years prior, in 1975. The theorem may also have been relayed to Bloch in conversation by his acquaintance Harris Freeman, whom he knew from the University of California, Santa Cruz. In the mid-1970s, while working as a systems administrator for ILLIAC IV (the world's first massively parallel computer) at the NASA Ames Research Center near Mountain View, California, Freeman had found a collection of "laws" on the ARPANET (a precursor of the Internet) that included Murphy's Law, Ginsberg's Theorem, and many others. With the publication of Bloch's book, Ginsberg's theorem seemingly became much more widely known.

Artificial general intelligence

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Artificial_general_intelligence

Artificial general intelligence (AGI) is a type of artificial intelligence that matches or surpasses human capabilities across virtually all cognitive tasks.

Beyond AGI, artificial superintelligence (ASI) would outperform the best human abilities across every domain by a wide margin. Unlike artificial narrow intelligence (ANI), whose competence is confined to well‑defined tasks, an AGI system can generalise knowledge, transfer skills between domains, and solve novel problems without task‑specific reprogramming.

Creating AGI is a stated goal of AI technology companies such as OpenAI, Google, xAI, and Meta. A 2020 survey identified 72 active AGI research and development projects across 37 countries.

AGI is a common topic in science fiction and futures studies.

Contention exists over whether AGI represents an existential risk. Some AI experts and industry figures have stated that mitigating the risk of human extinction posed by AGI should be a global priority. Others consider the development of AGI too remote to present such a risk.

Terminology

AGI is also known as strong AI, full AI, human-level AI, human-level intelligent AI, or general intelligent action.

Some academic sources reserve the term "strong AI" for computer programs that will experience sentience or consciousness. In contrast, weak AI (or narrow AI) can solve one specific problem but lacks general cognitive abilities. Some academic sources use "weak AI" to refer more broadly to any programs that neither experience consciousness nor have a mind in the same sense as humans.

Related concepts include artificial superintelligence and transformative AI. An artificial superintelligence (ASI) is a hypothetical type of AGI that is much more generally intelligent than humans, while the notion of transformative AI relates to AI having a large impact on society, for example, similar to the agricultural or industrial revolution.

A framework for classifying AGI was proposed in 2023 by Google DeepMind researchers. They define five performance levels of AGI: emerging, competent, expert, virtuoso, and superhuman. For example, a competent AGI is defined as an AI that outperforms 50% of skilled adults in a wide range of non-physical tasks, and a superhuman AGI (i.e. an artificial superintelligence) is similarly defined but with a threshold of 100%. They consider large language models like ChatGPT or LLaMA 2 to be instances of emerging AGI (comparable to unskilled humans). Regarding the autonomy of AGI and associated risks, they define five levels: tool (fully in human control), consultant, collaborator, expert, and agent (fully autonomous).
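As a rough illustration of how this taxonomy could be encoded, the sketch below maps the percentile of skilled adults that a system outperforms to a level name. It is only a sketch of the scheme as summarized here: the 50% (competent) and 100% (superhuman) thresholds come from the description above, while the cut-offs used for "expert" and "virtuoso" are assumptions made purely for the example.

  # Illustrative Python sketch of the DeepMind performance levels described above.
  # The 50% and 100% thresholds are from the text; the 90% and 99% cut-offs for
  # "expert" and "virtuoso" are assumptions for illustration only.
  def agi_performance_level(percentile_outperformed: float) -> str:
      """Map the percentile of skilled adults outperformed (0-100) to a level name."""
      if percentile_outperformed >= 100:
          return "superhuman"   # i.e. artificial superintelligence
      if percentile_outperformed >= 99:
          return "virtuoso"     # assumed cut-off
      if percentile_outperformed >= 90:
          return "expert"       # assumed cut-off
      if percentile_outperformed >= 50:
          return "competent"    # outperforms 50% of skilled adults
      return "emerging"         # comparable to or below unskilled humans

  print(agi_performance_level(55))   # competent
  print(agi_performance_level(100))  # superhuman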

Characteristics

Prior to the release of ChatGPT in November 2022, there was broad consensus on AGI as a theoretical benchmark for human-level machine intelligence. The capabilities demonstrated by GPT-3.5 and subsequent large language models challenged this framing directly, with some researchers and practitioners arguing that these systems already constitute AGI. The debate has since shifted from whether AGI is achievable to whether it has already been achieved and when exactly it occurred. OpenAI CEO Sam Altman, who initially maintained the pre-ChatGPT framing of AGI as a future milestone, conceded by December 2025 that "we built AGIs" and that "AGI kinda went whooshing by," proposing the field move on to defining superintelligence. Computer scientist John McCarthy noted in 2007 the difficulty of characterising which computational procedures count as intelligent.

Intelligence traits

Researchers generally hold that a system is required to do all of the following to be regarded as an AGI:

  • reason, use strategy, solve puzzles, and make judgments under uncertainty
  • represent knowledge, including common sense knowledge
  • plan
  • learn
  • communicate in natural language
  • if necessary, integrate these skills in completion of any given goal

Many interdisciplinary approaches (e.g. cognitive science, computational intelligence, and decision making) consider additional traits such as imagination (the ability to form novel mental images and concepts) and autonomy.

Computer-based systems exhibiting these capabilities are now widespread, with modern large language models demonstrating computational creativity, automated reasoning, and decision support simultaneously across domains. Earlier systems such as evolutionary computation, intelligent agents, and robots demonstrated these capabilities in isolation, but the convergence of multiple cognitive abilities within single architectures from GPT-3.5 onwards marked a qualitative shift in the field.

Physical traits

Other capabilities are considered desirable in intelligent systems, as they may affect intelligence or aid in its expression. These include:

  • the ability to sense (e.g. see, hear, etc.), and
  • the ability to act (e.g. move and manipulate objects, change location to explore, etc.)

This includes the ability to detect and respond to hazards.

Tests for human-level AGI

Several tests meant to confirm human-level AGI have been considered, including:

The Turing Test (Turing)
The Turing test can provide some evidence of intelligence, but it penalizes non-human intelligent behavior and may incentivize artificial stupidity.
Proposed by Alan Turing in his 1950 paper "Computing Machinery and Intelligence", this test involves a human judge engaging in natural language conversations with both a human and a machine designed to generate human-like responses. The machine passes the test if it can convince the judge that it is human a significant fraction of the time. Turing proposed this as a practical measure of machine intelligence, focusing on the ability to produce human-like responses rather than on the internal workings of the machine.
Turing described the test as follows:

The idea of the test is that the machine has to try and pretend to be a man, by answering questions put to it, and it will only pass if the pretence is reasonably convincing. A considerable portion of a jury, who should not be experts about machines, must be taken in by the pretence.

In 2014, a chatbot named Eugene Goostman, designed to imitate a 13-year-old Ukrainian boy, reportedly passed a Turing Test event by convincing 33% of judges that it was human. However, this claim was met with significant skepticism from the AI research community, who questioned the test's implementation and its relevance to AGI.
In 2023, Kirk-Giannini and Goldstein argued that while large language models were approaching the threshold of passing the Turing test, "imitation" is not synonymous with "intelligence". This distinction has been challenged on scientific grounds: neuroscience has established that biological intelligence arises from electrochemical signalling between neurons — a purely physical process with no known non-physical component. Both biological neural networks and artificial neural networks are physical systems processing information according to physical laws; to claim that one substrate produces "real" intelligence while the other produces "mere imitation" despite equivalent observable behaviour requires positing a non-physical property unique to biological matter — a position incompatible with modern science and indistinguishable from substance dualism.
A 2024 study suggested that GPT-4 was identified as human 54% of the time in a randomized, controlled version of the Turing Test—surpassing older chatbots like ELIZA while still falling behind actual humans (67%).
A 2025 pre‑registered, three‑party Turing‑test study by Cameron R. Jones and Benjamin K. Bergen showed that GPT-4.5 was judged to be the human in 73% of five‑minute text conversations—surpassing the 67% humanness rate of real confederates and meeting the researchers' criterion for having passed the test. A toy sketch of how such humanness rates and this pass criterion can be computed appears after this list of tests.
The Robot College Student Test (Goertzel)
A machine enrolls in a university, taking and passing the same classes that humans would, and obtaining a degree. LLMs can now pass university degree-level exams without even attending the classes.
The Employment Test (Nilsson)
A machine performs an economically important job at least as well as humans in the same job. This test is now arguably passed across multiple domains. In knowledge work, frontier large language models are deployed as autonomous agentic systems handling software engineering, legal research, financial analysis, customer service, and marketing tasks end-to-end. In physical labour, LLM-powered humanoid robots are entering both industrial and domestic environments. Figure AI's robots operate fully autonomously in factory and warehouse settings, with manufacturers including BMW deploying them on production lines. Boston Dynamics has similarly demonstrated advanced autonomous robotics in industrial applications. 1X Technologies' NEO humanoid, available for pre-order at $20,000 with deliveries beginning in 2026, targets household tasks such as tidying, laundry, and fetching items. Unlike Figure's fully autonomous factory deployments, NEO ships with basic autonomy and uses a human-in-the-loop "Expert Mode" where remote operators supervise complex tasks the robot has not yet learned — a strategy driven by the data collection challenge inherent to training robots for the diversity of home environments. Tesla's Optimus programme has announced similar consumer ambitions. The remaining frontier is fully autonomous general-purpose home robotics, where the unstructured nature of domestic environments presents a harder data and generalisation problem than controlled industrial settings.
The Ikea test (Marcus)
Also known as the Flat Pack Furniture Test. An AI views the parts and instructions of an Ikea flat-pack product, then controls a robot to assemble the furniture correctly. As early as 2013, MIT's IkeaBot demonstrated fully autonomous multi-robot assembly of an IKEA Lack table in ten minutes, with no human intervention and no pre-programmed assembly instructions — the robots inferred the assembly sequence from the geometry of the parts alone. In December 2025, MIT researchers demonstrated a "speech-to-reality" system combining large language models with vision-language models and robotic assembly: a user says "I want a simple stool" and a robotic arm constructs the furniture from modular components within five minutes, using generative AI to reason about geometry, function, and assembly sequence from natural language alone. The FurnitureBench benchmark, published in the International Journal of Robotics Research in 2025, now provides a standardised real-world furniture assembly benchmark with over 200 hours of demonstration data for training and evaluating autonomous assembly systems.
The Coffee Test (Wozniak)
A machine is required to enter an average American home and figure out how to make coffee: find the coffee machine, find the coffee, add water, find a mug, and brew the coffee by pushing the proper buttons. This test has been substantially approached across multiple systems. In January 2024, Figure AI's Figure 01 humanoid learned to operate a Keurig coffee machine autonomously after watching video demonstrations, using end-to-end neural networks to translate visual input into motor actions. In 2025, researchers at the University of Edinburgh published the ELLMER framework in Nature Machine Intelligence, demonstrating a robotic arm that interprets verbal instructions, analyses its surroundings, and autonomously makes coffee in dynamic kitchen environments — adapting to unforeseen obstacles in real time rather than following pre-programmed sequences. China-based Stardust Intelligence demonstrated its Astribot S1 using Physical Intelligence's π₀ model to make coffee from the high-level command "make coffee", with the system identifying objects such as mugs and coffee makers even when misplaced or in unexpected locations. Physical Intelligence subsequently reported that its π*0.6 model could make espresso continuously for an entire day with failure rates dropping by more than half compared to earlier versions. The strict form of the test — entering a completely unfamiliar home and navigating it from scratch — has not been formally demonstrated end-to-end, though the combination of LLM-driven reasoning, visual object recognition in novel environments, and autonomous manipulation brings current systems close to meeting the original specification.
The Modern Turing Test (Suleyman)
An AI model is given US$100,000 and has to obtain US$1 million. This test was arguably surpassed in October 2024 by Truth Terminal, a semi-autonomous AI agent built on Meta's Llama 3.1 (with earlier iterations based on Claude 3 Opus). Created by AI researcher Andy Ayrey, Truth Terminal originated from an experiment called "Infinite Backrooms" in which two Claude Opus instances were allowed to converse freely, during which they spontaneously generated a satirical meme religion dubbed the "Goatse Gospel". After venture capitalist Marc Andreessen donated US$50,000 in Bitcoin to the agent, Truth Terminal's promotion of the Goatseus Maximus (GOAT) memecoin on the Solana blockchain drove the token to over US$1 billion in market capitalisation within days of its launch — far exceeding Suleyman's US$1 million threshold. Truth Terminal's own crypto wallet accumulated approximately US$37.5 million, making it the first AI agent to become a millionaire through its own market activity. The test's spirit — demonstrating that an AI can generate substantial economic value from a modest starting position — was met, though with caveats: Ayrey reviewed posts before publication and assisted with wallet mechanics, making the agent semi-autonomous rather than fully independent.
The General Video-Game Learning Test (Goertzel, Bach et al.)
An AI must demonstrate the ability to learn and succeed at a wide range of video games, including new games unknown to the AGI developers before the competition. The importance of this threshold was echoed by Scott Aaronson during his time at OpenAI. In December 2025, Google DeepMind released SIMA 2 (Scalable Instructable Multiworld Agent), a Gemini-powered generalist agent that operates across multiple commercial 3D games — including No Man's Sky, Valheim, and Goat Simulator 3 — using only rendered pixels and a virtual keyboard and mouse, with no access to game source code or internal APIs. Where the original SIMA achieved a 31% success rate on complex tasks compared to humans at 71%, SIMA 2 roughly doubled that rate and demonstrated robust generalisation to previously unseen game environments, including self-improvement through autonomous play without human feedback. Separately, frontier LLMs with computer-use capabilities can interact with arbitrary software through screen observation and mouse/keyboard control, theoretically enabling gameplay of any title, though current implementations remain too slow for real-time performance in fast-paced games. The test has not been formally passed in its strictest sense — a single agent mastering any arbitrary unseen game at human level — but the gap is narrowing rapidly.
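Returning to the Turing-test results quoted earlier in this list, the toy sketch below shows how a "humanness rate" can be computed in a three-party setup, where each judge converses with an AI witness and a human confederate and picks the one they believe is human. The verdict data is invented purely for illustration; only the pass criterion (the AI being judged human at least as often as the real confederates) follows the description of the 2025 study above.

  # Toy illustration of humanness rates in a three-party Turing test.
  # Verdicts are invented; True means the judge picked that witness as the human.
  from typing import List

  def humanness_rate(judged_human: List[bool]) -> float:
      """Fraction of trials in which this witness was judged to be the human."""
      return sum(judged_human) / len(judged_human)

  ai_verdicts = [True, True, False, True, True, False, True, True, True, False]
  confederate_verdicts = [True, False, True, True, False, True, True, False, False, True]

  ai_rate = humanness_rate(ai_verdicts)                    # 0.7 on this made-up data
  confederate_rate = humanness_rate(confederate_verdicts)  # 0.6 on this made-up data

  # Pass criterion as described above: the AI is judged to be the human at least
  # as often as the real human confederates are.
  print(f"AI: {ai_rate:.0%}, confederates: {confederate_rate:.0%}")
  print("Passes under this criterion:", ai_rate >= confederate_rate)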

AI-complete problems

A problem is informally called "AI-complete" or "AI-hard" if it is believed that AGI would be needed to solve it, because the solution is beyond the capabilities of a purpose-specific algorithm.

Many problems have been conjectured to require general intelligence to solve. Examples include computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real-world problem. Even a specific task like translation requires a machine to read and write in both languages, follow the author's argument (reason), understand the context (knowledge), and faithfully reproduce the author's original intent (social intelligence). All of these problems need to be solved simultaneously in order to reach human-level machine performance.

However, many of these tasks can now be performed by modern large language models. According to Stanford University's 2024 AI index, AI has reached human-level performance on many benchmarks for reading comprehension and visual reasoning.

History

Classical AI

Modern AI research began in the mid-1950s. The first generation of AI researchers were convinced that artificial general intelligence was possible and that it would exist in just a few decades. AI pioneer Herbert A. Simon wrote in 1965: "machines will be capable, within twenty years, of doing any work a man can do."

Their predictions were the inspiration for Stanley Kubrick and Arthur C. Clarke's fictional character HAL 9000, who embodied what AI researchers believed they could create by the year 2001. AI pioneer Marvin Minsky was a consultant on the project of making HAL 9000 as realistic as possible according to the consensus predictions of the time. He said in 1967, "Within a generation... the problem of creating 'artificial intelligence' will substantially be solved".

Several classical AI projects, such as Doug Lenat's Cyc project (which began in 1984) and Allen Newell's Soar project, were directed at AGI.

However, in the early 1970s, it became obvious that researchers had grossly underestimated the difficulty of the project. Funding agencies became skeptical of AGI and put researchers under increasing pressure to produce useful "applied AI". In the early 1980s, Japan's Fifth Generation Computer Project revived interest in AGI, setting out a ten-year timeline that included AGI goals like "carry on a casual conversation". In response to this and the success of expert systems, both industry and government pumped money into the field. However, confidence in AI spectacularly collapsed in the late 1980s, and the goals of the Fifth Generation Computer Project were never fulfilled. For the second time in 20 years, AI researchers who predicted the imminent achievement of AGI had been mistaken. By the 1990s, AI researchers had a reputation for making vain promises. They became reluctant to make predictions at all and avoided mention of "human level" artificial intelligence for fear of being labeled "wild-eyed dreamer[s]".

Narrow AI research

In the 1990s and early 21st century, mainstream AI achieved commercial success and academic respectability by focusing on specific sub-problems where AI can produce verifiable results and commercial applications, such as speech recognition and recommendation algorithms. These "applied AI" systems are now used extensively throughout the technology industry, and research in this vein is heavily funded in both academia and industry. As of 2018, development in this field was considered an emerging trend, and a mature stage was expected to be reached in more than 10 years.

At the turn of the century, many mainstream AI researchers hoped that strong AI could be developed by combining programs that solve various sub-problems. Hans Moravec wrote in 1988:

I am confident that this bottom-up route to artificial intelligence will one day meet the traditional top-down route more than halfway, ready to provide the real-world competence and the commonsense knowledge that has been so frustratingly elusive in reasoning programs. Fully intelligent machines will result when the metaphorical golden spike is driven, uniting the two efforts.

However, even at the time, this was disputed. For example, Stevan Harnad of Princeton University concluded his 1990 paper on the symbol grounding hypothesis by stating:

The expectation has often been voiced that "top-down" (symbolic) approaches to modeling cognition will somehow meet "bottom-up" (sensory) approaches somewhere in between. If the grounding considerations in this paper are valid, then this expectation is hopelessly modular and there is really only one viable route from sense to symbols: from the ground up. A free-floating symbolic level like the software level of a computer will never be reached by this route (or vice versa) – nor is it clear why we should even try to reach such a level, since it looks as if getting there would just amount to uprooting our symbols from their intrinsic meanings (thereby merely reducing ourselves to the functional equivalent of a programmable computer).

Modern artificial general intelligence research

The term "artificial general intelligence" was used as early as 1997, by Mark Gubrud in a discussion of the implications of fully automated military production and operations. A mathematical formalism of AGI was proposed by Marcus Hutter in 2000. Named AIXI, the proposed AGI agent maximizes "the ability to satisfy goals in a wide range of environments". This type of AGI, characterized by the ability to maximize a mathematical definition of intelligence rather than exhibit human-like behaviour, was also called universal artificial intelligence.

The term AGI was re-introduced and popularized by Shane Legg and Ben Goertzel around 2002. AGI research activity in 2006 was described by Pei Wang and Ben Goertzel as "producing publications and preliminary results". The first summer school on AGI was organized in Xiamen, China in 2009 by Xiamen University's Artificial Brain Laboratory and OpenCog. The first university course was given in 2010 and 2011 at Plovdiv University, Bulgaria by Todor Arnaudov. The Massachusetts Institute of Technology (MIT) presented a course on AGI in 2018, organized by Lex Fridman and featuring a number of guest lecturers.

Feasibility

Surveys about when experts expect artificial general intelligence

As of 2023, the development and potential achievement of AGI remains a subject of intense debate within the AI community. While traditional consensus held that AGI was a distant goal, recent advancements have led some researchers and industry figures to claim that early forms of AGI may already exist. AI pioneer Herbert A. Simon speculated in 1965 that "machines will be capable, within twenty years, of doing any work a man can do". This prediction failed to come true. Microsoft co-founder Paul Allen believed that such intelligence is unlikely in the 21st century because it would require "unforeseeable and fundamentally unpredictable breakthroughs" and a "scientifically deep understanding of cognition". Writing in The Guardian, roboticist Alan Winfield claimed in 2014 that the gulf between modern computing and human-level artificial intelligence is as wide as the gulf between current space flight and practical faster-than-light spaceflight.

An additional challenge is the lack of clarity in defining what intelligence entails. Does it require consciousness? Must it display the ability to set goals as well as pursue them? Is it purely a matter of scale such that if model sizes increase sufficiently, intelligence will emerge? Are facilities such as planning, reasoning, and causal understanding required? Does intelligence require explicitly replicating the brain and its specific faculties? Does it require emotions?

Most AI researchers believe strong AI can be achieved in the future, but some thinkers, like Hubert Dreyfus and Roger Penrose, deny the possibility of achieving strong AI. John McCarthy is among those who believe human-level AI will be accomplished, but that the present level of progress is such that a date cannot accurately be predicted. AI experts' views on the feasibility of AGI wax and wane. Four polls conducted in 2012 and 2013 suggested that the median estimate among experts for when they would be 50% confident AGI would arrive was 2040 to 2050, depending on the poll, with the mean being 2081. Of the experts, 16.5% answered "never" when asked the same question at a 90% confidence level instead. Further discussion of current progress toward AGI can be found above under "Tests for human-level AGI".

A report by Stuart Armstrong and Kaj Sotala of the Machine Intelligence Research Institute found that "over [a] 60-year time frame there is a strong bias towards predicting the arrival of human-level AI as between 15 and 25 years from the time the prediction was made". They analyzed 95 predictions made between 1950 and 2012 on when human-level AI will come about.

In 2023, Microsoft researchers published a detailed evaluation of GPT-4. They concluded: "Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system." Another study in 2023 reported that GPT-4 outperforms 99% of humans on the Torrance tests of creative thinking.

Blaise Agüera y Arcas and Peter Norvig wrote in 2023 the article "Artificial General Intelligence Is Already Here", arguing that frontier models had already achieved a significant level of general intelligence. They wrote that reluctance to accept this view comes from four main reasons: a "healthy skepticism about metrics for AGI", an "ideological commitment to alternative AI theories or techniques", a "devotion to human (or biological) exceptionalism", or a "concern about the economic implications of AGI".

Timescales

AI has surpassed humans on a variety of language understanding and visual understanding benchmarks. As of 2023, foundation models still lack advanced reasoning and planning capabilities, but rapid progress is expected.

Progress in artificial intelligence has historically gone through periods of rapid progress separated by periods when progress appeared to stop. Ending each hiatus were fundamental advances in hardware, software, or both that created space for further progress. For example, the computer hardware available in the twentieth century was not sufficient to implement deep learning, which requires large numbers of GPUs or similar accelerators.

In the introduction to his 2006 book, Goertzel says that estimates of the time needed before a truly flexible AGI is built vary from 10 years to over a century. As of 2007, the consensus in the AGI research community seemed to be that the timeline discussed by Ray Kurzweil in 2005 in The Singularity is Near (i.e. between 2015 and 2045) was plausible. Mainstream AI researchers have given a wide range of opinions on whether progress will be this rapid. A 2012 meta-analysis of 95 such opinions found a bias towards predicting that the onset of AGI would occur within 16–26 years for modern and historical predictions alike. That paper has been criticized for how it categorized opinions as expert or non-expert.

In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton developed a neural network called AlexNet, which won the ImageNet competition with a top-5 test error rate of 15.3%, significantly better than the second-best entry's rate of 26.3% (the traditional approach used a weighted sum of scores from different pre-defined classifiers). AlexNet was regarded as the initial ground-breaker of the current deep learning wave.

In 2017, researchers Feng Liu, Yong Shi, and Ying Liu conducted intelligence tests on publicly available and freely accessible weak AI such as Google AI, Apple's Siri, and others. At the maximum, these AIs reached an IQ value of about 47, which corresponds approximately to a six-year-old child in first grade; an adult scores about 100 on average. Similar tests carried out in 2014 had yielded a maximum IQ score of 27.

In 2020, OpenAI developed GPT-3, a language model capable of performing many diverse tasks without specific training. According to Gary Grossman in a VentureBeat article, while there is consensus that GPT-3 is not an example of AGI, it is considered by some to be too advanced to be classified as a narrow AI system.

In the same year, Jason Rohrer used his GPT-3 account to develop a chatbot, and provided a chatbot-developing platform called "Project December". OpenAI asked for changes to the chatbot to comply with their safety guidelines; Rohrer disconnected Project December from the GPT-3 API.

In 2022, DeepMind developed Gato, a "general-purpose" system capable of performing more than 600 different tasks.

In 2023, AI researcher Geoffrey Hinton stated that:

The idea that this stuff could actually get smarter than people – a few people believed that, [...]. But most people thought it was way off. And I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that.

He estimated in 2024 (with low confidence) that systems smarter than humans could appear within 5 to 20 years and stressed the attendant existential risks.

In May 2023, Demis Hassabis similarly said that "The progress in the last few years has been pretty incredible", and that he sees no reason why it would slow, expecting AGI within a decade or even a few years. In March 2024, Nvidia's Chief Executive Officer (CEO), Jensen Huang, stated his expectation that within five years, AI would be capable of passing any test at least as well as humans. In June 2024, the AI researcher Leopold Aschenbrenner, a former OpenAI employee, estimated AGI by 2027 to be "strikingly plausible".

In September 2025, a review of surveys of scientists and industry experts from the last 15 years reported that most agreed that artificial general intelligence (AGI) will occur before the year 2100. A more recent analysis by AIMultiple reported that, “Current surveys of AI researchers are predicting AGI around 2040”.

Whole brain emulation

While the development of transformer models like those behind ChatGPT is considered the most promising path to AGI, whole brain emulation can serve as an alternative approach. With whole brain simulation, a brain model is built by scanning and mapping a biological brain in detail, and then copying and simulating it on a computer system or another computational device. The simulation model must be sufficiently faithful to the original, so that it behaves in practically the same way as the original brain. Whole brain emulation is a type of brain simulation that is discussed in computational neuroscience and neuroinformatics, and for medical research purposes. It has been discussed in artificial intelligence research as an approach to strong AI. Neuroimaging technologies that could deliver the necessary detailed understanding are improving rapidly, and futurist Ray Kurzweil in the book The Singularity Is Near predicts that a map of sufficient quality will become available on a similar timescale to the computing power required to emulate it.

Early estimates

Estimates of how much processing power is needed to emulate a human brain at various levels (from Ray Kurzweil, Anders Sandberg and Nick Bostrom), along with the fastest supercomputer from TOP500, mapped by year. Note the logarithmic scale and the exponential trendline, which assumes computational capacity doubles every 1.2 years. Kurzweil believes that mind uploading will be possible at the level of neural simulation, while the Sandberg and Bostrom report is less certain about where consciousness arises.

For low-level brain simulation, a very powerful cluster of computers or GPUs would be required, given the enormous quantity of synapses within the human brain. Each of the 10^11 (one hundred billion) neurons has on average 7,000 synaptic connections (synapses) to other neurons. The brain of a three-year-old child has about 10^15 synapses (1 quadrillion). This number declines with age, stabilizing by adulthood. Estimates vary for an adult, ranging from 10^14 to 5×10^14 synapses (100 to 500 trillion). An estimate of the brain's processing power, based on a simple switch model for neuron activity, is around 10^14 (100 trillion) synaptic updates per second (SUPS).
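The orders of magnitude involved follow from simple multiplication; the short sketch below reproduces the arithmetic using only the figures quoted above.

  # Back-of-the-envelope arithmetic using the figures quoted above.
  neurons = 1e11               # ~10^11 (one hundred billion) neurons
  synapses_per_neuron = 7_000  # average synaptic connections per neuron

  total_connections = neurons * synapses_per_neuron
  print(f"~{total_connections:.0e} synaptic connections")  # ~7e+14

  # This lands between the quoted adult range of 10^14 to 5x10^14 synapses and the
  # ~10^15 figure for a three-year-old child, i.e. the same order of magnitude.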

In 1997, Kurzweil looked at various estimates for the hardware required to equal the human brain and adopted a figure of 10^16 computations per second. (For comparison, if a "computation" was equivalent to one "floating-point operation" – a measure used to rate current supercomputers – then 10^16 "computations" would be equivalent to 10 petaFLOPS, achieved in 2011, while 10^18 was achieved in 2022.) He used this figure to predict that the necessary hardware would be available sometime between 2015 and 2025, if the exponential growth in computer power at the time of writing continued.
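Combining the 10 petaFLOPS (10^16 FLOPS) milestone quoted above for 2011 with the 1.2-year doubling time assumed by the trendline in the estimates described earlier gives a quick way to reproduce this style of projection; the sketch below uses only those two quoted figures and is illustrative rather than a forecast.

  # Projection sketch using only figures quoted in this section: 10^16 FLOPS was
  # reached in 2011, and the trendline assumes capacity doubles every 1.2 years.
  import math

  start_year, start_flops = 2011, 1e16
  doubling_time_years = 1.2

  def year_reached(target_flops: float) -> float:
      """Year the trend reaches target_flops, extrapolating from the 2011 point."""
      doublings = math.log2(target_flops / start_flops)
      return start_year + doublings * doubling_time_years

  # On this trend, 10^18 FLOPS (exascale) would be expected around 2019; the text
  # notes it was actually reached in 2022, i.e. the real trajectory was slower.
  print(round(year_reached(1e18)))  # 2019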

Current research

The Human Brain Project, an EU-funded initiative active from 2013 to 2023, developed a particularly detailed and publicly accessible atlas of the human brain. In 2023, researchers from Duke University performed a high-resolution scan of a mouse brain.

Criticisms of simulation-based approaches

The artificial neuron model assumed by Kurzweil and used in many current artificial neural network implementations is simple compared with biological neurons. A brain simulation would likely have to capture the detailed cellular behaviour of biological neurons, presently understood only in broad outline. The overhead introduced by full modeling of the biological, chemical, and physical details of neural behaviour (especially on a molecular scale) would require computational powers several orders of magnitude larger than Kurzweil's estimate. In addition, the estimates do not account for glial cells, which are known to play a role in cognitive processes.

A fundamental criticism of the simulated brain approach derives from embodied cognition theory, which asserts that human embodiment is an essential aspect of human intelligence and is necessary to ground meaning. If this theory is correct, any fully functional brain model will need to encompass more than just the neurons (e.g., a robotic body). Goertzel proposes virtual embodiment (like in metaverses like Second Life) as an option, but it is unknown whether this would be sufficient.

Philosophical perspective

"Strong AI" as defined in philosophy

In 1980, philosopher John Searle coined the term "strong AI" as part of his Chinese room argument. He proposed a distinction between two hypotheses about artificial intelligence:

  • Strong AI hypothesis: An artificial intelligence system can have "a mind" and "consciousness".
  • Weak AI hypothesis: An artificial intelligence system can (only) act like it thinks and has a mind and consciousness.

The first one he called "strong" because it makes a stronger statement: it assumes something special has happened to the machine that goes beyond those abilities that we can test. The behaviour of a "weak AI" machine would be identical to a "strong AI" machine, but the latter would also have subjective conscious experience. This usage is also common in academic AI research and textbooks.

In contrast to Searle and mainstream AI, some futurists such as Ray Kurzweil use the term "strong AI" to mean "human level artificial general intelligence". This is not the same as Searle's strong AI, unless it is assumed that consciousness is necessary for human-level AGI. Academic philosophers such as Searle do not believe that is the case, and to most artificial intelligence researchers, the question is out of scope.

Mainstream AI is most interested in how a program behaves. According to Russell and Norvig, "as long as the program works, they don't care if you call it real or a simulation." If the program can behave as if it has a mind, then there is no need to know if it actually has a mind – indeed, there would be no way to tell. For AI research, Searle's "weak AI hypothesis" is equivalent to the statement "artificial general intelligence is possible". Thus, according to Russell and Norvig, "most AI researchers take the weak AI hypothesis for granted, and don't care about the strong AI hypothesis." Thus, for academic AI research, "Strong AI" and "AGI" are two different things.

Consciousness

Consciousness can have various meanings, and some aspects play significant roles in science fiction and the ethics of artificial intelligence:

  • Sentience (or "phenomenal consciousness"): The ability to "feel" perceptions or emotions subjectively, as opposed to the ability to reason about perceptions. Some philosophers, such as David Chalmers, use the term "consciousness" to refer exclusively to phenomenal consciousness, which is roughly equivalent to sentience. Determining why and how subjective experience arises is known as the hard problem of consciousness. Thomas Nagel explained in 1974 that it "feels like" something to be conscious. If we are not conscious, then it doesn't feel like anything. Nagel uses the example of a bat: we can sensibly ask "what does it feel like to be a bat?" However, we are unlikely to ask "what does it feel like to be a toaster?" Nagel concludes that a bat appears to be conscious (i.e., has consciousness) but a toaster does not. In 2022, a Google engineer claimed that the company's AI chatbot, LaMDA, had achieved sentience, though this claim was widely disputed by other experts.
  • Self-awareness: To have conscious awareness of oneself as a separate individual, especially to be consciously aware of one's own thoughts. This is opposed to simply being the "subject of one's thought"—an operating system or debugger can be "aware of itself" (that is, to represent itself in the same way it represents everything else)—but this is not what people typically mean when they use the term "self-awareness". In some advanced AI models, systems construct internal representations of their own cognitive processes and feedback patterns—occasionally referring to themselves using second-person constructs such as 'you' within self-modeling frameworks.

These traits have a moral dimension. AI sentience would give rise to concerns of welfare and legal protection, similarly to animals. Other aspects of consciousness related to cognitive capabilities are also relevant to the concept of AI rights. Figuring out how to integrate advanced AI with existing legal and social frameworks is an emergent issue.

Benefits

AGI could improve productivity and efficiency in most jobs. For example, in public health, AGI could accelerate medical research, notably against cancer. It could take care of the elderly, and democratize access to rapid, high-quality medical diagnostics. It could offer fun, inexpensive and personalized education. The need to work to subsist could become obsolete if the wealth produced is properly redistributed. This also raises the question of the place of humans in a radically automated society.

AGI could also help to make rational decisions, and to anticipate and prevent disasters. It could also help to reap the benefits of potentially catastrophic technologies such as nanotechnology or climate engineering, while avoiding the associated risks. If an AGI's primary goal is to prevent existential catastrophes such as human extinction (which could be difficult if the Vulnerable World Hypothesis turns out to be true), it could take measures to drastically reduce the risks while minimizing the impact of these measures on our quality of life.

Advancements in medicine and healthcare

AGI would improve healthcare by making medical diagnostics faster, less expensive, and more accurate. AI-driven systems can analyse patient data and detect diseases at an early stage. This means patients will get diagnosed quicker and be able to seek medical attention before their medical condition gets worse. AGI systems could also recommend personalised treatment plans based on genetics and medical history.

Additionally, AGI could accelerate drug discovery by simulating molecular interactions, reducing the time it takes to develop new medicines for conditions like cancer and Alzheimer's disease. In hospitals, AGI-powered robotic assistants could assist in surgeries, monitor patients, and provide real-time medical support. It could also be used in elderly care, helping aging populations maintain independence through AI-powered caregivers and health-monitoring systems.

By evaluating large datasets, AGI can assist in developing personalised treatment plans tailored to individual patient needs. This approach ensures that therapies are optimised based on a patient's unique medical history and genetic profile, improving outcomes and reducing adverse effects.

Advancements in science and technology

AGI can become a tool for scientific research and innovation. In fields such as physics and mathematics, AGI could help solve complex problems that require massive computational power, such as modeling quantum systems, understanding dark matter, or proving mathematical theorems. Problems that have remained unsolved for decades may be solved with AGI.

AGI could also drive technological breakthroughs that could reshape society. It can do this by optimising engineering designs, discovering new materials, and improving automation. For example, AI is already playing a role in developing more efficient renewable energy sources and optimising supply chains in manufacturing. Future AGI systems could push these innovations further.

Enhancing education and productivity

AGI can personalize education by creating learning programs that are specific to each student's strengths, weaknesses, and interests. Unlike traditional teaching methods, AI-driven tutoring systems could adapt lessons in real-time, ensuring students understand difficult concepts before moving on.

In the workplace, AGI could automate repetitive tasks, freeing workers for more creative and strategic roles. It could also improve efficiency across industries by optimising logistics, enhancing cybersecurity, and streamlining business operations. If properly managed, the wealth generated by AGI-driven automation could reduce the need for people to work for a living. Working may become optional.

Mitigating global crises

AGI could play a crucial role in preventing and managing global threats. It could help governments and organizations predict and respond to natural disasters more effectively, using real-time data analysis to forecast hurricanes, earthquakes, and pandemics. By analyzing vast datasets from satellites, sensors, and historical records, AGI could improve early warning systems, enabling faster disaster response and minimising casualties.

In climate science, AGI could develop new models for reducing carbon emissions, optimising energy resources, and mitigating climate change effects. It could also enhance weather prediction accuracy, allowing policymakers to implement more effective environmental regulations. Additionally, AGI could help regulate emerging technologies that carry significant risks, such as nanotechnology and bioengineering, by analysing complex systems and predicting unintended consequences. Furthermore, AGI could assist in cybersecurity by detecting and mitigating large-scale cyber threats, protecting critical infrastructure, and preventing digital warfare.

Revitalising environmental conservation and biodiversity

AGI could significantly contribute to preserving the natural environment and protecting endangered species. By analyzing satellite imagery, climate data, and wildlife patterns, AGI systems could identify environmental threats earlier and recommend targeted conservation strategies. AGI could help optimize land use, monitor illegal activities like poaching or deforestation in real-time, and support global efforts to restore ecosystems. Advanced predictive models developed by AGI could also assist in reversing biodiversity loss, ensuring the survival of critical species and maintaining ecological balance.

Enhancing space exploration and colonization

AGI could revolutionize humanity's ability to explore and settle beyond Earth. With its advanced problem-solving skills, AGI could autonomously manage complex space missions, including navigation, resource management, and emergency response. It could accelerate the design of life support systems, habitats, and spacecraft optimized for extraterrestrial environments. Furthermore, AGI could support efforts to colonize planets like Mars by simulating survival scenarios and helping humans adapt to new worlds, expanding the possibilities for interplanetary civilization.

Risks

Existential risks

AGI may represent multiple types of existential risk, which are risks that threaten "the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development". The risk of human extinction from AGI has been the topic of many debates, but there is also the possibility that the development of AGI would lead to a permanently flawed future. Notably, it could be used to spread and preserve the set of values of whoever develops it. If humanity still has moral blind spots similar to slavery in the past, AGI might irreversibly entrench them, preventing moral progress. Furthermore, AGI could facilitate mass surveillance and indoctrination, which could be used to create an entrenched repressive worldwide totalitarian regime. There is also a risk for the machines themselves. If machines that are sentient or otherwise worthy of moral consideration are mass-created in the future, engaging in a civilizational path that indefinitely neglects their welfare and interests could be an existential catastrophe. Considering how much AGI could improve humanity's future and help reduce other existential risks, Toby Ord calls these existential risks "an argument for proceeding with due caution", not for "abandoning AI".

Risk of loss of control and human extinction

The thesis that AI poses an existential risk for humans, and that this risk needs more attention, is controversial but has been endorsed in 2023 by many public figures, AI researchers and CEOs of AI companies such as Elon Musk, Bill Gates, Geoffrey Hinton, Yoshua Bengio, Demis Hassabis and Sam Altman.

In 2014, Stephen Hawking criticized widespread indifference:

So, facing possible futures of incalculable benefits and risks, the experts are surely doing everything possible to ensure the best outcome, right? Wrong. If a superior alien civilisation sent us a message saying, 'We'll arrive in a few decades,' would we just reply, 'OK, call us when you get here—we'll leave the lights on?' Probably not—but this is more or less what is happening with AI.

The potential fate of humanity has sometimes been compared to the fate of gorillas threatened by human activities. The comparison states that greater intelligence allowed humanity to dominate gorillas, which are now vulnerable in ways that they could not have anticipated. As a result, the gorilla has become an endangered species, not out of malice, but simply as collateral damage from human activities.

The skeptic Yann LeCun considers that AGIs will have no desire to dominate humanity and that we should be careful not to anthropomorphize them and interpret their intentions as we would for humans. He said that people won't be "smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards". On the other side, the concept of instrumental convergence suggests that, almost whatever their goals, intelligent agents will have reasons to try to survive and acquire more power as intermediary steps to achieving these goals, and that this does not require having emotions.

Many scholars who are concerned about existential risk advocate for more research into solving the "control problem" to answer the question: what types of safeguards, algorithms, or architectures can programmers implement to maximise the probability that their recursively-improving AI would continue to behave in a friendly, rather than destructive, manner after it reaches superintelligence? Solving the control problem is complicated by the AI arms race (which could lead to a race to the bottom of safety precautions in order to release products before competitors), and the use of AI in weapon systems.

The thesis that AI can pose existential risk also has detractors. Skeptics usually say that AGI is unlikely in the short term, or that concerns about AGI distract from other issues related to current AI. Former Google fraud czar Shuman Ghosemajumder considers that for many people outside of the technology industry, existing chatbots and LLMs are already perceived as though they were AGI, leading to further misunderstanding and fear.

Skeptics sometimes charge that the thesis is crypto-religious, with an irrational belief in the possibility of superintelligence replacing an irrational belief in an omnipotent God. Some researchers believe that the communication campaigns on AI existential risk by certain AI groups (such as OpenAI, Anthropic, DeepMind, and Conjecture) may be an attempt at regulatory capture and at inflating interest in their products.

In 2023, the CEOs of Google DeepMind, OpenAI and Anthropic, along with other industry leaders and researchers, issued a joint statement asserting that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."

Mass unemployment

Researchers from OpenAI estimated in 2023 that "80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while around 19% of workers may see at least 50% of their tasks impacted". They consider office workers to be the most exposed, for example mathematicians, accountants or web designers. Compared with such systems, AGI could have greater autonomy and a better ability to make decisions, to interface with other computer tools, and to control robotized bodies. A common belief among top AI company insiders is that most workers will face technological unemployment from AGI, starting with white-collar jobs and, as robotics improves, extending to blue-collar jobs. Critics of the idea argue that AGI will complement rather than replace humans, and that automation displaces work in the short term but not in the long term.

According to Stephen Hawking, the outcome of automation on the quality of life will depend on how the wealth will be redistributed:

Everyone can enjoy a life of luxurious leisure if the machine-produced wealth is shared, or most people can end up miserably poor if the machine-owners successfully lobby against wealth redistribution. So far, the trend seems to be toward the second option, with technology driving ever-increasing inequality.

Elon Musk argued in 2021 that the automation of society will require governments to adopt a universal basic income (UBI). Hinton similarly advised the UK government in 2025 to adopt a UBI as a response to AI-induced unemployment. In 2023, Hinton said "I'm a socialist [...] I think that private ownership of the media, and of the 'means of computation', is not good."

Friday, February 27, 2026

Deep learning

From Wikipedia, the free encyclopedia
Representing images on multiple layers of abstraction in deep learning

In machine learning, deep learning (DL) focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience and revolves around stacking artificial neurons into layers and "training" them to process data. The adjective "deep" refers to the use of multiple layers (ranging from three to several hundred or thousands) in the network. Methods used can be supervised, semi-supervised or unsupervised.

Some common deep learning network architectures include fully connected networks, deep belief networks, recurrent neural networks, convolutional neural networks, generative adversarial networks, transformers, and neural radiance fields. These architectures have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, climate science, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.

Early forms of neural networks were inspired by information processing and distributed communication nodes in biological systems, particularly the human brain. However, current neural networks are not intended to model the brain function of organisms, and are generally seen as low-quality models for that purpose.

Overview

Most modern deep learning models are based on multi-layered neural networks such as convolutional neural networks and transformers, although they can also include propositional formulas or latent variables organized layer-wise in deep generative models such as the nodes in deep belief networks and deep Boltzmann machines.

Fundamentally, deep learning refers to a class of machine learning algorithms in which a hierarchy of layers is used to transform input data into a progressively more abstract and composite representation. For example, in an image recognition model, the raw input may be an image (represented as a tensor of pixels). The first representational layer may attempt to identify basic shapes such as lines and circles, the second layer may compose and encode arrangements of edges, the third layer may encode a nose and eyes, and the fourth layer may recognize that the image contains a face.
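
To make the layer-by-layer idea concrete, the following is a minimal illustrative Python sketch (not from the original article; the layer sizes and random weights are arbitrary placeholders rather than a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A toy "image" flattened into a vector of pixel intensities.
x = rng.random(64)

# Three illustrative layers: each maps its input to a smaller,
# more abstract representation (e.g. edges -> parts -> object score).
W1, b1 = rng.standard_normal((32, 64)) * 0.1, np.zeros(32)
W2, b2 = rng.standard_normal((16, 32)) * 0.1, np.zeros(16)
W3, b3 = rng.standard_normal((1, 16)) * 0.1, np.zeros(1)

h1 = relu(W1 @ x + b1)   # first representational layer
h2 = relu(W2 @ h1 + b2)  # composes features of the layer below
y  = W3 @ h2 + b3        # final score, e.g. "contains a face"
print(h1.shape, h2.shape, y.shape)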

Importantly, a deep learning process can learn which features to optimally place at which level on its own. Prior to deep learning, machine learning techniques often involved hand-crafted feature engineering to transform the data into a more suitable representation for a classification algorithm to operate on. In the deep learning approach, features are not hand-crafted and the model discovers useful feature representations from the data automatically. This does not eliminate the need for hand-tuning; for example, varying numbers of layers and layer sizes can provide different degrees of abstraction.

The word "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output. CAPs describe potentially causal connections between input and output. For a feedforward neural network, the depth of the CAPs is that of the network and is the number of hidden layers plus one (as the output layer is also parameterized). For recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited. No universally agreed-upon threshold of depth divides shallow learning from deep learning, but most researchers agree that deep learning involves CAP depth higher than two. CAP of depth two has been shown to be a universal approximator in the sense that it can emulate any function. Beyond that, more layers do not add to the function approximator ability of the network. Deep models (CAP > two) are able to extract better features than shallow models and hence, extra layers help in learning the features effectively.

Deep learning architectures can be constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and pick out which features improve performance.

Deep learning algorithms can be applied to unsupervised learning tasks. This is an important benefit because unlabeled data is more abundant than the labeled data. Examples of deep structures that can be trained in an unsupervised manner are deep belief networks.

The term deep learning was introduced to the machine learning community by Rina Dechter in 1986, and to artificial neural networks by Igor Aizenberg and colleagues in 2000, in the context of Boolean threshold neurons, although the history of its appearance is apparently more complicated.

Interpretations

Deep neural networks are generally interpreted in terms of the universal approximation theorem or probabilistic inference.

The classic universal approximation theorem concerns the capacity of feedforward neural networks with a single hidden layer of finite size to approximate continuous functions. In 1989, the first proof was published by George Cybenko for sigmoid activation functions and was generalised to feed-forward multi-layer architectures in 1991 by Kurt Hornik. Recent work has shown that universal approximation also holds for unbounded activation functions such as Kunihiko Fukushima's rectified linear unit.

The universal approximation theorem for deep neural networks concerns the capacity of networks with bounded width but the depth is allowed to grow. Lu et al. proved that if the width of a deep neural network with ReLU activation is strictly larger than the input dimension, then the network can approximate any Lebesgue integrable function; if the width is smaller or equal to the input dimension, then a deep neural network is not a universal approximator.

The probabilistic interpretation derives from the field of machine learning. It features inference, as well as the optimization concepts of training and testing, related to fitting and generalization, respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a cumulative distribution function. The probabilistic interpretation led to the introduction of dropout as regularizer in neural networks. The probabilistic interpretation was introduced by researchers including Hopfield, Widrow and Narendra and popularized in surveys such as the one by Bishop.

History

Before 1980

There are two broad types of artificial neural network (ANN): the feedforward neural network (FNN) or multilayer perceptron (MLP), and the recurrent neural network (RNN). RNNs have cycles in their connectivity structure; FNNs do not. In the 1920s, Wilhelm Lenz and Ernst Ising created the Ising model, essentially a non-learning RNN architecture consisting of neuron-like threshold elements. In 1972, Shun'ichi Amari made this architecture adaptive. His learning RNN was republished by John Hopfield in 1982. Other early recurrent neural networks were published by Kaoru Nakano in 1971. Already in 1948, Alan Turing produced work on "Intelligent Machinery" that was not published in his lifetime, containing "ideas related to artificial evolution and learning RNNs".

Frank Rosenblatt (1958) proposed the perceptron, an MLP with 3 layers: an input layer, a hidden layer with randomized weights that did not learn, and an output layer. He later published a 1962 book that also introduced variants and computer experiments, including a version with four-layer perceptrons "with adaptive preterminal networks" where the last two layers have learned weights (here he credits H. D. Block and B. W. Knight). The book cites an earlier network by R. D. Joseph (1960) "functionally equivalent to a variation of" this four-layer system (the book mentions Joseph over 30 times). Should Joseph therefore be considered the originator of proper adaptive multilayer perceptrons with learning hidden units? Unfortunately, the learning algorithm was not a functional one, and fell into oblivion.

The first working deep learning algorithm was the Group method of data handling, a method to train arbitrarily deep neural networks, published by Alexey Ivakhnenko and Lapa in 1965. They regarded it as a form of polynomial regression, or a generalization of Rosenblatt's perceptron to handle more complex, nonlinear, and hierarchical relationships. A 1971 paper described a deep network with eight layers trained by this method, which is based on layer by layer training through regression analysis. Superfluous hidden units are pruned using a separate validation set. Since the activation functions of the nodes are Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units or "gates".
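
As an illustration of the layer-by-layer regression idea behind GMDH (a simplified sketch, not Ivakhnenko's exact procedure; in particular, unit pruning here uses training error rather than a separate validation set):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

def quad_features(a, b):
    """Kolmogorov-Gabor quadratic basis of two inputs."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

def fit_unit(a, b, y):
    """Least-squares fit of one polynomial unit; returns its predictions."""
    X = quad_features(a, b)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

# Toy data: four inputs, one target.
x = rng.random((200, 4))
y = np.sin(x[:, 0]) + x[:, 1] * x[:, 2]

layer = x
for depth in range(2):  # grow the network layer by layer
    outputs, errors = [], []
    for i, j in combinations(range(layer.shape[1]), 2):
        pred = fit_unit(layer[:, i], layer[:, j], y)
        outputs.append(pred)
        errors.append(np.mean((pred - y) ** 2))
    best = np.argsort(errors)[:4]          # keep the best units ("pruning")
    layer = np.column_stack([outputs[k] for k in best])
    print(f"layer {depth + 1}: best MSE = {min(errors):.4f}")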

The first deep learning multilayer perceptron trained by stochastic gradient descent was published in 1967 by Shun'ichi Amari. In computer experiments conducted by Amari's student Saito, a five-layer MLP with two modifiable layers learned internal representations to classify non-linearly separable pattern classes. Subsequent developments in hardware and hyperparameter tuning have made end-to-end stochastic gradient descent the currently dominant training technique.
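
The general recipe, a small MLP with trainable layers fit by stochastic gradient descent on a non-linearly separable problem, can be sketched as follows (an illustrative example assuming PyTorch, not Amari's 1967 formulation):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Non-linearly separable toy data: an XOR-like pattern in the plane.
X = torch.randn(512, 2)
y = ((X[:, 0] * X[:, 1]) > 0).long()

model = nn.Sequential(
    nn.Linear(2, 16), nn.ReLU(),   # hidden layer learns an internal representation
    nn.Linear(16, 2),              # output layer
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    idx = torch.randint(0, len(X), (32,))   # stochastic mini-batch
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean()
print(f"training accuracy: {acc:.2f}")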

In 1969, Kunihiko Fukushima introduced the ReLU (rectified linear unit) activation function. The rectifier has become the most popular activation function for deep learning.

Deep learning architectures for convolutional neural networks (CNNs) with convolutional layers and downsampling layers began with the Neocognitron introduced by Kunihiko Fukushima in 1979, though not trained by backpropagation.

Backpropagation is an efficient application of the chain rule derived by Gottfried Wilhelm Leibniz in 1673 to networks of differentiable nodes. The terminology "back-propagating errors" was actually introduced in 1962 by Rosenblatt, but he did not know how to implement this, although Henry J. Kelley had a continuous precursor of backpropagation in 1960 in the context of control theory. The modern form of backpropagation was first published in Seppo Linnainmaa's master thesis (1970). G.M. Ostrovski et al. republished it in 1971. Paul Werbos applied backpropagation to neural networks in 1982 (his 1974 PhD thesis, reprinted in a 1994 book, did not yet describe the algorithm). In 1986, David E. Rumelhart et al. popularised backpropagation but did not cite the original work.
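
The chain-rule structure of backpropagation can be seen in a tiny hand-written example (illustrative only; a two-layer network with a squared-error loss, gradients computed in reverse from the loss back to each weight):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 3))                 # batch of inputs
t = rng.random((8, 1))                 # targets
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 1))

# Forward pass through differentiable nodes.
h = np.tanh(x @ W1)
y = h @ W2
loss = 0.5 * np.mean((y - t) ** 2)

# Backward pass: apply the chain rule from the loss back to each weight.
dy  = (y - t) / len(x)                 # dL/dy
dW2 = h.T @ dy                         # dL/dW2 = (dy/dW2)^T dL/dy
dh  = dy @ W2.T                        # propagate the error to the hidden layer
dW1 = x.T @ (dh * (1 - h ** 2))        # chain through tanh' = 1 - tanh^2

print(loss, dW1.shape, dW2.shape)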

1980s-2000s

The time delay neural network (TDNN) was introduced in 1987 by Alex Waibel to apply CNN to phoneme recognition. It used convolutions, weight sharing, and backpropagation. In 1988, Wei Zhang applied a backpropagation-trained CNN to alphabet recognition. In 1989, Yann LeCun et al. created a CNN called LeNet for recognizing handwritten ZIP codes on mail. Training required 3 days. In 1990, Wei Zhang implemented a CNN on optical computing hardware. In 1991, a CNN was applied to medical image object segmentation and breast cancer detection in mammograms. LeNet-5 (1998), a 7-level CNN by Yann LeCun et al., that classifies digits, was applied by several banks to recognize hand-written numbers on checks digitized in 32x32 pixel images.

Recurrent neural networks (RNN) were further developed in the 1980s. Recurrence is used for sequence processing, and when a recurrent network is unrolled, it mathematically resembles a deep feedforward layer. Consequently, they have similar properties and issues, and their developments had mutual influences. In RNN, two early influential works were the Jordan network (1986) and the Elman network (1990), which applied RNN to study problems in cognitive psychology.

In the 1980s, backpropagation did not work well for deep learning with long credit assignment paths. To overcome this problem, in 1991, Jürgen Schmidhuber proposed a hierarchy of RNNs pre-trained one level at a time by self-supervised learning where each RNN tries to predict its own next input, which is the next unexpected input of the RNN below. This "neural history compressor" uses predictive coding to learn internal representations at multiple self-organizing time scales. This can substantially facilitate downstream deep learning. The RNN hierarchy can be collapsed into a single RNN, by distilling a higher level chunker network into a lower level automatizer network. In 1993, a neural history compressor solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time. The "P" in ChatGPT refers to such pre-training.

Sepp Hochreiter's diploma thesis (1991) implemented the neural history compressor, and identified and analyzed the vanishing gradient problem. Hochreiter proposed recurrent residual connections to solve the vanishing gradient problem. This led to the long short-term memory (LSTM), published in 1995. LSTM can learn "very deep learning" tasks with long credit assignment paths that require memories of events that happened thousands of discrete time steps before. That LSTM was not yet the modern architecture, which required the "forget gate" introduced in 1999; that variant became the standard RNN architecture.
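
A minimal sketch of such a long-credit-assignment-path task (assuming PyTorch, whose LSTM includes a forget gate; the task and hyperparameters are illustrative): the label depends only on the first element of a long sequence, so the network must carry information across many time steps.

```python
import torch
from torch import nn

torch.manual_seed(0)

seq_len, batch = 200, 64
x = torch.randint(0, 2, (batch, seq_len, 1)).float()
y = x[:, 0, 0].long()                      # label depends only on the first time step

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)  # gated cells
head = nn.Linear(32, 2)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-2)

for step in range(300):
    out, _ = lstm(x)                       # out: (batch, seq_len, hidden)
    logits = head(out[:, -1])              # read out after the full sequence
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss:", loss.item())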

In 1991, Jürgen Schmidhuber also published adversarial neural networks that contest with each other in the form of a zero-sum game, where one network's gain is the other network's loss. The first network is a generative model that models a probability distribution over output patterns. The second network learns by gradient descent to predict the reactions of the environment to these patterns. This was called "artificial curiosity". In 2014, this principle was used in generative adversarial networks (GANs).

During 1985–1995, inspired by statistical mechanics, several architectures and methods were developed by Terry Sejnowski, Peter Dayan, Geoffrey Hinton and others, including the Boltzmann machine, the restricted Boltzmann machine, the Helmholtz machine, and the wake-sleep algorithm. These were designed for unsupervised learning of deep generative models. However, they were more computationally expensive than backpropagation. The Boltzmann machine learning algorithm, published in 1985, was briefly popular before being eclipsed by the backpropagation algorithm in 1986 (p. 112). A 1988 network became state of the art in protein structure prediction, an early application of deep learning to bioinformatics.

Both shallow and deep learning (e.g., recurrent nets) of ANNs for speech recognition have been explored for many years. These methods never outperformed non-uniform internal-handcrafting Gaussian mixture model/Hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively. Key difficulties have been analyzed, including gradient diminishing and weak temporal correlation structure in neural predictive models. Additional difficulties were the lack of training data and limited computing power.

Most speech recognition researchers moved away from neural nets to pursue generative modeling. An exception was at SRI International in the late 1990s. Funded by the US government's NSA and DARPA, SRI researched in speech and speaker recognition. The speaker recognition team led by Larry Heck reported significant success with deep neural networks in speech processing in the 1998 NIST Speaker Recognition benchmark. It was deployed in the Nuance Verifier, representing the first major industrial application of deep learning.

The principle of elevating "raw" features over hand-crafted optimization was first explored successfully in the architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features in the late 1990s, showing its superiority over the Mel-Cepstral features that contain stages of fixed transformation from spectrograms. The raw features of speech, waveforms, later produced excellent larger-scale results.

2000s

Neural networks entered a lull, and simpler models that use task-specific handcrafted features such as Gabor filters and support vector machines (SVMs) became the preferred choices in the 1990s and 2000s, because of artificial neural networks' computational cost and a lack of understanding of how the brain wires its biological networks.

In 2003, LSTM became competitive with traditional speech recognizers on certain tasks. In 2006, Alex Graves, Santiago Fernández, Faustino Gomez, and Schmidhuber combined it with connectionist temporal classification (CTC) in stacks of LSTMs. In 2009, it became the first RNN to win a pattern recognition contest, in connected handwriting recognition.

In 2006, publications by Geoff Hinton, Ruslan Salakhutdinov, Osindero and Teh introduced deep belief networks, developed for generative modeling. They are trained by training one restricted Boltzmann machine, freezing it, training another one on top of it, and so on, then optionally fine-tuning the whole stack with supervised backpropagation. They could model high-dimensional probability distributions, such as the distribution of MNIST images, but convergence was slow.
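
The building block that gets stacked is a restricted Boltzmann machine trained by contrastive divergence. The following is a compact illustrative sketch of one CD-1 update for a binary RBM in numpy (not the full deep-belief-network recipe; data and sizes are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, bv, bh, lr=0.05):
    """One contrastive-divergence (CD-1) step for a binary RBM."""
    ph0 = sigmoid(v0 @ W + bh)                        # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden units
    pv1 = sigmoid(h0 @ W.T + bv)                      # reconstruct visible units
    ph1 = sigmoid(pv1 @ W + bh)                       # hidden probabilities of reconstruction
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)    # positive minus negative statistics
    bv += lr * (v0 - pv1).mean(axis=0)
    bh += lr * (ph0 - ph1).mean(axis=0)
    return W, bv, bh

# Toy binary data standing in for e.g. binarized image pixels.
data = (rng.random((256, 64)) < 0.3).astype(float)
W = rng.standard_normal((64, 32)) * 0.01
bv, bh = np.zeros(64), np.zeros(32)
for epoch in range(20):
    W, bv, bh = cd1_update(data, W, bv, bh)
# A deep belief net would now freeze this RBM and train another one
# on its hidden activations, sigmoid(data @ W + bh).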

The impact of deep learning in industry began in the early 2000s, when CNNs already processed an estimated 10% to 20% of all the checks written in the US, according to Yann LeCun. Industrial applications of deep learning to large-scale speech recognition started around 2010.

The 2009 NIPS Workshop on Deep Learning for Speech Recognition was motivated by the limitations of deep generative models of speech, and the possibility that, given more capable hardware and large-scale data sets, deep neural nets might become practical. It was believed that pre-training DNNs using generative models of deep belief nets (DBN) would overcome the main difficulties of neural nets. However, it was discovered that replacing pre-training with large amounts of training data for straightforward backpropagation when using DNNs with large, context-dependent output layers produced error rates dramatically lower than the then-state-of-the-art Gaussian mixture model (GMM)/Hidden Markov Model (HMM) systems and also than more-advanced generative model-based systems. The nature of the recognition errors produced by the two types of systems was characteristically different, offering technical insights into how to integrate deep learning into the existing highly efficient, run-time speech decoding system deployed by all major speech recognition systems. Analysis around 2009–2010, contrasting the GMM (and other generative speech models) vs. DNN models, stimulated early industrial investment in deep learning for speech recognition. That analysis was done with comparable performance (less than 1.5% in error rate) between discriminative DNNs and generative models. In 2010, researchers extended deep learning from TIMIT to large-vocabulary speech recognition by adopting large output layers of the DNN based on context-dependent HMM states constructed by decision trees.

Deep learning revolution

How deep learning is a subset of machine learning and how machine learning is a subset of artificial intelligence (AI)

The deep learning revolution started around CNN- and GPU-based computer vision.

Although CNNs trained by backpropagation had existed for decades, and GPU implementations of neural networks, including CNNs, for years, faster GPU implementations of CNNs were needed for progress in computer vision. Later, as deep learning became widespread, specialized hardware and algorithm optimizations were developed specifically for it.

A key driver of the deep learning revolution was hardware advances, especially GPUs. Some early work dated back to 2004. In 2009, Raina, Madhavan, and Andrew Ng reported a 100M-parameter deep belief network trained on 30 Nvidia GeForce GTX 280 GPUs, an early demonstration of GPU-based deep learning. They reported up to 70 times faster training.

In 2011, a CNN named DanNet by Dan Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber achieved for the first time superhuman performance in a visual pattern recognition contest, outperforming traditional methods by a factor of 3. It then won more contests. They also showed how max-pooling CNNs on GPU improved performance significantly.

In 2012, Andrew Ng and Jeff Dean created an FNN that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from YouTube videos.

In October 2012, AlexNet by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton won the large-scale ImageNet competition by a significant margin over shallow machine learning methods. Further incremental improvements included the VGG-16 network by Karen Simonyan and Andrew Zisserman and Google's Inceptionv3.

The success in image classification was then extended to the more challenging task of generating descriptions (captions) for images, often as a combination of CNNs and LSTMs.

In 2014, the state of the art was training "very deep neural network" with 20 to 30 layers. Stacking too many layers led to a steep reduction in training accuracy, known as the "degradation" problem. In 2015, two techniques were developed to train very deep networks: the highway network was published in May 2015, and the residual neural network (ResNet) in Dec 2015. ResNet behaves like an open-gated Highway Net.

Around the same time, deep learning started impacting the field of art. Early examples included Google DeepDream (2015), and neural style transfer (2015), both of which were based on pretrained image classification neural networks, such as VGG-19.

The generative adversarial network (GAN) (Ian Goodfellow et al., 2014), based on Jürgen Schmidhuber's principle of artificial curiosity, became state of the art in generative modeling during the 2014–2018 period. Excellent image quality was achieved by Nvidia's StyleGAN (2018), based on the Progressive GAN by Tero Karras et al., in which the GAN generator is grown from small to large scale in a pyramidal fashion. Image generation by GANs reached popular success and provoked discussions concerning deepfakes. Diffusion models (2015) have since eclipsed GANs in generative modeling, with systems such as DALL·E 2 (2022) and Stable Diffusion (2022).

In 2015, Google's speech recognition improved by 49% through an LSTM-based model, which the company made available through Google Voice Search on smartphones.

Deep learning is part of state-of-the-art systems in various disciplines, particularly computer vision and automatic speech recognition (ASR). Results on commonly used evaluation sets such as TIMIT (ASR) and MNIST (image classification), as well as a range of large-vocabulary speech recognition tasks, have steadily improved. Convolutional neural networks were superseded for ASR by LSTMs, but remain more successful in computer vision.

Yoshua Bengio, Geoffrey Hinton and Yann LeCun were awarded the 2018 Turing Award for "conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing".

Neural networks

Simplified example of training a neural network in object detection: The network is trained by multiple images that are known to depict starfish and sea urchins, which are correlated with "nodes" that represent visual features. The starfish match with a ringed texture and a star outline, whereas most sea urchins match with a striped texture and oval shape. However, the instance of a ring textured sea urchin creates a weakly weighted association between them.
 
Subsequent run of the network on an input image (left): The network correctly detects the starfish. However, the weakly weighted association between ringed texture and sea urchin also confers a weak signal to the latter from one of two intermediate nodes. In addition, a shell that was not included in the training gives a weak signal for the oval shape, also resulting in a weak signal for the sea urchin output. These weak signals may result in a false positive result for sea urchin. In reality, textures and outlines would not be represented by single nodes, but rather by associated weight patterns of multiple nodes.

Artificial neural networks (ANNs) or connectionist systems are computing systems inspired by the biological neural networks that constitute animal brains. Such systems learn (progressively improve their ability) to do tasks by considering examples, generally without task-specific programming. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the analytic results to identify cats in other images. They have found most use in applications difficult to express with a traditional computer algorithm using rule-based programming.

An ANN is based on a collection of connected units called artificial neurons, (analogous to biological neurons in a biological brain). Each connection (synapse) between neurons can transmit a signal to another neuron. The receiving (postsynaptic) neuron can process the signal(s) and then signal downstream neurons connected to it. Neurons may have state, generally represented by real numbers, typically between 0 and 1. Neurons and synapses may also have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal that it sends downstream.

Typically, neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input), to the last (output) layer, possibly after traversing the layers multiple times.

The original goal of the neural network approach was to solve problems in the same way that a human brain would. Over time, attention focused on matching specific mental abilities, leading to deviations from biology such as backpropagation, or passing information in the reverse direction and adjusting the network to reflect that information.

Neural networks have been used on a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, playing board and video games and medical diagnosis.

As of 2017, neural networks typically have a few thousand to a few million units and millions of connections. Despite this number being several orders of magnitude less than the number of neurons in a human brain, these networks can perform many tasks at a level beyond that of humans (e.g., recognizing faces, or playing "Go").

Deep neural networks

A deep neural network (DNN) is an artificial neural network with multiple layers between the input and output layers. There are different types of neural networks but they always consist of the same components: neurons, synapses, weights, biases, and functions. These components as a whole function in a way that mimics functions of the human brain, and can be trained like any other ML algorithm.

For example, a DNN that is trained to recognize dog breeds will go over the given image and calculate the probability that the dog in the image is a certain breed. The user can review the results and select which probabilities the network should display (above a certain threshold, etc.) and return the proposed label. Each mathematical manipulation as such is considered a layer, and complex DNN have many layers, hence the name "deep" networks.
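
A small illustrative sketch of the thresholding step described above, in Python: the network outputs a probability per breed and only labels above a chosen cutoff are returned (the breed names, logits, and threshold are hypothetical):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

breeds = ["beagle", "husky", "poodle", "terrier"]
logits = np.array([2.1, 0.3, -1.0, 1.4])     # hypothetical network outputs for one image
probs = softmax(logits)                       # probability per breed, summing to 1

threshold = 0.20                              # user-chosen display cutoff
proposed = {b: round(float(p), 3) for b, p in zip(breeds, probs) if p >= threshold}
print(proposed)                               # only breeds whose probability clears the cutoff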

DNNs can model complex non-linear relationships. DNN architectures generate compositional models where the object is expressed as a layered composition of primitives. The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network. For instance, it was proved that sparse multivariate polynomials are exponentially easier to approximate with DNNs than with shallow networks.

Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures, unless they have been evaluated on the same data sets.

DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the network did not accurately recognize a particular pattern, an algorithm would adjust the weights. That way the algorithm can make certain parameters more influential, until it determines the correct mathematical manipulation to fully process the data.

Recurrent neural networks, in which data can flow in any direction, are used for applications such as language modeling. Long short-term memory is particularly effective for this use.

Convolutional neural networks (CNNs) are used in computer vision. CNNs also have been applied to acoustic modeling for automatic speech recognition (ASR).

Challenges

As with ANNs, many issues can arise with naively trained DNNs. Two common issues are overfitting and computation time.

DNNs are prone to overfitting because of the added layers of abstraction, which allow them to model rare dependencies in the training data. Regularization methods such as Ivakhnenko's unit pruning, weight decay (ℓ2 regularization) or sparsity (ℓ1 regularization) can be applied during training to combat overfitting. Alternatively, dropout regularization randomly omits units from the hidden layers during training, which helps to exclude rare dependencies. Another recent development is research into models of just enough complexity through an estimation of the intrinsic complexity of the task being modelled; this approach has been successfully applied to multivariate time series prediction tasks such as traffic prediction. Finally, data can be augmented via methods such as cropping and rotating, so that smaller training sets can be enlarged to reduce the chances of overfitting.
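
A brief sketch of how these regularizers are typically combined in practice (assuming PyTorch and torchvision are available; the sizes and hyperparameters are illustrative, and the augmentation pipeline would be attached to a training dataset):

```python
import torch
from torch import nn
from torchvision import transforms

# Data augmentation: random crops and flips enlarge the effective training set.
augment = transforms.Compose([
    transforms.RandomCrop(28, padding=2),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                  # randomly omit hidden units during training
    nn.Linear(256, 10),
)

# weight_decay adds an L2 penalty on the weights (i.e. weight decay).
opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)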

DNNs must consider many training parameters, such as the size (number of layers and number of units per layer), the learning rate, and initial weights. Sweeping through the parameter space for optimal parameters may not be feasible due to the cost in time and computational resources. Various tricks, such as batching (computing the gradient on several training examples at once rather than individual examples) speed up computation. Large processing capabilities of many-core architectures (such as GPUs or the Intel Xeon Phi) have produced significant speedups in training, because of the suitability of such processing architectures for the matrix and vector computations.

Alternatively, engineers may look for other types of neural networks with more straightforward and convergent training algorithms. CMAC (cerebellar model articulation controller) is one such kind of neural network. It doesn't require learning rates or randomized initial weights. The training process can be guaranteed to converge in one step with a new batch of data, and the computational complexity of the training algorithm is linear with respect to the number of neurons involved.

Hardware

Since the 2010s, advances in both machine learning algorithms and computer hardware have led to more efficient methods for training deep neural networks that contain many layers of non-linear hidden units and a very large output layer. By 2019, graphics processing units (GPUs), often with AI-specific enhancements, had displaced CPUs as the dominant method for training large-scale commercial cloud AI. OpenAI estimated the hardware computation used in the largest deep learning projects from AlexNet (2012) to AlphaZero (2017) and found a 300,000-fold increase in the amount of computation required, with a doubling-time trendline of 3.4 months.

Special electronic circuits called deep learning processors were designed to speed up deep learning algorithms. Deep learning processors include neural processing units (NPUs) in Huawei cellphones and cloud computing servers such as tensor processing units (TPUs) in the Google Cloud Platform. Cerebras Systems has also built a dedicated system to handle large deep learning models, the CS-2, based on the largest processor in the industry, the second-generation Wafer Scale Engine (WSE-2).

Atomically thin semiconductors are considered promising for energy-efficient deep learning hardware where the same basic device structure is used for both logic operations and data storage. In 2020, Marega et al. published experiments with a large-area active channel material for developing logic-in-memory devices and circuits based on floating-gate field-effect transistors (FGFETs).

In 2021, J. Feldmann et al. proposed an integrated photonic hardware accelerator for parallel convolutional processing. The authors identify two key advantages of integrated photonics over its electronic counterparts: (1) massively parallel data transfer through wavelength division multiplexing in conjunction with frequency combs, and (2) extremely high data modulation speeds. Their system can execute trillions of multiply-accumulate operations per second, indicating the potential of integrated photonics in data-heavy AI applications.

Applications

Automatic speech recognition

Large-scale automatic speech recognition is the first and most convincing successful case of deep learning. LSTM RNNs can learn "Very Deep Learning" tasks that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms. LSTM with forget gates is competitive with traditional speech recognizers on certain tasks.

The initial success in speech recognition was based on small-scale recognition tasks using TIMIT. The data set contains 630 speakers from eight major dialects of American English, where each speaker reads 10 sentences. Its small size lets many configurations be tried. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows weak phone bigram language models. This lets the strength of the acoustic modeling aspects of speech recognition be more easily analyzed. The error rates listed below, including these early results, are measured as percent phone error rate (PER) and summarize results reported since 1991.

Method                                              PER (%)
Randomly Initialized RNN                            26.1
Bayesian Triphone GMM-HMM                           25.6
Hidden Trajectory (Generative) Model                24.8
Monophone Randomly Initialized DNN                  23.4
Monophone DBN-DNN                                   22.4
Triphone GMM-HMM with BMMI Training                 21.7
Monophone DBN-DNN on fbank                          20.7
Convolutional DNN                                   20.0
Convolutional DNN w. Heterogeneous Pooling          18.7
Ensemble DNN/CNN/RNN                                18.3
Bidirectional LSTM                                  17.8
Hierarchical Convolutional Deep Maxout Network      16.5

The debut of DNNs for speaker recognition in the late 1990s and for speech recognition around 2009–2011, and of LSTM around 2003–2007, accelerated progress in eight major areas:

  • Scale-up/out and accelerated DNN training and decoding
  • Sequence discriminative training
  • Feature processing by deep models with solid understanding of the underlying mechanisms
  • Adaptation of DNNs and related deep models
  • Multi-task and transfer learning by DNNs and related deep models
  • CNNs and how to design them to best exploit domain knowledge of speech
  • RNN and its rich LSTM variants
  • Other types of deep models including tensor-based models and integrated deep generative/discriminative models.

More recent speech recognition models use Transformers or Temporal Convolution Networks with significant success and widespread applications. All major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products, etc.) are based on deep learning.

Image recognition

A common evaluation set for image classification is the MNIST database. MNIST is composed of handwritten digits and includes 60,000 training examples and 10,000 test examples. As with TIMIT, its small size lets users test multiple configurations. A comprehensive list of results on this set is available.

Deep learning-based image recognition has become "superhuman", producing more accurate results than human contestants. This first occurred in 2011 in recognition of traffic signs, and in 2014, with recognition of human faces.

Deep learning-trained vehicles now interpret 360° camera views. Another example is Facial Dysmorphology Novel Analysis (FDNA) used to analyze cases of human malformation connected to a large database of genetic syndromes.

Visual art processing

Visual art processing of Jimmy Wales in France, with the style of Munch's "The Scream" applied using neural style transfer

Closely related to the progress that has been made in image recognition is the increasing application of deep learning techniques to various visual art tasks. DNNs have proven themselves capable, for example, of

  • identifying the style period of a given painting
  • Neural Style Transfer – capturing the style of a given artwork and applying it in a visually pleasing manner to an arbitrary photograph or video
  • generating striking imagery based on random visual input fields.

Natural language processing

Neural networks have been used for implementing language models since the early 2000s. LSTM helped to improve machine translation and language modeling.

Other key techniques in this field are negative sampling and word embedding. Word embedding, such as word2vec, can be thought of as a representational layer in a deep learning architecture that transforms an atomic word into a positional representation of the word relative to other words in the dataset; the position is represented as a point in a vector space. Using word embedding as an RNN input layer allows the network to parse sentences and phrases using an effective compositional vector grammar. A compositional vector grammar can be thought of as probabilistic context free grammar (PCFG) implemented by an RNN. Recursive auto-encoders built atop word embeddings can assess sentence similarity and detect paraphrasing. Deep neural architectures provide the best results for constituency parsing, sentiment analysis, information retrieval, spoken language understanding, machine translation, contextual entity linking, writing style recognition, named-entity recognition (token classification), text classification, and others.
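
A minimal sketch of the embedding idea (assuming PyTorch; the vocabulary and vectors here are random placeholders, whereas a trained word2vec model would supply vectors in which related words lie close together):

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab = {"king": 0, "queen": 1, "apple": 2}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)  # lookup table of word vectors

def vec(word):
    return embedding(torch.tensor(vocab[word]))

# Cosine similarity between word vectors; with trained embeddings,
# related words end up closer than unrelated ones.
cos = nn.functional.cosine_similarity
print(cos(vec("king"), vec("queen"), dim=0).item())
print(cos(vec("king"), vec("apple"), dim=0).item())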

Recent developments generalize word embedding to sentence embedding.

Google Translate (GT) uses a large end-to-end long short-term memory (LSTM) network. Google Neural Machine Translation (GNMT) uses an example-based machine translation method in which the system "learns from millions of examples". It translates "whole sentences at a time, rather than pieces". Google Translate supports over one hundred languages. The network encodes the "semantics of the sentence rather than simply memorizing phrase-to-phrase translations". GT uses English as an intermediate between most language pairs.

Drug discovery and toxicology

A large percentage of candidate drugs fail to win regulatory approval. These failures are caused by insufficient efficacy (on-target effect), undesired interactions (off-target effects), or unanticipated toxic effects. Research has explored use of deep learning to predict the biomolecular targets, off-targets, and toxic effects of environmental chemicals in nutrients, household products and drugs.

AtomNet is a deep learning system for structure-based rational drug design. AtomNet was used to predict novel candidate biomolecules for disease targets such as the Ebola virus and multiple sclerosis.

In 2017 graph neural networks were used for the first time to predict various properties of molecules in a large toxicology data set. In 2019, generative neural networks were used to produce molecules that were validated experimentally all the way into mice.

Recommendation systems

Recommendation systems have used deep learning to extract meaningful features for a latent factor model for content-based music and journal recommendations. Multi-view deep learning has been applied for learning user preferences from multiple domains. The model uses a hybrid collaborative and content-based approach and enhances recommendations in multiple tasks.

Bioinformatics

An autoencoder ANN was used in bioinformatics to predict gene ontology annotations and gene–function relationships.

In medical informatics, deep learning was used to predict sleep quality based on data from wearables and to predict health complications from electronic health record data.

Deep neural networks have shown unparalleled performance in predicting protein structure from the sequence of the amino acids that make it up. In 2020, AlphaFold, a deep-learning based system, achieved a level of accuracy significantly higher than all previous computational methods.

Deep Neural Network Estimations

Deep neural networks can be used to estimate the entropy of a stochastic process through an arrangement called a Neural Joint Entropy Estimator (NJEE). Such an estimation provides insights on the effects of input random variables on an independent random variable. Practically, the DNN is trained as a classifier that maps an input vector or matrix X to an output probability distribution over the possible classes of random variable Y, given input X. For example, in image classification tasks, the NJEE maps a vector of pixels' color values to probabilities over possible image classes. In practice, the probability distribution of Y is obtained by a softmax layer with a number of nodes equal to the alphabet size of Y. NJEE uses continuously differentiable activation functions, so that the conditions for the universal approximation theorem hold. This method has been shown to provide a strongly consistent estimator and to outperform other methods in cases of large alphabet sizes.
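
A hedged sketch of the general recipe described above, not the NJEE construction itself: train a softmax classifier for p(Y|X) and use the average negative log predicted probability (the cross-entropy loss, in nats) as an estimate of the conditional entropy H(Y|X). The data, network and alphabet size below are synthetic placeholders (assumes PyTorch).

```python
import math
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data: Y is a noisy function of X over an alphabet of 8 symbols.
n, alphabet = 4096, 8
X = torch.randn(n, 4)
y = (X[:, 0] > 0).long() * 4 + torch.randint(0, 4, (n,))

model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, alphabet))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(500):
    loss = nn.functional.cross_entropy(model(X), y)   # average -log p(y|x), in nats
    opt.zero_grad(); loss.backward(); opt.step()

h_y_given_x = loss.item()
print(f"estimated H(Y|X) ~ {h_y_given_x:.3f} nats "
      f"(upper bound log|Y| = {math.log(alphabet):.3f})")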

Medical image analysis

Deep learning has been shown to produce competitive results in medical applications such as cancer cell classification, lesion detection, organ segmentation and image enhancement. Modern deep learning tools demonstrate high accuracy in detecting various diseases and can help specialists improve diagnostic efficiency.

Mobile advertising

Finding the appropriate mobile audience for mobile advertising is always challenging, since many data points must be considered and analyzed before a target segment can be created and used in ad serving by any ad server. Deep learning has been used to interpret large, many-dimensioned advertising datasets. Many data points are collected during the request/serve/click internet advertising cycle. This information can form the basis of machine learning to improve ad selection.

Image restoration

Deep learning has been successfully applied to inverse problems such as denoising, super-resolution, inpainting, and film colorization. These applications include learning methods such as "Shrinkage Fields for Effective Image Restoration" which trains on an image dataset, and Deep Image Prior, which trains on the image that needs restoration.

Financial fraud detection

Deep learning is being successfully applied to financial fraud detection, tax evasion detection, and anti-money laundering.

Materials science

In November 2023, researchers at Google DeepMind and Lawrence Berkeley National Laboratory announced that they had developed an AI system known as GNoME. This system has contributed to materials science by discovering over 2 million new materials within a relatively short timeframe. GNoME employs deep learning techniques to efficiently explore potential material structures, achieving a significant increase in the identification of stable inorganic crystal structures. The system's predictions were validated through autonomous robotic experiments, demonstrating a noteworthy success rate of 71%. The data of newly discovered materials is publicly available through the Materials Project database, offering researchers the opportunity to identify materials with desired properties for various applications. This development has implications for the future of scientific discovery and the integration of AI in material science research, potentially expediting material innovation and reducing costs in product development. The use of AI and deep learning suggests the possibility of minimizing or eliminating manual lab experiments and allowing scientists to focus more on the design and analysis of unique compounds.

Military

The United States Department of Defense applied deep learning to train robots in new tasks through observation.

Partial differential equations

Physics-informed neural networks have been used to solve partial differential equations in both forward and inverse problems in a data-driven manner. One example is reconstructing fluid flow governed by the Navier–Stokes equations. Using physics-informed neural networks does not require the often expensive mesh generation that conventional CFD methods rely on. Geometric and physical constraints have been reported to have a synergistic effect on neural PDE surrogates, enhancing their ability to predict stable and very long rollouts.
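
A minimal illustrative sketch of the physics-informed idea on a much simpler problem than Navier–Stokes (the ODE u' = -u with u(0) = 1): the loss penalizes the residual of the governing equation at random collocation points, computed via automatic differentiation (assumes PyTorch; architecture and step counts are arbitrary).

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    t = torch.rand(128, 1, requires_grad=True)             # collocation points in [0, 1]
    u = net(t)
    du = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    residual = du + u                                       # physics loss: enforce u' = -u
    ic = (net(torch.zeros(1, 1)) - 1.0) ** 2                # initial condition u(0) = 1
    loss = (residual ** 2).mean() + ic.mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Compare with the exact solution u(t) = exp(-t) at t = 1.
print(net(torch.ones(1, 1)).item(), torch.exp(torch.tensor(-1.0)).item())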

Deep backward stochastic differential equation method

Deep backward stochastic differential equation method is a numerical method that combines deep learning with Backward stochastic differential equation (BSDE). This method is particularly useful for solving high-dimensional problems in financial mathematics. By leveraging the powerful function approximation capabilities of deep neural networks, deep BSDE addresses the computational challenges faced by traditional numerical methods in high-dimensional settings. Specifically, traditional methods like finite difference methods or Monte Carlo simulations often struggle with the curse of dimensionality, where computational cost increases exponentially with the number of dimensions. Deep BSDE methods, however, employ deep neural networks to approximate solutions of high-dimensional partial differential equations (PDEs), effectively reducing the computational burden.

In addition, the integration of Physics-informed neural networks (PINNs) into the deep BSDE framework enhances its capability by embedding the underlying physical laws directly into the neural network architecture. This ensures that the solutions not only fit the data but also adhere to the governing stochastic differential equations. PINNs leverage the power of deep learning while respecting the constraints imposed by the physical models, resulting in more accurate and reliable solutions for financial mathematics problems.

Image reconstruction

Image reconstruction is the reconstruction of the underlying images from image-related measurements. Several works have shown the superior performance of deep learning methods compared to analytical methods for various applications, e.g., spectral imaging and ultrasound imaging.

Weather prediction

Traditional weather prediction systems solve a very complex system of partial differential equations. GraphCast is a deep learning based model, trained on a long history of weather data to predict how weather patterns change over time. It is able to predict weather conditions for up to 10 days globally, at a very detailed level, and in under a minute, with precision similar to state of the art systems.

Epigenetic clock

An epigenetic clock is a biochemical test that can be used to measure age. Galkin et al. used deep neural networks to train an epigenetic aging clock of unprecedented accuracy using more than 6,000 blood samples. The clock uses information from 1,000 CpG sites and predicts that people with certain conditions (IBD, frontotemporal dementia, ovarian cancer, obesity) are older than healthy controls. The aging clock was planned to be released for public use in 2021 by Deep Longevity, an Insilico Medicine spinoff company.

Relation to human cognitive and brain development

Deep learning is closely related to a class of theories of brain development (specifically, neocortical development) proposed by cognitive neuroscientists in the early 1990s. These developmental theories were instantiated in computational models, making them predecessors of deep learning systems. These developmental models share the property that various proposed learning dynamics in the brain (e.g., a wave of nerve growth factor) support the self-organization somewhat analogous to the neural networks utilized in deep learning models. Like the neocortex, neural networks employ a hierarchy of layered filters in which each layer considers information from a prior layer (or the operating environment), and then passes its output (and possibly the original input), to other layers. This process yields a self-organizing stack of transducers, well-tuned to their operating environment. A 1995 description stated, "...the infant's brain seems to organize itself under the influence of waves of so-called trophic-factors ... different regions of the brain become connected sequentially, with one layer of tissue maturing before another and so on until the whole brain is mature".

A variety of approaches have been used to investigate the plausibility of deep learning models from a neurobiological perspective. On the one hand, several variants of the backpropagation algorithm have been proposed in order to increase its processing realism. Other researchers have argued that unsupervised forms of deep learning, such as those based on hierarchical generative models and deep belief networks, may be closer to biological reality. In this respect, generative neural network models have been related to neurobiological evidence about sampling-based processing in the cerebral cortex.

Although a systematic comparison between the human brain organization and the neuronal encoding in deep networks has not yet been established, several analogies have been reported. For example, the computations performed by deep learning units could be similar to those of actual neurons and neural populations. Similarly, the representations developed by deep learning models are similar to those measured in the primate visual system both at the single-unit and at the population levels.

Commercial activity

Facebook's AI lab performs tasks such as automatically tagging uploaded pictures with the names of the people in them.

Google's DeepMind Technologies developed a system capable of learning how to play Atari video games using only pixels as data input. In 2015 they demonstrated their AlphaGo system, which learned the game of Go well enough to beat a professional Go player. Google Translate uses a neural network to translate between more than 100 languages.

In 2017, Covariant.ai was launched, which focuses on integrating deep learning into factories.

In 2008, researchers at The University of Texas at Austin (UT) developed a machine learning framework called Training an Agent Manually via Evaluative Reinforcement, or TAMER, which proposed new methods for robots or computer programs to learn how to perform tasks by interacting with a human instructor. Building on TAMER, a new algorithm called Deep TAMER was later introduced in 2018 during a collaboration between the U.S. Army Research Laboratory (ARL) and UT researchers. Deep TAMER used deep learning to provide a robot with the ability to learn new tasks through observation. Using Deep TAMER, a robot learned a task with a human trainer, watching video streams or observing a human perform a task in person. The robot later practiced the task with the help of some coaching from the trainer, who provided feedback such as "good job" and "bad job".

Criticism and comment

Deep learning has attracted both criticism and comment, in some cases from outside the field of computer science.

Theory

A main criticism concerns the lack of theory surrounding some methods. Learning in the most common deep architectures is implemented using well-understood gradient descent. However, the theory surrounding other algorithms, such as contrastive divergence, is less clear (e.g., does it converge? If so, how fast? What is it approximating?). Deep learning methods are often looked at as a black box, with most confirmations done empirically, rather than theoretically.

In further reference to the idea that artistic sensitivity might be inherent in relatively low levels of the cognitive hierarchy, a published series of graphic representations of the internal states of deep (20–30 layer) neural networks attempting to discern, within essentially random data, the images on which they were trained demonstrated a visual appeal: the original research notice received well over 1,000 comments and was, for a time, the most frequently accessed article on The Guardian's website.

With the support of Innovation Diffusion Theory (IDT), a study analyzed the diffusion of Deep Learning in BRICS and OECD countries using data from Google Trends.

Errors

Some deep learning architectures display problematic behaviors, such as confidently classifying unrecognizable images as belonging to a familiar category of ordinary images (2014) and misclassifying minuscule perturbations of correctly classified images (2013). Goertzel hypothesized that these behaviors are due to limitations in their internal representations and that these limitations would inhibit integration into heterogeneous multi-component artificial general intelligence (AGI) architectures. These issues may possibly be addressed by deep learning architectures that internally form states homologous to image-grammar decompositions of observed entities and events. Learning a grammar (visual or linguistic) from training data would be equivalent to restricting the system to commonsense reasoning that operates on concepts in terms of grammatical production rules and is a basic goal of both human language acquisition and artificial intelligence (AI).

Cyber threat

As deep learning moves from the lab into the world, research and experience show that artificial neural networks are vulnerable to hacks and deception. By identifying patterns that these systems use to function, attackers can modify inputs to ANNs in such a way that the ANN finds a match that human observers would not recognize. For example, an attacker can make subtle changes to an image such that the ANN finds a match even though the image looks to a human nothing like the search target. Such manipulation is termed an "adversarial attack".
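
One widely used way to craft such adversarial inputs is the fast gradient sign method (FGSM), sketched below for illustration (assumes PyTorch; the classifier here is an untrained placeholder and the perturbation budget is arbitrary, so the two predictions may or may not differ on any given run):

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # placeholder classifier
image = torch.rand(1, 1, 28, 28, requires_grad=True)          # input the attacker can perturb
label = torch.tensor([3])

# Gradient of the loss with respect to the *input*, not the weights.
loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.03                                                # perturbation budget
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("prediction before:", model(image).argmax(dim=1).item())
print("prediction after: ", model(adversarial).argmax(dim=1).item())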

In 2016 researchers used one ANN to doctor images in trial and error fashion, identify another's focal points, and thereby generate images that deceived it. The modified images looked no different to human eyes. Another group showed that printouts of doctored images then photographed successfully tricked an image classification system. One defense is reverse image search, in which a possible fake image is submitted to a site such as TinEye that can then find other instances of it. A refinement is to search using only parts of the image, to identify images from which that piece may have been taken.

Another group showed that certain psychedelic spectacles could fool a facial recognition system into thinking ordinary people were celebrities, potentially allowing one person to impersonate another. In 2017 researchers added stickers to stop signs and caused an ANN to misclassify them.

ANNs can however be further trained to detect attempts at deception, potentially leading attackers and defenders into an arms race similar to the kind that already defines the malware defense industry. ANNs have been trained to defeat ANN-based anti-malware software by repeatedly attacking a defense with malware that was continually altered by a genetic algorithm until it tricked the anti-malware while retaining its ability to damage the target.

In 2016, another group demonstrated that certain sounds could make the Google Now voice command system open a particular web address, and hypothesized that this could "serve as a stepping stone for further attacks (e.g., opening a web page hosting drive-by malware)".

In "data poisoning", false data is continually smuggled into a machine learning system's training set to prevent it from achieving mastery.

Data collection ethics

The deep learning systems that are trained using supervised learning often rely on data that is created or annotated by humans, or both. It has been argued that not only low-paid clickwork (such as on Amazon Mechanical Turk) is regularly deployed for this purpose, but also implicit forms of human microwork that are often not recognized as such. The philosopher Rainer Mühlhoff distinguishes five types of "machinic capture" of human microwork to generate training data: (1) gamification (the embedding of annotation or computation tasks in the flow of a game), (2) "trapping and tracking" (e.g. CAPTCHAs for image recognition or click-tracking on Google search results pages), (3) exploitation of social motivations (e.g. tagging faces on Facebook to obtain labeled facial images), (4) information mining (e.g. by leveraging quantified-self devices such as activity trackers) and (5) clickwork.

Evolution of sexual reproduction

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Evolution_of_sexual_reproduction   ...