August 22, 2002 by Peter Voss
Original link: http://www.kurzweilai.net/essentials-of-general-intelligence-the-direct-path-to-agi
General intelligence comprises the essential, domain-independent skills necessary for acquiring a wide range of domain-specific knowledge — the ability to learn anything. Achieving this with “artificial general intelligence” (AGI) requires a highly adaptive, general-purpose system that can autonomously acquire an extremely wide range of specific knowledge and skills and can improve its own cognitive ability through self-directed learning. This chapter in the forthcoming book, Real AI: New Approaches to Artificial General Intelligence, describes the requirements and conceptual design of a prototype AGI system.
1. Introduction
This paper explores the concept of ‘artificial general intelligence’ (AGI) – its nature, importance, and how best to achieve it. Our[1] theoretical model posits that general intelligence comprises a limited number of distinct, yet highly integrated, foundational functional components. Successful implementation of this model will yield a highly adaptive, general-purpose system that can autonomously acquire an extremely wide range of specific knowledge and skills. Moreover, it will be able to improve its own cognitive ability through self-directed learning. We believe that, given the right design, current hardware/ software technology is adequate for engineering practical AGI systems. Our current implementation of a functional prototype is described below.
The idea of ‘general intelligence’ is quite controversial; I do not substantially engage with this debate here, but rather take the existence of such non-domain-specific abilities as a given (Gottfredson 1998). It must also be noted that this essay focuses primarily on low-level (i.e. roughly animal level) cognitive ability. Higher-level functionality, while an integral part of our model, is only addressed peripherally. Finally, certain algorithmic details are omitted for reasons of proprietary ownership.
2. General Intelligence
Intelligence can be defined simply as an entity’s ability to achieve goals – with greater intelligence coping with more complex and novel situations. Complexity ranges from the trivial – thermostats and mollusks (that in most contexts don’t even justify the label ‘intelligence’) – to the fantastically complex: autonomous flight control systems and humans.
Adaptivity, the ability to deal with changing and novel requirements, also covers a wide spectrum: from rigid, narrowly domain-specific to highly flexible, general purpose. Furthermore, flexibility can be defined in terms of scope and permanence – how much, and how often it changes. Imprinting is an example of limited scope and high permanence, while innovative, abstract problem solving is at the other end of the spectrum. While entities with high adaptivity and flexibility are clearly superior – they can potentially learn to achieve any possible goal – there is a hefty efficiency price to be paid: For example, had Deep Blue also been designed to learn language, direct airline traffic, and do medical diagnosis, it would not have become Chess champion (all other things being equal).
General Intelligence comprises the essential, domain-independent skills necessary for acquiring a wide range of domain-specific knowledge (data & skills) – i.e. the ability to learn anything (in principle). More specifically, this learning ability needs to be autonomous, goal-directed, and highly adaptive:
- Autonomous — Learning occurs both automatically, through exposure to sense data (unsupervised), and through bi-directional interaction with the environment, including exploration/ experimentation (self-supervised).
- Goal-directed – Learning is directed (autonomously) towards achieving varying and novel goals and sub-goals — be they ‘hard-wired’, externally specified, or self-generated. Goal-directedness also implies very selective learning and data acquisition (from a massively data-rich, noisy, complex environment).
- Adaptive – Learning is cumulative, integrative, contextual and adjusts to changing goals and environments. General adaptivity not only copes with gradual changes, but also seeds and facilitates the acquisition of totally novel abilities.
For example, given the correct set of basic core capabilities, an AGI system should be able to learn to recognize and categorize a wide range of novel perceptual patterns that are acquired via different senses, in many different environments and contexts. Additionally, it should be able to autonomously learn appropriate, goal-directed responses to such input contexts (given some feedback mechanism).
We take this concept to be valid not only for high-level human intelligence, but also for lower-level animal-like ability. The degree of ‘generality’ (i.e., adaptability) varies along a continuum from genetically ‘hard-coded’ responses (no adaptability), to high-level animal flexibility (significant learning ability as in, say, a dog), and finally to self-aware human general learning ability.
Core Requirements for General Intelligence
General intelligence, as described above, demands a number of irreducible features and capabilities. In order to proactively accumulate knowledge from various (and/ or changing) environments, it requires:
- Senses to obtain features from ‘the world’ (virtual or actual),
- A coherent means for storing knowledge obtained this way, and
- Adaptive output/ actuation mechanisms (both static and dynamic).
Any practical application of AGI (and certainly any real-time use) must inherently be able to process temporal data as patterns in time – not just as static patterns with a time dimension. Furthermore, AGIs must cope with data from different sense probes (e.g., visual, auditory, and data), and deal with input that is noisy, scalar, unreliable, incomplete, or multi-dimensional (both in space/ time and in having a large number of simultaneous features). Fuzzy pattern matching helps deal with pattern variability and noise.
Another essential requirement of general intelligence is to cope with an overabundance of data. Reality presents massively more features and detail than is (contextually) relevant, or that can be usefully processed. This is why the system needs to have some control over what input data is selected for analysis and learning – both in terms of which data, and also the degree of detail. Senses (‘probes’) are needed not only for selection and focus, but also in order to ground concepts – to give them (reality-based) meaning.
While input data needs to be severely limited by focus and selection, it is also extremely important to obtain multiple views of reality – data from different feature extractors or senses. Provided that these different input patterns are properly associated, they can help to provide context for each other, aid recognition, and add meaning.
In addition to being able to sense via its multiple, adaptive input groups and probes, the AGI must also be able to act on the world – be it for exploration, experimentation, communication, or to perform useful actions. These mechanisms need to provide both static and dynamic output (states and behavior). They too, need to be adaptive and capable of learning.
Underlying all of this functionality is pattern processing. What is more, not only are sensing and action based on generic patterns, but so is internal cognitive activity. In fact, even high-level abstract thought, language, and formal reasoning – abilities outside the scope of our current project – are ‘just’ higher-order elaborations of this (Margolis 1987).
Advantages of Intelligence being General
The advantages of general intelligence are almost too obvious to merit listing; how many of us would dream of giving up our ability to adapt and learn new things? In the context of artificial intelligence this issue takes on a new significance.
There exists an inexhaustible demand for computerized systems that can assist humans in complex tasks that are highly repetitive, dangerous, or that require knowledge, senses or abilities that their users may not possess (e.g., expert knowledge, ‘photographic’ recall, overcoming disabilities, etc.). These applications stretch across almost all domains of human endeavor.
Currently, these needs are filled primarily by systems engineered specifically for each domain and application (e.g., expert systems). Problems of cost, lead-time, reliability, and the lack of adaptability to new and unforeseen situations, severely limit market potential. Adaptive AGI technology, as described in this paper, promises to significantly reduce these limitations and to open up these markets. It specifically implies –
- That systems can learn (and be taught) a wide spectrum of data and functionality
- That they can adapt to changing data, environments and uses/ goals
- That this can be achieved without program changes – capabilities are learned, not coded.
More specifically, such systems can potentially:
- Significantly reduce system ‘brittleness’[2] through fuzzy pattern matching and adaptive learning – increasing robustness in the face of changing and unanticipated conditions or data.
- Learn autonomously, by automatically accumulating knowledge about new environments through exploration.
- Allow systems to be operator-trained to identify new objects and patterns; to respond to situations in specific ways, and to acquire new behaviors.
- Eliminate programming in many applications. Systems can be employed in many different environments, and with different parameters simply through self-training.
- Facilitate easy deployment in new domains. A general intelligence engine with pluggable custom input/ output probes allows rapid and inexpensive implementation of specialized applications.
The fact that no (artificial!) systems with these capabilities currently exist seems to imply that it is very hard (or impossible) to achieve these objectives. However, I believe that, as with other examples of human discovery and invention, the solution will seem rather obvious in retrospect. The trick is correctly choosing a few critical development options.
3. Shortcuts to AGI
When explaining Artificial General Intelligence to the uninitiated one often hears the remark that, surely, everyone in AI is working to achieve general intelligence. This indicates how deeply misunderstood intelligence is. While it is true that eventually conventional (domain-specific) research efforts will converge with those of AGI, without deliberate guidance this is likely to be a long, inefficient process. High-level intelligence must be adaptive, must be general – yet very little work is being done to specifically identify what general intelligence is, what it requires, and how to achieve it.
In addition to understanding general intelligence, AGI design also requires an appreciation of the differences between artificial (synthetic) and biological intelligence, and between designed and evolved systems.
Our particular approach to achieving AGI capitalizes on extensive analysis of these issues, and on an incremental development path that aims to minimize development effort (time and cost), technical complexity, and overall project risks. In particular, we are focusing on engineering a series of functional (but low-resolution/ capacity) proof-of-concept prototypes. Performance issues specifically related to commercialization are assigned to separate development tracks. Furthermore, our initial effort concentrates on identifying and implementing the most general and foundational components first, leaving high-level cognition such as abstract thought, language, and formal logic for later development (more on that later). We also focus more on selective, unsupervised, dynamic, incremental, interactive learning; on noisy, complex, analog data; and on integrating entity features and concept attributes in one comprehensive network.
While our project may not be the only one proceeding on this particular path, it is clear that by far the majority of AI work being done today follows a substantially different overall approach. Our work focuses on:
- General rather than domain-specific cognitive ability
- Acquired knowledge and skills, versus loaded databases and coded skills
- Bi-directional, real-time interaction, versus batch processing
- Adaptive attention (focus & selection), versus human pre-selected data
- Core support for dynamic patterns, versus static data
- Unsupervised and self-supervised, versus supervised learning
- Adaptive, self-organizing data structures, versus fixed neural nets or databases
- Contextual, grounded concepts, versus hard-coded, symbolic concepts
- Explicitly engineering functionality, versus evolving it
- Conceptual design, versus reverse-engineering
- General proof-of-concept, versus specific real applications development
- Animal level cognition, versus abstract thought, language, and formal logic.
General rather than domain-specific cognitive ability. The advantages listed in the previous section flow from the fact that generally intelligent systems can ultimately learn any specialized knowledge and skills possible – human intelligence is the proof! The reverse is obviously not true.
A complete, well-designed AGI’s ability to acquire domain-specific capabilities is limited only by processing and storage capacity. What is more, much of its learning will be autonomous – without teachers, and certainly without explicit programming. This approach implements (and capitalizes on) the essence of ‘Seed AI’ – systems with a limited, but carefully chosen set of basic, initial capabilities that allow them (in a ‘bootstrapping’ process) to dramatically increase their knowledge and skills through self-directed learning and adaptation. By concentrating on carefully designing the seed of intelligence, and then nursing it to maturity, one essentially bootstraps intelligence. In our AGI design this self-improvement takes two distinct forms/ phases:
- Coding the basic skills that allow the system to acquire a large amount of specific knowledge.
- The system reaching sufficient intelligence and conceptual understanding of its own design, to enable it to deliberately improve its own design.
An important feature of our design is that there are no traditional databases containing knowledge, nor programs encoding learned skills: All acquired knowledge is integrated into an adaptive central knowledge/ skills network. Patterns representing knowledge are associated in a manner that facilitates conceptualization and sensitivity to context. Naturally, such a design is potentially far less prone to brittleness, and more resiliently fault-tolerant.
Bi-directional, real-time interaction, versus batch processing. Adaptive learning systems must be able to interact bi-directionally with the environment – virtual or real. They must both sense data and act/ react on an ongoing basis. Many AI systems do all of their learning in batch mode and have little or no ability to learn incrementally. Such systems cannot easily adjust to changing environments or requirements – in many cases they are unable to adapt beyond the initial training set without reprogramming or retraining.
In addition to real-time perception and learning, intelligent systems must also be able to act. Three distinct areas of action capability are required:
- Acting on the ‘world’ – be it to communicate, to navigate or explore, or to manipulate some external function or device in order to achieve goals.
- Controlling or modifying the system’s internal parameters (such as learning rate or noise tolerance, etc.) in order to set or improve functionality.
- Controlling the system’s sense input parameters such as focus, selection, resolution (granularity) as well as adjusting feature extraction parameters.
Adaptive attention (focus & selection), versus human pre-selected data. Outside guidance and training can obviously speed learning; however, AGI systems must inherently be designed to acquire knowledge by themselves. In particular, they need to control what input data is processed – where specifically to obtain data, in how much detail, and in what format. Absent this capability the system will either be overwhelmed by irrelevant data or, conversely, be unable to obtain crucial information, or get it in the required format. Naturally, such data focus and selection mechanisms must themselves be adaptive.
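As a rough illustration of what controlling "where to obtain data, and in how much detail" could mean for a simple grid-like sense input, the following Python sketch selects a window of a 2D frame and sub-samples it. The function name `attend` and its parameters are hypothetical and not taken from the system described here; a real probe would also adapt these settings itself rather than receive them from the caller.

```python
def attend(frame, top, left, height, width, stride=1):
    """Return a sub-sampled window of a 2-D frame (a list of rows).

    top/left/height/width choose WHICH data is processed (focus);
    stride chooses HOW MUCH detail is kept (resolution).
    All names here are illustrative assumptions.
    """
    rows = frame[top:top + height:stride]
    return [row[left:left + width:stride] for row in rows]


# A 6x6 "sensor frame" whose cell value encodes its own coordinates.
frame = [[r * 10 + c for c in range(6)] for r in range(6)]

# Coarse overview of the whole frame, then a finer look at one corner.
overview = attend(frame, top=0, left=0, height=6, width=6, stride=3)
focus = attend(frame, top=2, left=2, height=4, width=4, stride=2)
```

Exposing focus and resolution as ordinary parameters is one way to let the same learning machinery that controls actuators also control the senses.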
Core support for dynamic patterns, versus static data. Temporal pattern processing is another fundamental requirement of interactive intelligence. At least three aspects of AGI rely on it: perception needs to learn/ recognize dynamic entities and sequences, action usually comprises complex behavior, and cognition (internal processing) is inherently temporal. In spite of this obvious need for intrinsic support for dynamic patterns, many AI systems only process static data; temporal sequences, if supported at all, are often converted (‘flattened’) externally to eliminate the time dimension. Real-time temporal pattern processing is technically quite challenging, so it is not surprising that most designs try to avoid it.
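To make the difference between a pattern in time and a flattened, static view concrete, here is a small Python sketch. It uses dynamic time warping (DTW), a standard textbook technique chosen purely for illustration; the source does not say which temporal algorithm the prototype uses.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


rise = [0.0, 1.0, 2.0, 3.0]
slow_rise = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # same shape, half speed
fall = [3.0, 2.0, 1.0, 0.0]                           # same values, reversed

same_shape = dtw_distance(rise, slow_rise)
reversed_shape = dtw_distance(rise, fall)
# A static view that ignores order cannot tell rise from fall:
static_view_identical = sorted(rise) == sorted(fall)
```

The rising sequence matches its slowed-down copy exactly, while the reversed sequence does not, even though an order-free comparison of the values sees no difference between them.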
Unsupervised and self-supervised, versus supervised learning. Auto-adaptive systems such as AGIs require comprehensive capabilities to learn without supervision. Such teacher-independent knowledge and skill acquisition falls into two broad categories: unsupervised (data-driven, bottom-up), and self-supervised (goal-driven, top-down). Ideally these two modes of learning should seamlessly integrate with each other – and of course, also with other, supervised methods.
Here, as in other design choices, general adaptive systems are harder to design and tune than more specialized, unchanging ones. We see this particularly clearly in the overwhelming focus on back-propagation[3] in artificial neural network (ANN) development. Relatively little research aims at better understanding and improving incremental, autonomous learning. Our own design places heavy emphasis on these aspects.
Adaptive, self-organizing data structures, versus fixed neural nets or databases. Another core requirement imposed by data/ goal-driven, real-time learning is having a flexible, self-organizing data structure. On the one hand, knowledge representation must be highly integrated, while on the other hand it must be able to adapt to changing data densities (and other properties), and to varying goals or solutions. Our AGI encodes all acquired knowledge and skills in one integrated network-like structure. This central repository features a flexible, dynamically self-organizing topology. The vast majority of other AI designs rely either on loosely-coupled data objects or agents, or on fixed network topologies and pre-defined ontologies, data hierarchies or database layouts. This often severely limits their self-learning ability, adaptivity and robustness, or creates massive communication bottlenecks or other performance overhead.
Contextual, grounded concepts, versus hard-coded, symbolic concepts. Concepts are probably the most important design aspect of AGI; in fact, one can say that ‘high-level intelligence is conceptual intelligence’. Core characteristics of concepts include their ability to represent ultra-high-dimensional fuzzy sets that are grounded in reality, yet fluid with regard to context. In other words, they encode related sets of complex, coherent, multi-dimensional patterns that represent features of entities. Concepts obtain their grounding (and thus their meaning) by virtue of patterns emanating from features sensed directly from entities that exist in reality. Because concepts are defined by value ranges within each feature dimension (sometimes in complex relationships), some kind of fuzzy pattern matching is essential. In addition, the scope of concepts must be fluid; they must be sensitive and adaptive to both environmental and goal contexts.
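The idea of a concept as value ranges across feature dimensions, matched fuzzily and modulated by context, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the triangular membership function, the use of the minimum as a fuzzy AND, and the `context_scale` parameter are all choices made here for clarity, not details disclosed in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRange:
    """Hypothetical fuzzy range for one feature dimension."""
    center: float
    tolerance: float  # distance from center at which membership reaches 0

    def membership(self, value):
        # Triangular membership: 1.0 at the center, falling linearly to 0.0.
        if self.tolerance <= 0:
            return 1.0 if value == self.center else 0.0
        return max(0.0, 1.0 - abs(value - self.center) / self.tolerance)


def concept_fit(concept, observation, context_scale=1.0):
    """Degree of fit in [0, 1]; the weakest dimension dominates (fuzzy AND)."""
    fits = []
    for name, rng in concept.items():
        if name not in observation:
            continue  # unsensed dimensions are skipped, not penalized
        # Context widens or narrows every range (an illustrative assumption).
        widened = FeatureRange(rng.center, rng.tolerance * context_scale)
        fits.append(widened.membership(observation[name]))
    return min(fits) if fits else 0.0


ripe_apple = {"hue": FeatureRange(0.0, 0.2), "size_cm": FeatureRange(8.0, 4.0)}
seen = {"hue": 0.1, "size_cm": 9.0}

strict = concept_fit(ripe_apple, seen)                    # default context
loose = concept_fit(ripe_apple, seen, context_scale=2.0)  # more permissive context
```

The same observation fits the concept to a different degree depending on context, which is the kind of fluidity the paragraph above calls for.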
Autonomous concept formation is one of the key tests of intelligence. The many AI systems based on hard-coded or human-defined concepts fail this fundamental test. Furthermore, systems that do not derive their concepts via interactive perception are unable to ground their knowledge in reality, and thus lack crucial meaning. Finally, concept structures whose activation cannot be modulated by context and degree of fit are unable to capture the subtlety and fluidity of intelligent generalization. In combination, these limitations will cripple any aspiring AGI.
Explicitly engineering (and learning) functionality, versus evolving it. Design by evolution is extremely inefficient – whether in nature or in computer science. Moreover, evolutionary solutions are generally opaque; optimized only to some specified ‘cost function’, not comprehensibility, modularity, or maintainability. Furthermore, evolutionary learning also requires more data or trials than are available in everyday problem solving.
Genetic and evolutionary programming do have their uses – they are powerful tools that can be used to solve very specific problems, such as optimization of large sets of variables; however, they generally are not appropriate for creating large systems or infrastructures. Artificially evolving general intelligence directly seems particularly problematic because there is no known function measuring such capability along a single continuum – and absent such direction, evolution doesn’t know what to optimize. One approach to deal with this problem is to try to coax intelligence out of a complex ecology of competing agents – essentially replaying natural evolution.
Overall, it seems that genetic programming techniques are appropriate when one runs out of specific engineering ideas. Here is a short summary of advantages of explicitly engineered functionality:
- Designs can directly capitalize on and encode the designer’s knowledge and insights.
- Designs have comprehensible design documentation.
- Designs can be far more modular – less need for multiple functionality and high inter-dependency of sub-systems than found in evolved systems.
- Systems can have a more flow-chart like, logical design – evolution has no foresight.
- They can be designed with debugging aids – evolution didn’t need that.
- These features combine to make systems easier to understand, debug, and interface with, and – importantly – make it possible for multiple teams to work on the design simultaneously.
Similarly, in creating artificial intelligence it makes sense to capitalize on our human intellectual and engineering strengths – to ignore design parameters unique to biological systems, instead of struggling to copy nature’s designs. Designs explicitly engineered to achieve desired functionality are much easier to understand, debug, modify, and enhance. Furthermore, using known and existing technology allows us to best leverage existing resources. So why limit ourselves to the single solution to intelligence created by a blind, unconscious Watchmaker with his own agenda (survival in an evolutionary environment very different from that of today)?
Intelligent machines designed from scratch carry neither the evolutionary baggage, nor the additional complexity for epigenesis, reproduction, and integrated self-repair of biological brains. Obviously this doesn’t imply that we can learn nothing from studying brains, just that we don’t have to limit ourselves to biological feasibility in our designs. Our (currently) only working example of high-level general intelligence (the brain) provides a crucial conceptual model of cognition, and can clearly inspire numerous specific design features.
Here are some desirable cognitive features that can be included in an AGI design that would not (and in some cases, could not) exist in a reverse-engineered brain:
- More effective control of neurochemistry (‘emotional states’)
- Selecting the appropriate degree of logical thinking versus intuition
- More effective control over focus and attention
- Being able to learn instantly, on demand
- Direct and rapid interfacing with databases, the Internet, and other machines – potentially having instant access to all available knowledge
- Optional ‘photographic’ memory and recall (‘playback’) on all senses!
- Better control over remembering and forgetting (freezing important knowledge, and being able to unlearn)
- The ability to accurately backtrack and review thought and decision processes (retrace and explore logic pathways)
- Patterns, nodes and links can easily be tagged (labeled) and categorized
- The ability to optimize the design for the available hardware instead of being forced to conform to the brain’s requirements
- The ability to utilize the best existing algorithms and software techniques – irrespective of whether they are biologically plausible
- Custom designed AGI (unlike brains) can have a simple speed/ capacity upgrade path
- The possibility of comprehensive integration with other AI systems (like expert systems, robotics, specialized sense pre-processors, and problem solvers)
- The ability to construct AGIs that are highly optimized for specific domains
- Node, link, and internal parameter data is available as ‘input data’ (full introspection)
- Design specifications are available (to the designer and to the AGI itself!)
- Seed AI design: A machine can inherently be designed to more easily understand and improve its own functioning – thus bootstrapping intelligence to ever higher levels.
General proof-of-concept, versus specific real applications development. Our approach here involves:
- Concentrating on proof-of-concept prototypes, not commercial performance. This includes working at low data resolution and volume, and putting aside optimization. Scalability is addressed only at a theoretical level, and not necessarily implemented.
- Working with radically-reduced sense and motor capabilities. The fact that deaf, blind, and severely paralyzed people can attain high intelligence (Helen Keller, Stephen Hawking) indicates that these are not essential to developing AGI.
- Coping with complexity through a willingness to experiment and implement poorly understood algorithms – i.e. using an engineering approach. Using self-tuning feedback loops to minimize free parameters.
- Not being sidetracked by attempting to match the performance of domain-specific designs – focusing more on how capabilities are achieved (e.g. learned conceptualization, instead of programmed or manually specified concepts) rather than raw performance.
- Developing and testing in virtual environments, not physical implementations. Most aspects of AGI can be fully evaluated without the overhead (time, money, and complexity) of robotics.
Animal level cognition, versus abstract thought, language, and formal logic. The core challenge of AGI is achieving the robust, adaptive conceptual learning ability of higher primates or young children. If human level intelligence is the goal, then pursuing robotics, language, or formal logic (at this stage) is a costly sideshow – whether motivated by misunderstanding the problem, or by commercial or ‘political’ considerations.
Summary. While our project leans heavily on research done in many specialized disciplines, it is one of the few efforts dedicated to integrating such interdisciplinary knowledge with the specific goal of developing general artificial intelligence. We firmly believe that many of the issues raised above are crucial to the early achievement of truly intelligent adaptive learning systems.
4. Foundational Cognitive Capabilities
General intelligence requires a number of foundational cognitive abilities. At a first approximation, it must be able to –
- Remember and recognize patterns representing coherent features of reality
- Relate such patterns by various similarities, differences, and associations
- Learn and perform a variety of actions
- Evaluate and encode feedback from a goal system
- Autonomously adjust its system control parameters.
Pattern learning, matching, completion, and recall. The primary method of pattern acquisition consists of a proprietary adaptation of lazy learning (Aha 1997, Yip 1997). Our implementation stores feature patterns (static and dynamic) with adaptive fuzzy tolerances that subsequently determine how similar patterns are processed. Our recognition algorithm matches patterns on a competitive winner-take-all basis, as a set or aggregate of similar patterns, or by forced choice. It also offers inherent support for pattern completion, and recall (where appropriate).
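Since the actual algorithm is proprietary, the following Python sketch only illustrates the general shape of instance-based (lazy) learning with per-pattern fuzzy tolerances, winner-take-all matching, and pattern completion. The class and method names (`PatternStore`, `learn`, `match`, `complete`) and the Euclidean distance measure are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredPattern:
    label: str
    features: list
    tolerance: float  # hypothetical per-pattern fuzzy radius


def distance(stored, probe):
    """Euclidean distance over the dimensions the probe actually supplies."""
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(stored, probe) if p is not None))


class PatternStore:
    def __init__(self):
        self.patterns = []

    def learn(self, label, features, tolerance):
        # Lazy learning: keep the instance itself; the work happens at query time.
        self.patterns.append(StoredPattern(label, list(features), tolerance))

    def match(self, probe) -> Optional[StoredPattern]:
        """Winner-take-all: the closest pattern whose tolerance covers the probe."""
        scored = [(distance(p.features, probe), p) for p in self.patterns]
        inside = [(d, p) for d, p in scored if d <= p.tolerance]
        if not inside:
            return None
        return min(inside, key=lambda pair: pair[0])[1]

    def complete(self, probe):
        """Fill missing (None) dimensions from the winning stored pattern."""
        winner = self.match(probe)
        if winner is None:
            return None
        return [w if p is None else p for w, p in zip(winner.features, probe)]


store = PatternStore()
store.learn("cup", [1.0, 2.0, 3.0], tolerance=0.5)
store.learn("ball", [5.0, 5.0, 5.0], tolerance=0.5)

seen = store.match([1.1, 2.0, 2.9])         # noisy view of the cup
unknown = store.match([3.0, 3.0, 3.0])      # outside every tolerance
filled = store.complete([1.0, None, 3.1])   # one dimension not sensed
```

Because matching ignores dimensions that the probe leaves out, the same routine serves both recognition and recall from partial cues.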
Data accumulation and forgetting. Because our system learns patterns incrementally, mechanisms are needed for consolidating and pruning excess data. Sensed patterns (or sub-patterns) that fall within a dynamically set noise/ error tolerance of existing ones are automatically consolidated by a Hebbian-like mechanism that we call ‘nudging’. This algorithm also accumulates certain statistical information. On the other hand, patterns that turn out not to be important (as judged by various criteria) are deleted.
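One plausible reading of consolidation and pruning is sketched below in Python. The details are assumptions: here a new sample within tolerance pulls the stored pattern toward it as a running mean, a hit counter stands in for the accumulated statistics, and "importance" is reduced to a minimum hit count. The source does not specify the actual ‘nudging’ rule or the pruning criteria.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    center: list
    hits: int = 1  # simple usage statistic (illustrative)


def nudge(trace, sample):
    """Move the stored pattern toward the sample (running mean of all hits)."""
    trace.hits += 1
    step = 1.0 / trace.hits
    trace.center = [c + step * (s - c) for c, s in zip(trace.center, sample)]


def observe(memory, sample, tolerance):
    """Consolidate the sample into the first trace within tolerance, else store it."""
    for trace in memory:
        if max(abs(c - s) for c, s in zip(trace.center, sample)) <= tolerance:
            nudge(trace, sample)
            return trace
    trace = Trace(list(sample))
    memory.append(trace)
    return trace


def prune(memory, min_hits):
    """Forget traces that were seen too rarely to count as important."""
    return [t for t in memory if t.hits >= min_hits]


memory = []
observe(memory, [1.0, 1.0], tolerance=0.3)
observe(memory, [1.2, 0.8], tolerance=0.3)   # close enough: consolidated
observe(memory, [5.0, 5.0], tolerance=0.3)   # novel: stored separately
kept = prune(memory, min_hits=2)
```

After these three observations the store holds two traces, and pruning keeps only the one that was reinforced by a second, similar sample.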
Categorization and clustering. Vector-coded feature patterns are acquired in real-time and stored in a highly adaptive network structure. This central self-organizing repository automatically clusters data in hyper-dimensional vector-space. Our matching algorithm’s ability to recall patterns by any dimension provides inherent support for flexible, dynamic categorization. Additional categorization mechanisms facilitate grouping patterns by additional parameters, associations, or functions.
Pattern hierarchies and associations. Patterns of perceptual features do not stand in isolation – they are derived from coherent external reality. Encoding relationships between patterns serves the crucial functions of added meaning, context, and anticipation. Our system captures low-level, perception-driven pattern associations such as: sequential or coincidental in time, nearby in space, related by feature group or sense modality. Additional relationships are encoded at higher levels of the network, including actuation layers. This overall structure somewhat resembles the ‘dual network’ described by Goertzel (1993).
Pattern priming and activation spreading. The core function of association links is to prime[4] related nodes. This helps to disambiguate pattern matching, and to select contextual alternatives. In the case where activation is particularly strong and perceptual activity is low, stored patterns will be ‘recognized’ spontaneously. Both the scope and decay rate of such activation spreading are controlled adaptively. These dynamics combine with the primary, perception-driven activation to form the system’s short-term memory.
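The priming effect described above can be illustrated with a small spreading-activation routine in Python. The graph, link weights, and the `decay`, `max_hops`, and `floor` parameters are hypothetical; in the described system scope and decay are controlled adaptively, whereas here they are fixed arguments.

```python
from collections import defaultdict


def spread_activation(links, seeds, decay=0.5, max_hops=2, floor=0.05):
    """Propagate activation from seed nodes along weighted association links.

    links: node -> {neighbor: weight}; seeds: node -> initial activation.
    decay limits how strongly activation carries over each hop,
    max_hops limits its scope, and floor drops negligible amounts.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in links.get(node, {}).items():
                passed = energy * weight * decay
                if passed >= floor:
                    next_frontier[neighbor] += passed
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


links = {
    "bark": {"dog": 1.0, "tree": 1.0},   # ambiguous cue
    "leash": {"dog": 1.0},               # context cue
    "dog": {"bone": 0.8},
}
primed = spread_activation(links, {"bark": 1.0, "leash": 1.0})
```

With both cues active, "dog" receives priming from two sources and ends up more active than "tree", which is how context can disambiguate an otherwise ambiguous match.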
Action patterns. Adaptive action circuits are used to control parameters in the following three domains:
1) Senses, including adjustable feature extractors, focus and selection mechanisms
2) Output actuators for navigation and manipulation
3) Meta-cognition and internal controls.
Different action states and behaviors (action sequences) for each of these control outputs can be created at design time (using a configuration script) or acquired interactively. Real-time learning occurs either by means of explicit teaching, or autonomously through random exploration. Once acquired, these actions can be tied to specific perceptual stimuli or whole contexts through various stimulus-response mechanisms. These S-R links (both activation and inhibition) are dynamically modified through ongoing reinforcement learning.
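A minimal Python sketch of stimulus-response links adjusted by reinforcement is given below. It is a generic illustration, not the prototype's mechanism: the class name, the simple move-toward-reward update, and the epsilon-style random exploration are all assumptions. Positive weights play the role of activating links and negative weights the role of inhibiting ones.

```python
import random


class StimulusResponseLinks:
    def __init__(self, actions, learning_rate=0.5, explore=0.1, seed=0):
        self.actions = list(actions)
        self.learning_rate = learning_rate
        self.explore = explore           # probability of a random action
        self.weights = {}                # (stimulus, action) -> link strength
        self._rng = random.Random(seed)

    def choose(self, stimulus):
        # Occasionally explore at random; otherwise follow the strongest link.
        if self._rng.random() < self.explore:
            return self._rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.weights.get((stimulus, a), 0.0))

    def reinforce(self, stimulus, action, reward):
        # Move the link strength part of the way toward the received reward.
        key = (stimulus, action)
        old = self.weights.get(key, 0.0)
        self.weights[key] = old + self.learning_rate * (reward - old)


def feedback(stimulus, action):
    """Toy goal system: turning at a wall is good, driving into it is bad."""
    return 1.0 if (stimulus, action) == ("wall_ahead", "turn") else -1.0


links = StimulusResponseLinks(["forward", "turn"])
for _ in range(50):
    act = links.choose("wall_ahead")
    links.reinforce("wall_ahead", act, feedback("wall_ahead", act))

links.explore = 0.0                      # switch off exploration to inspect the result
learned = links.choose("wall_ahead")
```

After training, the link from the stimulus to "turn" is positive and the link to "forward" is negative, so the greedy choice is to turn.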
Meta-cognitive control. In addition to adaptive perception and action functionality, an AGI design must also allow for extensive monitoring and control of overall system parameters and functions. Any complex interactive learning system contains numerous crucial control parameters such as noise tolerance, learning and exploration rates, priorities and goal management, and myriad others. Not only must the system be able to adaptively control these many interactive vectors, it must also appropriately manage its various cognitive functions (such as recognition, recall, action, etc.). Our design deals with these requirements by means of a highly adaptive introspection/ control ‘probe’.
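As one hedged example of a system adjusting its own control parameters, the Python sketch below tunes a noise tolerance with a simple proportional feedback loop so that a target fraction of inputs is recognized. The function `tune_tolerance`, the target rate, and the gain are illustrative assumptions; the source does not describe how its introspection/ control probe actually works.

```python
def tune_tolerance(tolerance, match_rate, target_rate=0.8, gain=0.5,
                   bounds=(0.01, 10.0)):
    """One feedback step: widen the tolerance when too few inputs are
    recognized, narrow it when too many are."""
    error = target_rate - match_rate
    adjusted = tolerance * (1.0 + gain * error)
    low, high = bounds
    return min(high, max(low, adjusted))


def match_rate(distances, tolerance):
    """Fraction of inputs that fall within the current tolerance."""
    return sum(d <= tolerance for d in distances) / len(distances)


# Toy introspection data: how far ten recent inputs were from their best match.
distances = [i / 10 for i in range(1, 11)]

tolerance = 0.2
for _ in range(30):
    tolerance = tune_tolerance(tolerance, match_rate(distances, tolerance))

final_rate = match_rate(distances, tolerance)
```

In this toy run the tolerance grows step by step until the recognition rate reaches the target and then stops changing, which removes one free parameter that would otherwise have to be set by hand.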
High-level intelligence. Our AGI model posits that no additional foundational functions are necessary for higher-level cognition. Abstract thought, language, and logical thinking are all elaborations of core abilities. This controversial point is elaborated further below.
5. An AGI in the making
The functional prototype currently under development at Adaptive A.I. Inc. aims to embody all the abovementioned choices, requirements, and features. Our development path is as follows:
1) Development framework
2) Memory core and interface structure
3) Individual foundational cognitive components
4) Integrated low-level cognition
5) Increasing level of functionality.
The software comprises an AGI engine framework with the following basic components:
- A set of pluggable, programmable (virtual) sensors and actuators (called ‘probes’)
- A central pattern store/ engine including all data and cognitive algorithms
- A configurable, dynamic 2D virtual world, plus various training and diagnostic tools.
While our design includes several novel and proprietary algorithms, our key innovation is the particular selection and integration of established technologies and prior insights.
AGI Engine Architecture & Design Features
Our AGI engine (which provides this foundational cognitive ability) can logically be divided into three parts (See figure above.):
- Cognitive core
- Control/ interface logic
- Input/ output probes
The components listed below have been specifically designed with features required for adaptive general intelligence in (ultimately) real environments. Among other things, they deal with a great variety and volume of static and dynamic data, cope with fuzzy and uncertain data and goals, foster coherent integrated representations of reality, and – most of all – promote adaptivity.
Cognitive Core: This is the central repository of all static and dynamic data patterns – including all learned cognitive and behavioral states and sequences. All data is stored in a single, integrated node-link structure. The design innovates the specific encoding of pattern ‘fuzziness’ (in addition to other attributes). The core allows for several node/ link types with differing dynamics to help define the network’s cognitive structure.
The network’s topology is dynamically self-organizing – a feature inspired by ‘Growing Neural Gas’ design (Fritzke 1995). This allows network density to adjust to actual data feature and/ or goal requirements. Various adaptive local and global parameters further define network structure and dynamics in real time.
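For readers unfamiliar with Growing Neural Gas, the following is a simplified sketch of the published algorithm (Fritzke 1995), showing how node density can follow the data. It is not the prototype's implementation; the class layout and all parameter values are assumptions.

```python
import random


class GrowingNeuralGas:
    """Simplified Growing Neural Gas sketch (after Fritzke 1995)."""

    def __init__(self, dim, eps_winner=0.2, eps_neighbor=0.006, max_age=50,
                 insert_every=100, alpha=0.5, error_decay=0.995, seed=0):
        rng = random.Random(seed)
        self.nodes = {i: [rng.random() for _ in range(dim)] for i in (0, 1)}
        self.error = {0: 0.0, 1: 0.0}
        self.edges = {}  # frozenset({a, b}) -> age
        self.eps_winner, self.eps_neighbor = eps_winner, eps_neighbor
        self.max_age, self.insert_every = max_age, insert_every
        self.alpha, self.error_decay = alpha, error_decay
        self.next_id, self.steps = 2, 0

    def _dist2(self, n, x):
        return sum((w - xi) ** 2 for w, xi in zip(self.nodes[n], x))

    def _neighbors(self, n):
        return [m for e in self.edges if n in e for m in e if m != n]

    def _move(self, n, x, rate):
        self.nodes[n] = [w + rate * (xi - w) for w, xi in zip(self.nodes[n], x)]

    def adapt(self, x):
        """Present one input vector and adjust nodes, edges, and density."""
        self.steps += 1
        s1, s2 = sorted(self.nodes, key=lambda n: self._dist2(n, x))[:2]
        self.error[s1] += self._dist2(s1, x)
        for n in self._neighbors(s1):        # age and drag the winner's neighbors
            self.edges[frozenset((s1, n))] += 1
            self._move(n, x, self.eps_neighbor)
        self._move(s1, x, self.eps_winner)
        self.edges[frozenset((s1, s2))] = 0  # (re)connect the two closest nodes
        self.edges = {e: a for e, a in self.edges.items() if a <= self.max_age}
        linked = set().union(*self.edges)
        for n in [n for n in self.nodes if n not in linked]:
            del self.nodes[n], self.error[n]  # drop isolated nodes
        if self.steps % self.insert_every == 0:
            self._insert_node()
        for n in self.error:
            self.error[n] *= self.error_decay

    def _insert_node(self):
        """Add a node halfway between the worst node and its worst neighbor."""
        q = max(self.error, key=self.error.get)
        f = max(self._neighbors(q), key=self.error.get)
        r, self.next_id = self.next_id, self.next_id + 1
        self.nodes[r] = [(a + b) / 2 for a, b in zip(self.nodes[q], self.nodes[f])]
        del self.edges[frozenset((q, f))]
        self.edges[frozenset((q, r))] = 0
        self.edges[frozenset((f, r))] = 0
        self.error[q] *= self.alpha
        self.error[f] *= self.alpha
        self.error[r] = self.error[q]


rng = random.Random(1)
gng = GrowingNeuralGas(dim=2)
for _ in range(1000):
    cluster_x = rng.choice([0.0, 5.0])       # two separate clusters of inputs
    gng.adapt([cluster_x + rng.random(), rng.random()])
```

New nodes are inserted where accumulated error is highest, so regions with more (or harder) data end up with a denser network, and stale edges and isolated nodes are pruned.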
Control and Interface Logic: An overall control system coordinates the network’s execution cycle, drives various cognitive and housekeeping algorithms, and controls/ adapts system parameters. Via an Interface Manager, it also communicates data and control information to and from the probes.
Probes: The Interface Manager provides for dynamic addition and configuration of probes. Key design features of the probe architecture include the ability to have programmable feature extractors, variable data resolution, and focus & selection mechanisms. Such mechanisms for data selection are imperative for general intelligence: even moderately complex environments have a richness of data that far exceeds any system’s ability to usefully process.
The system handles a very wide variety of data types and control signal requirements – including those for visual, sound, and raw data (e.g., database, internet, keyboard), as well as various output actuators. A novel ‘system probe’ provides the system with monitoring and control of its internal states (a form of meta-cognition). Additional probes – either custom interfaces with other systems or additional real-world sensors/ actuators – can easily be added to the system.
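The probe features named above (a programmable feature extractor, variable resolution, and a focus window) could be combined roughly as in the sketch below. The `VisualProbe` class and its method names are hypothetical and are not taken from the actual C# framework.

```python
class VisualProbe:
    """Hypothetical sensor probe with a focus window and variable resolution."""

    def __init__(self, feature_extractor):
        self.feature_extractor = feature_extractor  # any callable on an image
        self.focus = None                           # None means the whole frame
        self.block = 1                              # 1 means full resolution

    def set_focus(self, top, left, height, width):
        self.focus = (top, left, height, width)

    def set_resolution(self, block):
        self.block = max(1, block)

    def sense(self, frame):
        """frame: list of equal-length rows of brightness values."""
        if self.focus is not None:
            top, left, height, width = self.focus
            frame = [row[left:left + width] for row in frame[top:top + height]]
        coarse = []
        for r in range(0, len(frame), self.block):
            rows = frame[r:r + self.block]
            out = []
            for c in range(0, len(frame[0]), self.block):
                cells = [v for row in rows for v in row[c:c + self.block]]
                out.append(sum(cells) / len(cells))  # average each block
            coarse.append(out)
        return self.feature_extractor(coarse)


frame = [[0, 0, 4, 4],
         [0, 0, 4, 4],
         [8, 8, 2, 2],
         [8, 8, 2, 2]]
probe = VisualProbe(feature_extractor=lambda image: image)

probe.set_resolution(2)
overview = probe.sense(frame)        # whole frame at half resolution

probe.set_focus(0, 2, 2, 2)
probe.set_resolution(1)
detail = probe.sense(frame)          # top-right corner at full resolution
```

The same frame yields a cheap overview or a detailed local view, which is one way a system can limit how much of a rich environment it has to process at once.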
Development Environment/ Language/ Hardware. The complete AGI engine plus associated support programs are implemented in (object-oriented) C# under Microsoft’s .NET framework. The system is designed for optional remoting of various components, thus allowing for some distributed processing. Current tests show that practical (proof-of-concept) prototype performance can be achieved on a single, conventional PC (2 GHz, 512 MB). Even a non-performance-tuned implementation can process several complex patterns per second on a database of well over a million stored features.
6. From Algorithms to General Intelligence
This section covers some of our near-term research and development; it aims to illustrate our expected path toward meaningful general intelligence. While this work barely approaches higher-level animal cognition (exceeding it in some aspects, but falling far short in others such as sensory-motor skills), we take it to be a crucial step in proving the validity and practicality of our model. Furthermore, the actual functionality achieved should be highly competitive, if not unique, in applications where significant autonomous adaptivity and data selection, lack of brittleness, dynamic pattern processing, flexible actuation, and self-supervised learning are central requirements.
General intelligence doesn’t comprise one single, brilliant knock-out invention or design feature; instead, it emerges from the synergetic integration of a number of essential fundamental components. On the structural side, the system must integrate sense inputs, memory, and actuators, while on the functional side various learning, recognition, recall and action capabilities must operate seamlessly on a wide range of static and dynamic patterns. In addition, these cognitive abilities must be conceptual and contextual – they must be able to generalize knowledge, and interpret it against different backgrounds.
A key milestone in our project is testing the integrated functionality of the basic cognitive components within our overall AGI framework. A number of custom-developed, highly-configurable test utilities are used to test the cohesive functioning of the whole system. This automated training and evaluation is supplemented by manual experimentation in numerous different environments and applications. Experience gained by these tests helps to refine the complex dynamics of interacting algorithms and parameters.
One of the general difficulties with AGI development is to determine absolute measures of success. Part of the reason is that this field is still nascent, and thus no agreed definitions, let alone tests or measures of low-level general intelligence, exist. As we proceed with our project we expect to develop ever more effective protocols and metrics for assessing cognitive ability. Our system’s performance evaluation is guided by this description: ‘General intelligence comprises the ability to acquire (and adapt) the knowledge and skills required for achieving a wide range of goals in a variety of domains.’
- In this context, ‘acquisition’ includes all of the following: automatic, via sense inputs (feature/ data driven); explicitly taught; discovered through exploration or experimentation; internal processes (e.g., association, categorization, statistics, etc.).
- ‘Adaptation’ implies that new knowledge is integrated appropriately.
- ‘Knowledge and skills’ refer to all kinds of data and abilities (states and behaviors) that the system acquires for the short or long term.
Sample Test Domains for Initial Performance Criteria
Adaptive Security Monitor. This system scans video monitors and alarm panels that oversee a secure area (say, factory, office building, etc.), and responds appropriately to abnormal conditions. Note, this is somewhat similar to a site monitoring application at MIT (Grimson 1998).
This simulation calls for a visual environment that contains a lot of detail but has only limited dynamic activity – this is its normal state (green). Two levels of abnormality exist: (i) minor, or known disturbance (yellow); (ii) major, or unknown disturbance (red).
The system must initially learn the normal state by simple exposure (automatically scanning the environment) at different resolutions (detail). It must also learn ‘yellow’ conditions by being shown a number of samples (some at high resolution). All other states must output ‘red’.
Standard operation is to continuously scan the environment at low resolution. If any abnormal condition is detected the system must learn to change to higher resolution in order to discriminate between ‘yellow’ and ‘red’.
The system must adapt to changes in the environment (and totally different environments) by simple exposure training.
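The green/yellow/red logic of this test can be sketched as follows. This is an illustration under strong simplifying assumptions: scenes are one-dimensional lists of numbers, matching uses a fixed `tolerance`, and the switch to high resolution is hard-coded here, whereas the test requires the system to learn that step. All names are hypothetical.

```python
def downsample(scene, block):
    """Average consecutive blocks of values to lower the resolution."""
    return [sum(scene[i:i + block]) / len(scene[i:i + block])
            for i in range(0, len(scene), block)]


def distance(a, b):
    """Largest difference between corresponding values."""
    return max(abs(x - y) for x, y in zip(a, b))


class SecurityMonitor:
    def __init__(self, tolerance=1.0, coarse_block=4):
        self.tolerance = tolerance        # hypothetical noise tolerance
        self.coarse_block = coarse_block  # hypothetical low-resolution factor
        self.normal = []                  # 'green' scenes learned by exposure
        self.known = []                   # 'yellow' scenes shown by a trainer

    def learn_normal(self, scene):
        self.normal.append(list(scene))

    def learn_yellow(self, scene):
        self.known.append(list(scene))

    def _matches(self, scene, stored, block):
        view = downsample(scene, block)
        return any(distance(view, downsample(s, block)) <= self.tolerance
                   for s in stored)

    def classify(self, scene):
        # Routine scan at low resolution.
        if self._matches(scene, self.normal, self.coarse_block):
            return "green"
        # Something is abnormal: look again at full resolution.
        if self._matches(scene, self.known, 1):
            return "yellow"
        return "red"


monitor = SecurityMonitor()
monitor.learn_normal([0, 0, 0, 0, 0, 0, 0, 0])
monitor.learn_yellow([0, 0, 0, 0, 9, 0, 0, 0])   # a known, minor disturbance

quiet = monitor.classify([0, 0, 0.5, 0, 0, 0, 0, 0])
familiar = monitor.classify([0, 0, 0, 0, 9, 0, 0, 0])
unknown = monitor.classify([9, 0, 0, 0, 0, 0, 0, 0])
```

Adapting to a changed or entirely new environment then amounts to further calls to `learn_normal`, which corresponds to the simple exposure training described above.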
Sight Assistant. The system controls a movable ‘eye’ (by voice command) that enables the identification (by voice output) of at least a hundred different objects in the world. A trainer will dynamically teach the system new names, associations, and eye movement commands.
The visual probe can select among different scenes (simulating rooms) and focus on different parts of each scene. The scenes depict objects of varying attributes: color, size, shape, various dynamics, etc. (and combinations of these), against different backgrounds.
Initial training will be to attach simple sound commands to maneuver the ‘eye’, and to associate word labels with selected objects. The system must then reliably execute voice commands and respond with appropriate identification (if any). Additional functionality could be to have the system scan the various scenes when idle, and to automatically report selected important objects.
Object identification must cover a wide spectrum of different attribute combinations and tolerances. The system must easily learn new scenes, objects, words and associations, and also adapt to changes in any of these variables.
Maze Explorer. A (virtual) entity explores a moderately complex environment. It discovers what types of objects aid or hinder its objectives, while learning to navigate this dynamic world. It can also be trained to perform certain behaviors.
The virtual world is filled with a great number of different objects (see previous example). In addition, some of these objects move in space at varying speeds and dynamics, and may be solid and/ or immovable. Groups of different kinds of objects have pre-assigned attributes that mark them as negative or positive. The AGI engine controls the direction and speed of an entity in this virtual world. Its goal is to learn to navigate around immovable and negative objects to reliably reach hidden positives.
The system can also be trained to respond to operator commands to perform behaviors of varying degrees of complexity (for example, actions similar to ‘tricks’ one might teach a dog). This ‘Maze Explorer’ can easily be set up to deal with fairly complex tasks.
Towards Increased Intelligence
Clearly, the tasks described above do not by themselves represent any kind of breakthrough in artificial intelligence research. They have been achieved many times before. However, what we do believe to be significant and unique is the achievement of these various tasks without any task-specific programming or parameterization. It is not what is being done, but how it is done.
Development beyond these basic proof-of-concept tests will advance in two directions: 1) to significantly increase resolution, data volume, and complexity in applications similar to the tests; 2) to add higher-level functionality. In addition to work aimed at further developing and proving our general intelligence model, there are also numerous practical enhancements that can be done. These would include implementing multi-processor and network versions, and integrating our system with databases or with other existing AI technology such as expert systems, voice recognition, robotics, or sense modules with specialized feature extractors.
By far the most important of these future developments concern higher-level ability. Here is a partial list of action items, all of which are derived from lower-level foundations:
- Spread activation and retain context over extended period
- Support more complex internal temporal patterns, both for enhanced recognition and anticipation, and for cognitive and action sequences
- Internal activation feedback for processing without input
- Deduction, achieved through selective concept activation
- Advanced categorization by arbitrary dimensions
- Learning of more complex behavior
- Abstract and merged concept formation
- Structured language acquisition
- Increased awareness and control of internal states (introspection)
- Learning logic and other problem-solving methodologies.
7. Other Research
Many different approaches to AI exist; some of the differences are straightforward while others are subtle and hinge on difficult philosophical issues. As such, the exact placement of our work relative to that of others is difficult and, indeed, open to debate. Our view that ‘intelligence is a property of an entity that engages in two-way interaction with an external environment’ technically puts us in the area of ‘agent systems’ (Russell and Norvig 1995). However, our emphasis on a connectionist rather than classical approach to cognitive modeling places our work in the field of ‘embodied cognitive science’. (See Pfeifer and Scheier 1999 for a comprehensive overview.)
While our approach is similar to other research in embodied cognitive science, in some respects our goals are substantively different. A key difference is our belief that a core set of cognitive abilities working together is sufficient to produce general intelligence. This is in marked contrast to others in embodied cognitive science who consider intelligence to be necessarily specific to a set of problems within a given environment. In other words, they believe that autonomous agents always exist in ecological niches. As such, they focus their research on building very limited systems that effectively deal with only a small number of problems within a specific limited environment. Almost all work in the area follows this approach; see Braitenberg (1984), Brooks (1994) or Arbib (1992) for just a few well-known examples. Their stance contradicts the fact that humans possess general intelligence; we are able to effectively deal with a wide range of problems that are significantly beyond anything that could be called our ‘ecological niche’.
Perhaps the closest project to ours that is strictly in the area of embodied cognitive science is the Cog project at MIT (Brooks 1993). The project aims to understand the dynamics of human interaction by the construction of a human-like robot complete with upper torso, a head, eyes, arms and hands. While this project is significantly more ambitious than other projects in terms of the level and complexity of the system’s dynamics and abilities, the system is still essentially niche focused (elementary human social and physical interaction) when compared to our own efforts at general intelligence.
Probably the closest work to ours in the sense that it also aims to achieve general rather than niche intelligence is the Novamente project under the direction of Ben Goertzel. (The project was formerly known as Webmind; see Goertzel 1997, 2001.) Novamente relies on a hybrid of low-level neural net-like dynamics for activation spreading and concept priming, coupled with high-level semantic constructs to represent a variety of logical, causal and spatial-temporal relations. While the semantics of the system’s internal state are relatively easy to understand compared to a strictly connectionist approach, the classical elements in the system’s design open the door to many of the fundamental problems that have plagued classical AI over the last fifty years. For example, high-level semantics require a complex meta-logic contained in hard-coded high-level reasoning and other high-level cognitive systems. These high-level systems contain significant implicit semantics that may not be grounded in environmental interaction but are rather hard-coded by the designer – thus causing symbol grounding problems (Harnad 1990). The relatively fixed, high-level methods of knowledge representation and manipulation that this approach entails are also prone to ‘frame of reference’ (McCarthy and Hayes 1969; Pylyshyn 1987) and ‘brittleness’ problems. In a strictly embodied cognitive science approach, as we have taken, all knowledge is derived from agent-environment interaction, thus avoiding these long-standing problems of classical AI.
Andy Clark (1997) is another researcher whose model closely resembles our own, but there are no implementations specifically based on his theoretical work. Igor Aleksander’s (now dormant) MAGNUS project (1996) also incorporated many key AGI concepts that we have identified, but it was severely limited by a classical AI, finite-state machine approach. Valeriy Nenov and Michael Dyer of UCLA (1994) used ‘massively’ parallel hardware (a CM-2 Connection Machine) to implement a virtual, interactive perceptual design close to our own, but with a more rigid, pre-programmed structure. Unfortunately, this ambitious, ground-breaking work has since been abandoned. The project was probably severely hampered by limited (at the time) hardware.
Moving further away from embodied cognitive science to purely classical research in general intelligence, perhaps the best known system is the Cyc project being pursued by Lenat (1990). Essentially Lenat sees general intelligence as being ‘common sense’. He hopes to achieve this goal by adding many millions of facts about the world into a huge database. After many years of work and millions of dollars in funding there is still a long way to go as the sheer number of facts that humans know about the world is truly staggering. We doubt that a very large database of basic facts is enough to give a computer much general intelligence – the mechanisms for autonomous knowledge acquisition are missing. Being a classical approach to AI this also suffers from the fundamental problems of classical AI listed above. For example the symbol grounding problem again: if facts about cats and dogs are just added to a database that the computer can use even though it has never seen or interacted with an animal, are those concepts really meaningful to the system? While his project also claims to pursue ‘general intelligence’, it is really very different from our own, both in its approach and in the difficulties it faces.
Analysis of AI’s ongoing failure to overcome its long-standing limitations reveals that it is not so much that Artificial General Intelligence has been tried and that it has failed, but rather that the field has largely been abandoned – be it for theoretical, historic, or commercial reasons. Certainly, our particular type of approach, as detailed in previous sections, is receiving scant attention.
8. Fast-track AGI – Why so Rare?
Widespread application of AI has been hampered by a number of core limitations that have plagued the field since the beginning, namely:
- The expense and delay of custom programming individual applications
- Systems’ inability to automatically learn from experience, or to be user teachable/ trainable
- Reliability and performance issues caused by ‘brittleness’ (the inability of systems to automatically adapt to changing requirements, or data outside of a predefined range)
- Their limited intelligence and common sense.
General intelligence is the key to achieving robust autonomous systems that can learn and adapt to a wide range of uses. It is also the cornerstone of self-improving, or Seed AI – using basic abilities to bootstrap higher-level ones. This essay identified foundational components of general intelligence, as well as crucial considerations particular to the effective development of the artificial variety. It highlighted the fact that very few researchers are actually following this most direct route to AGI.
If the approach outlined above is so promising, then why has it received so little attention? Why is hardly anyone actually working on it?
A short answer: Of all the people working in the field called ‘AI’,
- 80% don’t believe in the concept of General Intelligence (but instead, in a large collection of specific skills and knowledge)
- Of those that do, 80% don’t believe that artificial, human-level intelligence is possible – either ever, or for a long, long time
- Of those that do, 80% work on domain-specific AI projects for commercial or academic-political reasons (results are more immediate)
- Of those left, 80% have a poor conceptual framework…
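Taken literally, the four successive 80% filters above leave only a very small fraction of the field. The short calculation below makes the arithmetic explicit; the percentages are the rhetorical estimates given in the list, not measured data, and the figure of 10,000 researchers is purely a hypothetical head count for scale.

```python
from fractions import Fraction

remaining = Fraction(1)
for _ in range(4):                  # four filters, each passing only 20%
    remaining *= Fraction(1, 5)

percent_left = float(remaining * 100)   # share of the field, in percent
per_10000 = remaining * 10000           # researchers left out of a hypothetical 10,000
```

That is 1/625 of the field, or 0.16%: about 16 people out of every 10,000 working in ‘AI’.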
A great number of researchers reject the validity or importance of ‘general intelligence’. For many, controversies in psychology (such as those stoked by The Bell Curve) make this an unpopular, if not taboo subject. Others, conditioned by decades of domain-specific work, simply do not see the benefits of Seed AI – solving the problems only once.
Of those that do not in principle object to general intelligence, many don’t believe that AGI is possible – in their lifetime, or ever. Some hold this position because they themselves tried and failed ‘in their youth’. Others believe that AGI is not the best approach to achieving ‘AI’, or are at a total loss on how to go about it. Very few researchers have actually studied the problem from our (the general intelligence/ Seed AI) perspective. Some are actually trying to reverse-engineer the brain – one function at a time. There are also those who have moral objections, or who are afraid of it.
Of course, a great many are so focused on particular, narrow aspects of intelligence that they simply don’t get around to looking at the big picture – they leave it to others to make it happen. It is also important to note that there are often strong financial and institutional pressures to pursue specialized AI.
All of the above combine to create a dynamic where Real AI is not ‘fashionable’ – getting little respect, funding, and support – further reducing the number of people drawn into it!
These should be more than enough reasons to account for the dearth of AGI progress. But it gets worse. Researchers actually trying to build AGI systems are further hampered by a myriad of misconceptions, poor choices, and lack of resources (funding and research). Many of the technical issues were explored previously (See sections 3 and 7.), but a few others are worth mentioning:
Epistemology. Models of AGI can only be as good as their underlying theory of knowledge – the nature of knowledge, and how it relates to reality. The realization that high-level intelligence is based on conceptual representation of reality underpins design decisions such as adaptive, fuzzy vector encoding, and an interactive, embodied approach. Other consequences are the need for sense-based focus and selection, and contextual activation. The central importance of a highly-integrated pattern network – especially including dynamic ones – becomes obvious on understanding the relationship between entities, attributes, concepts, actions, and thoughts. These and several other insights lay the foundation for solving problems related to grounding, brittleness, and common sense. Finally, there is still a lot of unnecessary confusion about the relationship between concepts and symbols. A dynamic that continues to handicap AI is the lingering schism between traditionalists and connectionists. This unfortunately helps to perpetuate a false dichotomy between explicit symbols/ schema, and incomprehensible patterns.
Theory of Mind. Another area of concern is sloppy formulation and poor understanding of several key concepts: consciousness, intelligence, volition, meaning, emotions, common sense, and ‘qualia’. The fact that hundreds of AI researchers attend conferences every year where key speakers proclaim that ‘we don’t understand consciousness (or qualia, or whatever), and will probably never understand it’ indicates just how pervasive this problem is. Marvin Minsky’s characterization of consciousness being a ‘suitcase word’[6] is correct. Let’s just unpack it!
Errors like these are often behind research going off at a tangent relative to stated long-term goals. Two examples are an undue emphasis on biological feasibility, and the belief that embodied intelligence cannot be virtual, that it has to be implemented in physical robots.
Cognitive psychology. It goes without saying that a proper understanding of the concept ‘intelligence’ is key to engineering it. In addition to epistemology, several areas of cognitive psychology are crucial to unraveling its meaning. Misunderstanding intelligence has led to some costly disappointments, such as manually accumulating huge amounts of largely useless data (knowledge without meaning), efforts to achieve intelligence by combining masses of dumb agents, or trying to obtain meaningful conversation from an isolated network of symbols.
Project focus. The few projects that do pursue AGI based on relatively sound models run yet another risk: they can easily lose focus. Sometimes commercial considerations hijack a project’s direction, while others get sidetracked by (relatively) irrelevant technical issues, such as trying to match an unrealistically high level of performance, fixating on biological feasibility of design, or attempting to implement high-level functions before their time. A clearly mapped-out developmental path to human-level intelligence can serve as a powerful antidote to losing sight of ‘the big picture’. A vision of how to get from ‘here’ to ‘there’ also helps to maintain motivation in such a difficult endeavor.
Research support. AGI utilizes, or more precisely, is an integration of a large number of existing AI technologies. Unfortunately, many of the most crucial areas are sadly under-researched. They include:
- Incremental, real-time, unsupervised/ self-supervised learning (vs. back-propagation)
- Integrated support for temporal patterns
- Dynamically-adaptive neural network topologies
- Self-tuning of system parameters, integrating bottom-up (data driven) and top-down (goal/ meta-cognition driven) auto-adaptation
- Sense probes with auto-adaptive feature extractors.
Cost and difficulty. Achieving high-level AGI will be hard. However, it will not be nearly as difficult as most experts think. A key element of ‘Real AI’ theory (and its implementation) is to concentrate on the essentials of intelligence. Seed AI becomes a manageable problem – in some respects much simpler than other mainstream AI goals – by eliminating huge areas of difficult, but inessential AI complexity. Once we get the crucial fundamental functionality working, much of the additional ‘intelligence’ (ability) required is taught or learned, not programmed. Having said this, I do believe that very substantial resources will be required to scale up the system to human-level storage and processing capacity. However, the far more moderate initial prototypes will serve as proof-of-concept for AGI while potentially seeding a large number of practical new applications.
9. Conclusion
Understanding general intelligence and identifying its essential components are key to building next-generation AI systems – systems that are far less expensive, yet significantly more capable. In addition to concentrating on general learning abilities, a fast-track approach should also seek a path of least resistance – one that capitalizes on human engineering strengths and available technology. Sometimes, this involves selecting the AI road less traveled.
We believe that the theoretical model, cognitive components, and framework described above, joined with our other strategic design decisions provide a solid basis for achieving practical AGI capabilities in the foreseeable future. Successful implementation will significantly address many traditional problems of AI. Potential benefits include:
- Minimizing initial environment-specific programming (through self-adaptive configuration)
- Substantially reducing ongoing software changes, because a large amount of additional functionality and knowledge will be acquired autonomously via self-supervised learning
- Greatly increasing the scope of applications, as users teach and train additional capabilities
- Improved flexibility and robustness resulting from systems’ ability to adapt to changing data patterns, environments and goals.
References
- Aha, D.W. (Ed.) (1997). Lazy Learning. Artificial Intelligence Review, 11:1-5. Kluwer Academic Publishers.
- Aleksander, I. (1996). Impossible Minds. Imperial College Press.
- Arbib, M.A. (1992). Schema theory. In S. C. Shapiro (Ed.), Encyclopedia of Artificial Intelligence, 2nd ed (pp. 1427-1443). John Wiley.
- Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. MIT Press.
- Brooks, R.A., and Stein, L.A. (1993). Building brains for bodies. Memo 1439, Artificial Intelligence Lab, Massachusetts Institute of Technology.
- Brooks, R.A. (1994). Coherent behavior from many adaptive processes. In D. Cliff, P. Husbands, J.A. Meyer, and S.W. Wilson (Eds.), From animals to animats: Proceedings of the third International Conference on Simulation of Adaptive Behavior (pp. 421-430). MIT Press.
- Churchland, P.M. (1995). The Engine of Reason, the Seat of the Soul: A Philosophical Journey into the Brain. MIT Press.
- Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. MIT Press.
- Fritzke, B. (1995). A growing neural gas network learns topologies. In Tesauro, G., Touretzky, D. S., and Leen, T. K. (Eds.), Advances in Neural Information Processing Systems 7 (pp. 625-632). MIT Press.
- Goertzel, B. (1997). From complexity to creativity: Explorations in evolutionary, autopoietic, and cognitive dynamics. Plenum Press.
- Goertzel, B. (2001). Creating internet intelligence: Wild computing, distributed digital consciousness, and the emerging global brain. Plenum Press.
- Goldstone, R.L. (1998). Perceptual Learning. Annual Review of Psychology, 49, 585-612.
- Gottfredson, L.S. (1998). The general intelligence factor. [Special Issue]. Scientific American, 9(4), 2, 24-29.
- Grimson, W.E.L., Stauffer, C., Lee, L., Romano, R. (1998). Using Adaptive Tracking to Classify and Monitor Activities in a Site. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 22-31.
- Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335-346.
- Kelley, D. (1986). The Evidence of the Senses. Louisiana State University Press.
- Kosko, B. (1997). Fuzzy Engineering. Prentice Hall.
- Lenat, D.B., Guha, R.V. (1990). Building Large Knowledge Based Systems. Addison-Wesley.
- Margolis, H. (1987). Patterns, Thinking, and Cognition: A Theory of Judgment. University of Chicago Press.
- McCarthy, J. and Hayes, P.J. (1969). Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence, 4, 463-502.
- Nenov, V.I. and Dyer, M.G. (1994). Language Learning via Perceptual/ Motor Association: A Massively Parallel Model. In: Kitano, H., Hendler, J.A. (Eds.), Massively Parallel Artificial Intelligence (pp. 203-245). AAAI Press/The MIT Press.
- Pfeifer, R., and Scheier, C. (1999). Understanding intelligence. MIT Press.
- Pylyshyn, Z.W. (Ed.) (1987). The Robot’s Dilemma: The frame problem in A.I. Ablex.
- Rand, A. (1990). Introduction to Objectivist Epistemology. Meridian.
- Russell, S.J., Norvig, P. (1995). Artificial Intelligence: A modern approach. Prentice Hall.
- Wang, P. (1995). Non-axiomatic reasoning system: Exploring the essence of intelligence. PhD thesis, Indiana University.
- Yip, K., and Sussman, G.J. (1997). Sparse Representations for Fast, One-shot learning. Proc. of National Conference on Artificial Intelligence, July 1997.