
Tuesday, June 1, 2021

AI winter

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/AI_winter

In the history of artificial intelligence, an AI winter is a period of reduced funding and interest in artificial intelligence research. The term was coined by analogy to the idea of a nuclear winter. The field has experienced several hype cycles, followed by disappointment and criticism, followed by funding cuts, followed by renewed interest years or decades later.

The term first appeared in 1984 as the topic of a public debate at the annual meeting of AAAI (then called the "American Association for Artificial Intelligence"). The winter is a chain reaction that begins with pessimism in the AI community, followed by pessimism in the press, followed by a severe cutback in funding, followed by the end of serious research. At the meeting, Roger Schank and Marvin Minsky—two leading AI researchers who had survived the "winter" of the 1970s—warned the business community that enthusiasm for AI had spiraled out of control in the 1980s and that disappointment would certainly follow. Three years later, the billion-dollar AI industry began to collapse.

Hype is common in many emerging technologies, such as the railway mania or the dot-com bubble. The AI winter was a result of such hype, due to over-inflated promises by developers, unnaturally high expectations from end-users, and extensive promotion in the media. Despite the rise and fall of AI's reputation, it has continued to develop new and successful technologies. AI researcher Rodney Brooks would complain in 2002 that "there's this stupid myth out there that AI has failed, but AI is around you every second of the day." In 2005, Ray Kurzweil agreed: "Many observers still think that the AI winter was the end of the story and that nothing since has come of the AI field. Yet today many thousands of AI applications are deeply embedded in the infrastructure of every industry."

Enthusiasm and optimism about AI have generally increased since the field's low point in the early 1990s. Beginning about 2012, interest in artificial intelligence (and especially the sub-field of machine learning) from the research and corporate communities led to a dramatic increase in funding and investment.

Overview

There were two major winters in 1974–1980 and 1987–1993 and several smaller episodes, including the following:

Early episodes

Machine translation and the ALPAC report of 1966

During the Cold War, the US government was particularly interested in the automatic, instant translation of Russian documents and scientific reports. The government aggressively supported efforts at machine translation starting in 1954. At the outset, the researchers were optimistic. Noam Chomsky's new work in grammar was streamlining the translation process and there were "many predictions of imminent 'breakthroughs'".

 

Briefing for US Vice President Gerald Ford in 1973 on the junction-grammar-based computer translation model

However, researchers had underestimated the profound difficulty of word-sense disambiguation. In order to translate a sentence, a machine needed to have some idea what the sentence was about, otherwise it made mistakes. An apocryphal example is "the spirit is willing but the flesh is weak." Translated back and forth with Russian, it became "the vodka is good but the meat is rotten." Similarly, "out of sight, out of mind" became "blind idiot". Later researchers would call this the commonsense knowledge problem.

By 1964, the National Research Council had become concerned about the lack of progress and formed the Automatic Language Processing Advisory Committee (ALPAC) to look into the problem. They concluded, in a famous 1966 report, that machine translation was more expensive, less accurate and slower than human translation. After spending some 20 million dollars, the NRC ended all support. Careers were destroyed and research ended.

Machine translation remains an open research problem in the 21st century, and it has since met with some success.

The abandonment of connectionism in 1969

Some of the earliest work in AI used networks or circuits of connected units to simulate intelligent behavior. Examples of this kind of work, called "connectionism", include Walter Pitts and Warren McCulloch's first description of a neural network for logic and Marvin Minsky's work on the SNARC system. In the late 1950s, most of these approaches were abandoned when researchers began to explore symbolic reasoning as the essence of intelligence, following the success of programs like the Logic Theorist and the General Problem Solver.

However, one type of connectionist work continued: the study of perceptrons, invented by Frank Rosenblatt, who kept the field alive with his salesmanship and the sheer force of his personality. He optimistically predicted that the perceptron "may eventually be able to learn, make decisions, and translate languages". Mainstream research into perceptrons came to an abrupt end in 1969, when Marvin Minsky and Seymour Papert published the book Perceptrons, which was perceived as outlining the limits of what perceptrons could do.

Connectionist approaches were abandoned for the next decade or so. While important work, such as Paul Werbos' discovery of backpropagation, continued in a limited way, major funding for connectionist projects was difficult to find in the 1970s and early 1980s. The "winter" of connectionist research came to an end in the middle 1980s, when the work of John Hopfield, David Rumelhart and others revived large scale interest in neural networks. Rosenblatt did not live to see this, however, as he died in a boating accident shortly after Perceptrons was published.

The setbacks of 1974

The Lighthill report

In 1973, professor Sir James Lighthill was asked by the UK Parliament to evaluate the state of AI research in the United Kingdom. His report, now called the Lighthill report, criticized the utter failure of AI to achieve its "grandiose objectives." He concluded that nothing being done in AI couldn't be done in other sciences. He specifically mentioned the problem of "combinatorial explosion" or "intractability", which implied that many of AI's most successful algorithms would grind to a halt on real world problems and were only suitable for solving "toy" versions.

The report was contested in a debate broadcast in the BBC "Controversy" series in 1973. The debate "The general purpose robot is a mirage" from the Royal Institution was Lighthill versus the team of Donald Michie, John McCarthy and Richard Gregory. McCarthy later wrote that "the combinatorial explosion problem has been recognized in AI from the beginning".

The report led to the complete dismantling of AI research in England. AI research continued in only a few universities (Edinburgh, Essex and Sussex). Research would not revive on a large scale until 1983, when Alvey (a research project of the British Government) began to fund AI again from a war chest of £350 million in response to the Japanese Fifth Generation Project (see below). Alvey had a number of UK-only requirements which did not sit well internationally, especially with US partners, and lost Phase 2 funding.

DARPA's early 1970s funding cuts

During the 1960s, the Defense Advanced Research Projects Agency (then known as "ARPA", now known as "DARPA") provided millions of dollars for AI research with few strings attached. J. C. R. Licklider, the founding director of DARPA's computing division, believed in "funding people, not projects" and he and several successors allowed AI's leaders (such as Marvin Minsky, John McCarthy, Herbert A. Simon or Allen Newell) to spend it almost any way they liked.

This attitude changed after the passage of the Mansfield Amendment in 1969, which required DARPA to fund "mission-oriented direct research, rather than basic undirected research". Pure undirected research of the kind that had gone on in the 1960s would no longer be funded by DARPA. Researchers now had to show that their work would soon produce some useful military technology. AI research proposals were held to a very high standard. The situation was not helped when the Lighthill report and DARPA's own study (the American Study Group) suggested that most AI research was unlikely to produce anything truly useful in the foreseeable future. DARPA's money was directed at specific projects with identifiable goals, such as autonomous tanks and battle management systems. By 1974, funding for AI projects was hard to find.

AI researcher Hans Moravec blamed the crisis on the unrealistic predictions of his colleagues: "Many researchers were caught up in a web of increasing exaggeration. Their initial promises to DARPA had been much too optimistic. Of course, what they delivered stopped considerably short of that. But they felt they couldn't in their next proposal promise less than in the first one, so they promised more." The result, Moravec claims, is that some of the staff at DARPA had lost patience with AI research. "It was literally phrased at DARPA that 'some of these people were going to be taught a lesson [by] having their two-million-dollar-a-year contracts cut to almost nothing!'" Moravec told Daniel Crevier.

While the autonomous tank project was a failure, the battle management system (the Dynamic Analysis and Replanning Tool) proved to be enormously successful, saving billions in the first Gulf War, repaying all of DARPA's investment in AI and justifying DARPA's pragmatic policy.

The SUR debacle

DARPA was deeply disappointed with researchers working on the Speech Understanding Research program at Carnegie Mellon University. DARPA had hoped for, and felt it had been promised, a system that could respond to voice commands from a pilot. The SUR team had developed a system which could recognize spoken English, but only if the words were spoken in a particular order. DARPA felt it had been duped and, in 1974, they cancelled a three million dollar a year contract.

Many years later, several successful commercial speech recognition systems would use the technology developed by the Carnegie Mellon team (such as hidden Markov models) and the market for speech recognition systems would reach $4 billion by 2001.

The setbacks of the late 1980s and early 1990s

The collapse of the LISP machine market

In the 1980s, a form of AI program called an "expert system" was adopted by corporations around the world. The first commercial expert system was XCON, developed at Carnegie Mellon for Digital Equipment Corporation, and it was an enormous success: it was estimated to have saved the company 40 million dollars over just six years of operation. Corporations around the world began to develop and deploy expert systems and by 1985 they were spending over a billion dollars on AI, most of it to in-house AI departments. An industry grew up to support them, including software companies like Teknowledge and Intellicorp (KEE), and hardware companies like Symbolics and LISP Machines Inc. who built specialized computers, called LISP machines, that were optimized to process the programming language LISP, the preferred language for AI.

In 1987, three years after Minsky and Schank's prediction, the market for specialized LISP-based AI hardware collapsed. Workstations by companies like Sun Microsystems offered a powerful alternative to LISP machines, and companies like Lucid offered a LISP environment for this new class of workstations. The performance of these general workstations posed an increasingly difficult challenge for LISP machines. Companies like Lucid and Franz Inc. offered increasingly powerful versions of LISP that were portable to all UNIX systems, and benchmarks were published showing workstations maintaining a performance advantage over LISP machines. Later desktop computers built by Apple and IBM would also offer a simpler and more popular architecture to run LISP applications on; by 1987, some of them had become as powerful as the more expensive LISP machines. The desktop computers had rule-based engines such as CLIPS available. These alternatives left consumers with no reason to buy an expensive machine specialized for running LISP. An entire industry worth half a billion dollars was replaced in a single year.

By the early 1990s, most commercial LISP companies had failed, including Symbolics, LISP Machines Inc., Lucid Inc., etc. Other companies, like Texas Instruments and Xerox, abandoned the field. A small number of customer companies (that is, companies using systems written in LISP and developed on LISP machine platforms) continued to maintain systems; in some cases, this meant taking on the resulting support work themselves.

Slowdown in deployment of expert systems

By the early 1990s, the earliest successful expert systems, such as XCON, proved too expensive to maintain. They were difficult to update, they could not learn, they were "brittle" (i.e., they could make grotesque mistakes when given unusual inputs), and they fell prey to problems (such as the qualification problem) that had been identified years earlier in research in nonmonotonic logic. Expert systems proved useful, but only in a few special contexts. Another problem was the computational hardness of truth maintenance for general knowledge. KEE used an assumption-based approach (see NASA, TEXSYS) supporting multiple-world scenarios that was difficult to understand and apply.

The few remaining expert system shell companies were eventually forced to downsize and search for new markets and software paradigms, like case-based reasoning or universal database access. The maturation of Common Lisp saved many systems such as ICAD, which found application in knowledge-based engineering. Other systems, such as Intellicorp's KEE, moved from LISP to a C++ variant on the PC and helped establish object-oriented technology (including providing major support for the development of UML).

The end of the Fifth Generation project

In 1981, the Japanese Ministry of International Trade and Industry set aside $850 million for the Fifth Generation computer project. Their objectives were to write programs and build machines that could carry on conversations, translate languages, interpret pictures, and reason like human beings. By 1991, the impressive list of goals penned in 1981 had not been met. According to HP Newquist in The Brain Makers, "On June 1, 1992, The Fifth Generation Project ended not with a successful roar, but with a whimper." As with other AI projects, expectations had run much higher than what was actually possible.

Strategic Computing Initiative cutbacks

In 1983, in response to the fifth generation project, DARPA again began to fund AI research through the Strategic Computing Initiative. As originally proposed, the project would begin with practical, achievable goals, which even included artificial general intelligence as a long-term objective. The program was under the direction of the Information Processing Technology Office (IPTO) and was also directed at supercomputing and microelectronics. By 1985 it had spent $100 million and 92 projects were underway at 60 institutions, half in industry, half in universities and government labs. AI research was generously funded by the SCI.

Jack Schwartz, who ascended to the leadership of IPTO in 1987, dismissed expert systems as "clever programming" and cut funding to AI "deeply and brutally", "eviscerating" SCI. Schwartz felt that DARPA should focus its funding only on those technologies which showed the most promise; in his words, DARPA should "surf", rather than "dog paddle", and he felt strongly that AI was not "the next wave". Insiders in the program cited problems in communication, organization and integration. A few projects survived the funding cuts, including a pilot's assistant and an autonomous land vehicle (which were never delivered) and the DART battle management system, which (as noted above) was successful.

Developments post-AI winter

A survey of reports from the early 2000s suggests that AI's reputation was still less than stellar:

  • Alex Castro, quoted in The Economist, 7 June 2007: "[Investors] were put off by the term 'voice recognition' which, like 'artificial intelligence', is associated with systems that have all too often failed to live up to their promises."
  • Patty Tascarella in Pittsburgh Business Times, 2006: "Some believe the word 'robotics' actually carries a stigma that hurts a company's chances at funding."
  • John Markoff in the New York Times, 2005: "At its low point, some computer scientists and software engineers avoided the term artificial intelligence for fear of being viewed as wild-eyed dreamers."

Many researchers in AI in the mid 2000s deliberately called their work by other names, such as informatics, machine learning, analytics, knowledge-based systems, business rules management, cognitive systems, intelligent systems, intelligent agents or computational intelligence, to indicate that their work emphasizes particular tools or is directed at a particular sub-problem. Although this may be partly because they consider their field to be fundamentally different from AI, it is also true that the new names help to procure funding by avoiding the stigma of false promises attached to the name "artificial intelligence".

AI integration

In the late 1990s and early 21st century, AI technology became widely used as elements of larger systems, but the field is rarely credited for these successes. In 2006, Nick Bostrom explained that "a lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore." Rodney Brooks stated around the same time that "there's this stupid myth out there that AI has failed, but AI is around you every second of the day."

Technologies developed by AI researchers have achieved commercial success in a number of domains, such as machine translation, data mining, industrial robotics, logistics, speech recognition, banking software, medical diagnosis, and Google's search engine.

Fuzzy logic controllers have been developed for automatic gearboxes in automobiles (the 2006 Audi TT, VW Touareg and VW Caravelle feature the DSP transmission which utilizes fuzzy logic, and a number of Škoda variants (Škoda Fabia) also currently include a fuzzy logic-based controller). Camera sensors widely utilize fuzzy logic to enable focus.

Heuristic search and data analytics are both technologies that have developed from the evolutionary computing and machine learning subdivision of the AI research community. Again, these techniques have been applied to a wide range of real world problems with considerable commercial success.

Data analytics technology utilizing algorithms for the automated formation of classifiers, developed in the supervised machine learning community in the 1990s (for example, TDIDT, support vector machines, neural nets, IBL), is now used pervasively by companies for marketing survey targeting and discovery of trends and features in data sets.

AI funding

Researchers and economists frequently judged the status of an AI winter by reviewing which AI projects were being funded, how much and by whom. Trends in funding are often set by major funding agencies in the developed world. Currently, DARPA and a civilian funding program called EU-FP7 provide much of the funding for AI research in the US and European Union.

As of 2007, DARPA was soliciting AI research proposals under a number of programs including the Grand Challenge Program, the Cognitive Technology Threat Warning System (CT2WS), "Human Assisted Neural Devices (SN07-43)", "Autonomous Real-Time Ground Ubiquitous Surveillance-Imaging System (ARGUS-IS)" and "Urban Reasoning and Geospatial Exploitation Technology (URGENT)".

Perhaps best known is DARPA's Grand Challenge Program, which has developed road vehicles that can successfully navigate real-world terrain in a fully autonomous fashion.

DARPA has also supported programs on the Semantic Web with a great deal of emphasis on intelligent management of content and automated understanding. However James Hendler, the manager of the DARPA program at the time, expressed some disappointment with the government's ability to create rapid change, and moved to working with the World Wide Web Consortium to transition the technologies to the private sector.

The EU-FP7 funding program provides financial support to researchers within the European Union. In 2007–2008, it was funding AI research under the Cognitive Systems: Interaction and Robotics Programme (€193m), the Digital Libraries and Content Programme (€203m) and the FET programme (€185m).

Current "AI spring"

A marked increase in AI funding, development, deployment, and commercial use has led to the idea of the AI winter being long over. Concerns are occasionally raised that a new AI winter could be triggered by overly ambitious or unrealistic promises by prominent AI scientists or overpromising on the part of commercial vendors.

The successes of the current "AI spring" include advances in language translation (in particular, Google Translate), image recognition (spurred by the ImageNet training database) as commercialized by Google Image Search, and game-playing systems such as AlphaZero (chess champion), AlphaGo (go champion), and Watson (Jeopardy champion). Most of these advances have occurred since 2010.

Underlying causes behind AI winters

Several explanations have been put forth for the cause of AI winters in general. As AI progressed from government-funded applications to commercial ones, new dynamics came into play. While hype is the most commonly cited cause, the explanations are not necessarily mutually exclusive.

Hype

The AI winters can be partly understood as a sequence of over-inflated expectations and subsequent crashes of the kind seen in stock markets, exemplified by the railway mania and the dotcom bubble. In a common pattern in the development of new technology (known as the hype cycle), an event, typically a technological breakthrough, creates publicity which feeds on itself to create a "peak of inflated expectations" followed by a "trough of disillusionment". Since scientific and technological progress can't keep pace with the publicity-fueled increase in expectations among investors and other stakeholders, a crash must follow. AI technology seems to be no exception to this rule.

For example, in the 1960s the realization that computers could simulate 1-layer neural networks led to a neural-network hype cycle that lasted until the 1969 publication of the book Perceptrons which severely limited the set of problems that could be optimally solved by 1-layer networks. In 1985 the realization that neural networks could be used to solve optimization problems, as a result of famous papers by Hopfield and Tank, together with the threat of Japan's 5th-generation project, led to renewed interest and application.

Institutional factors

Another factor is AI's place in the organisation of universities. Research on AI often takes the form of interdisciplinary research. AI is therefore prone to the same problems other types of interdisciplinary research face. Funding is channeled through the established departments and during budget cuts, there will be a tendency to shield the "core contents" of each department, at the expense of interdisciplinary and less traditional research projects.

Economic factors

Downturns in a country's national economy cause budget cuts in universities. The "core contents" tendency worsens the effect on AI research and investors in the market are likely to put their money into less risky ventures during a crisis. Together this may amplify an economic downturn into an AI winter. It is worth noting that the Lighthill report came at a time of economic crisis in the UK, when universities had to make cuts and the question was only which programs should go.

Insufficient computing capability

Early in computing history the potential of neural networks was understood, but it could not be realized: fairly simple networks require significant computing capacity even by today's standards.

Empty pipeline

It is common to see the relationship between basic research and technology as a pipeline. Advances in basic research give birth to advances in applied research, which in turn leads to new commercial applications. From this it is often argued that a lack of basic research will lead to a drop in marketable technology some years down the line. This view was advanced by James Hendler in 2008, when he claimed that the fall of expert systems in the late '80s was not due to an inherent and unavoidable brittleness of expert systems, but to funding cuts in basic research in the 1970s. These expert systems advanced in the 1980s through applied research and product development, but, by the end of the decade, the pipeline had run dry and expert systems were unable to produce improvements that could have overcome this brittleness and secured further funding.

Failure to adapt

The fall of the LISP machine market and the failure of the fifth generation computers were cases of expensive advanced products being overtaken by simpler and cheaper alternatives. This fits the definition of a low-end disruptive technology, with the LISP machine makers being marginalized. Expert systems were carried over to the new desktop computers by for instance CLIPS, so the fall of the LISP machine market and the fall of expert systems are strictly speaking two separate events. Still, the failure to adapt to such a change in the outside computing milieu is cited as one reason for the 1980s AI winter.

Arguments and debates on past and future of AI

Several philosophers, cognitive scientists and computer scientists have speculated on where AI might have failed and what lies in its future. Hubert Dreyfus highlighted flawed assumptions of AI research in the past and, as early as 1966, correctly predicted that the first wave of AI research would fail to fulfill the very public promises it was making. Other critics like Noam Chomsky have argued that AI is headed in the wrong direction, in part because of its heavy reliance on statistical techniques. Chomsky's comments fit into a larger debate with Peter Norvig, centered around the role of statistical methods in AI. The exchange between the two started with comments made by Chomsky at a symposium at MIT to which Norvig wrote a response.

Expert system

From Wikipedia, the free encyclopedia
 
A Symbolics Lisp Machine: an early platform for expert systems.

In artificial intelligence, an expert system is a computer system emulating the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if–then rules rather than through conventional procedural code. The first expert systems were created in the 1970s and then proliferated in the 1980s. Expert systems were among the first truly successful forms of artificial intelligence (AI) software. An expert system is divided into two subsystems: the inference engine and the knowledge base. The knowledge base represents facts and rules. The inference engine applies the rules to the known facts to deduce new facts. Inference engines can also include explanation and debugging abilities.

History

Early development

Soon after the dawn of modern computers in the late 1940s and early 1950s, researchers started realizing the immense potential these machines had for modern society. One of the first challenges was to make such machines capable of “thinking” like humans, and in particular capable of making important decisions the way humans do. The medical/healthcare field presented the tantalizing challenge of enabling these machines to make medical diagnostic decisions.

Thus, in the late 1950s, right after the information age had fully arrived, researchers started experimenting with the prospect of using computer technology to emulate human decision-making. For example, biomedical researchers started creating computer-aided systems for diagnostic applications in medicine and biology. These early diagnostic systems used patients’ symptoms and laboratory test results as inputs to generate a diagnostic outcome. These systems were often described as the early forms of expert systems. However, researchers had realized that there were significant limitations when using traditional methods such as flow charts, statistical pattern-matching, or probability theory.

Formal introduction & later developments

This situation gradually led to the development of expert systems, which used knowledge-based approaches. Early expert systems in medicine included the MYCIN expert system, the INTERNIST-I expert system and later, in the middle of the 1980s, CADUCEUS.

Expert systems were formally introduced around 1965 by the Stanford Heuristic Programming Project led by Edward Feigenbaum, who is sometimes termed the "father of expert systems"; other key early contributors were Bruce Buchanan and Randall Davis. The Stanford researchers tried to identify domains where expertise was highly valued and complex, such as diagnosing infectious diseases (Mycin) and identifying unknown organic molecules (Dendral). The idea that "intelligent systems derive their power from the knowledge they possess rather than from the specific formalisms and inference schemes they use" – as Feigenbaum said – was at the time a significant step forward, since past research had been focused on heuristic computational methods, culminating in attempts to develop very general-purpose problem solvers (most notably the joint work of Allen Newell and Herbert Simon). Expert systems became some of the first truly successful forms of artificial intelligence (AI) software.

Research on expert systems was also active in France. While in the US the focus tended to be on rules-based systems, first on systems hard coded on top of LISP programming environments and then on expert system shells developed by vendors such as Intellicorp, in France research focused more on systems developed in Prolog. The advantage of expert system shells was that they were somewhat easier for nonprogrammers to use. The advantage of Prolog environments was that they were not focused only on if-then rules; Prolog environments provided a much better realization of a complete first order logic environment.

In the 1980s, expert systems proliferated. Universities offered expert system courses and two thirds of the Fortune 500 companies applied the technology in daily business activities. Interest was international with the Fifth Generation Computer Systems project in Japan and increased research funding in Europe.

In 1981, the first IBM PC, with the PC DOS operating system, was introduced. The imbalance between the affordability of the relatively powerful chips in the PC and the much higher cost of processing power in the mainframes that dominated the corporate IT world at the time created a new type of architecture for corporate computing, termed the client-server model. Calculations and reasoning could be performed at a fraction of the price of a mainframe using a PC. This model also enabled business units to bypass corporate IT departments and directly build their own applications. As a result, client-server computing had a tremendous impact on the expert systems market. Expert systems were already outliers in much of the business world, requiring new skills that many IT departments did not have and were not eager to develop. They were a natural fit for new PC-based shells that promised to put application development into the hands of end users and experts. Until then, the main development environment for expert systems had been high-end Lisp machines from Xerox, Symbolics, and Texas Instruments. With the rise of the PC and client-server computing, vendors such as Intellicorp and Inference Corporation shifted their priorities to developing PC-based tools. In addition, new vendors, often financed by venture capital (such as Aion Corporation, Neuron Data, Exsys, and many others), started appearing regularly.

The first expert system to be used in a design capacity for a large-scale product was the SID (Synthesis of Integral Design) software program, developed in 1982. Written in LISP, SID generated 93% of the VAX 9000 CPU logic gates. Input to the software was a set of rules created by several expert logic designers. SID expanded the rules and generated software logic synthesis routines many times the size of the rules themselves. Surprisingly, the combination of these rules resulted in an overall design that exceeded the capabilities of the experts themselves, and in many cases out-performed the human counterparts. While some rules contradicted others, top-level control parameters for speed and area provided the tie-breaker. The program was highly controversial, but used nevertheless due to project budget constraints. It was terminated by logic designers after the VAX 9000 project completion.

During the years before the middle of the 1970s, expectations of what expert systems could accomplish in many fields tended to be extremely optimistic. At the beginning of these early studies, researchers were hoping to develop entirely automatic (i.e., completely computerized) expert systems, and people's expectations of what computers could do were frequently too idealistic. This situation changed radically after Richard M. Karp published his breakthrough paper “Reducibility among Combinatorial Problems” in the early 1970s. Thanks to Karp's work it became clear that there are certain limitations and possibilities when one designs computer algorithms: his findings describe what computers can do and what they cannot do. Many of the computational problems related to this type of expert system face such pragmatic limitations. These findings laid the groundwork for the next developments in the field.

In the 1990s and beyond, the term expert system and the idea of a standalone AI system mostly dropped from the IT lexicon. There are two interpretations of this. One is that "expert systems failed": the IT world moved on because expert systems did not deliver on their overhyped promise. The other is the mirror opposite, that expert systems were simply victims of their success: as IT professionals grasped concepts such as rule engines, such tools migrated from being standalone tools for developing special-purpose expert systems to being one of many standard tools. Many of the leading business application suite vendors (such as SAP, Siebel, and Oracle) integrated expert system abilities into their suites of products as a way of specifying business logic – rule engines are no longer simply for defining the rules an expert would use but for any type of complex, volatile, and critical business logic; they often go hand in hand with business process automation and integration environments.

Current approaches to expert systems

The limitations of the previous type of expert systems have pushed researchers to develop new approaches. They have developed more efficient, flexible, and powerful methods to simulate the human decision-making process. Some of the approaches that researchers have developed are based on new methods of artificial intelligence (AI), and in particular on machine learning and data mining approaches with a feedback mechanism. Recurrent neural networks often take advantage of such mechanisms. Related discussion appears in the Disadvantages section below.

Modern systems can incorporate new knowledge more easily and thus update themselves easily. Such systems can generalize from existing knowledge better and deal with vast amounts of complex data; big data is relevant here. Sometimes these types of expert systems are called “intelligent systems.”

Software architecture

An illustrative example of backward chaining from a 1990 master's thesis

An expert system is an example of a knowledge-based system. Expert systems were the first commercial systems to use a knowledge-based architecture. A knowledge-based system is essentially composed of two sub-systems: the knowledge base and the inference engine.

The knowledge base represents facts about the world. In early expert systems such as Mycin and Dendral, these facts were represented mainly as flat assertions about variables. In later expert systems developed with commercial shells, the knowledge base took on more structure and used concepts from object-oriented programming. The world was represented as classes, subclasses, and instances and assertions were replaced by values of object instances. The rules worked by querying and asserting values of the objects.

The inference engine is an automated reasoning system that evaluates the current state of the knowledge-base, applies relevant rules, and then asserts new knowledge into the knowledge base. The inference engine may also include abilities for explanation, so that it can explain to a user the chain of reasoning used to arrive at a particular conclusion by tracing back over the firing of rules that resulted in the assertion.

There are mainly two modes for an inference engine: forward chaining and backward chaining. The two approaches are distinguished by whether the inference engine is driven by the antecedent (left-hand side) or the consequent (right-hand side) of the rule. In forward chaining an antecedent fires and asserts the consequent. For example, consider the following rule:

R1: Man(x) => Mortal(x)

A simple example of forward chaining would be to assert Man(Socrates) to the system and then trigger the inference engine. It would match R1 and assert Mortal(Socrates) into the knowledge base.

Backward chaining is a bit less straightforward. In backward chaining the system looks at possible conclusions and works backward to see if they might be true. So if the system was trying to determine if Mortal(Socrates) is true, it would find R1 and query the knowledge base to see if Man(Socrates) is true. One of the early innovations of expert systems shells was to integrate inference engines with a user interface. This could be especially powerful with backward chaining. If the system needs to know a particular fact but does not, then it can simply generate an input screen and ask the user if the information is known. So in this example, it could use R1 to ask the user if Socrates was a Man and then use that new information accordingly.
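
For illustration only (this sketch is not part of the original article; the fact list, rule format, and function names are invented, and pattern matching is reduced to comparing predicate symbols), rule R1 and the two chaining modes could be written in Lisp along these lines:

;; A minimal sketch of R1 plus forward and backward chaining.
(defparameter *facts* '((man socrates)))
(defparameter *rules* '((r1 (man ?x) (mortal ?x))))   ; R1: Man(x) => Mortal(x)

(defun forward-chain ()
  "Fire each rule whose antecedent predicate matches a known fact and
assert the instantiated consequent into *FACTS*."
  (dolist (rule *rules* *facts*)
    (destructuring-bind (name (ante-pred ante-var) (cons-pred cons-var)) rule
      (declare (ignore name ante-var cons-var))
      (dolist (fact *facts*)
        (when (eq (first fact) ante-pred)
          (pushnew (list cons-pred (second fact)) *facts* :test #'equal))))))

(defun backward-chain (goal)
  "Return true if GOAL, e.g. (MORTAL SOCRATES), is a known fact or can be
derived by working backward through a rule whose consequent matches it."
  (or (member goal *facts* :test #'equal)
      (some (lambda (rule)
              (destructuring-bind (name (ante-pred ante-var) (cons-pred cons-var)) rule
                (declare (ignore name ante-var cons-var))
                (and (eq (first goal) cons-pred)
                     (backward-chain (list ante-pred (second goal))))))
            *rules*)))

;; (forward-chain)                     ; adds (MORTAL SOCRATES) to *FACTS*
;; (backward-chain '(mortal socrates)) ; => true (non-NIL)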

The use of rules to explicitly represent knowledge also enabled explanation abilities. In the simple example above if the system had used R1 to assert that Socrates was Mortal and a user wished to understand why Socrates was mortal they could query the system and the system would look back at the rules which fired to cause the assertion and present those rules to the user as an explanation. In English, if the user asked "Why is Socrates Mortal?" the system would reply "Because all men are mortal and Socrates is a man". A significant area for research was the generation of explanations from the knowledge base in natural English rather than simply by showing the more formal but less intuitive rules.

As expert systems evolved, many new techniques were incorporated into various types of inference engines. Some of the most important of these were:

  • Truth maintenance. These systems record the dependencies in a knowledge-base so that when facts are altered, dependent knowledge can be altered accordingly. For example, if the system learns that Socrates is no longer known to be a man it will revoke the assertion that Socrates is mortal.
  • Hypothetical reasoning. In this, the knowledge base can be divided up into many possible views, a.k.a. worlds. This allows the inference engine to explore multiple possibilities in parallel. For example, the system may want to explore the consequences of both assertions, what will be true if Socrates is a Man and what will be true if he is not?
  • Uncertainty systems. One of the first extensions of simply using rules to represent knowledge was to associate a probability with each rule: rather than asserting that Socrates is mortal, the system asserts that Socrates may be mortal with some probability value (a minimal sketch follows this list). Simple probabilities were extended in some systems with sophisticated mechanisms for uncertain reasoning, such as fuzzy logic and combinations of probabilities.
  • Ontology classification. With the addition of object classes to the knowledge base, a new type of reasoning was possible. Along with reasoning simply about object values, the system could also reason about object structures. In this simple example, Man can represent an object class and R1 can be redefined as a rule that defines the class of all men. These types of special purpose inference engines are termed classifiers. Although they were not highly used in expert systems, classifiers are very powerful for unstructured volatile domains, and are a key technology for the Internet and the emerging Semantic Web.
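
The "uncertainty systems" idea above can be illustrated with a minimal sketch (not from the article; the rule format, names, and the multiplicative combination scheme are invented for illustration): a rule carries a certainty factor, and a conclusion inherits the product of the rule's factor and the supporting fact's factor.

;; A sketch of a rule with a certainty factor (CF); all names are invented.
(defparameter *cf-rules*
  ;; (antecedent-predicate consequent-predicate certainty-factor)
  '((man mortal 0.95)))   ; "men are mortal", believed with CF 0.95

(defun apply-cf-rule (rule fact fact-cf)
  "If FACT's predicate matches the rule's antecedent, return the consequent
about the same individual together with a combined certainty factor."
  (destructuring-bind (ante-pred cons-pred rule-cf) rule
    (when (eq (first fact) ante-pred)
      (list (list cons-pred (second fact)) (* rule-cf fact-cf)))))

;; (apply-cf-rule (first *cf-rules*) '(man socrates) 0.9)
;; => ((MORTAL SOCRATES) 0.855)   ; i.e. 0.95 * 0.9, up to float rounding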

Advantages

The goal of knowledge-based systems is to make the critical information required for the system to work explicit rather than implicit. In a traditional computer program the logic is embedded in code that can typically only be reviewed by an IT specialist. With an expert system the goal was to specify the rules in a format that was intuitive and easily understood, reviewed, and even edited by domain experts rather than IT experts. The benefits of this explicit knowledge representation were rapid development and ease of maintenance.

Ease of maintenance is the most obvious benefit. This was achieved in two ways. First, by removing the need to write conventional code, many of the normal problems that can be caused by even small changes to a system could be avoided with expert systems. Essentially, the logical flow of the program (at least at the highest level) was a given for the system: simply invoke the inference engine. This also was a reason for the second benefit: rapid prototyping. With an expert system shell it was possible to enter a few rules and have a prototype developed in days rather than the months or years typically associated with complex IT projects.

A claim often made for expert system shells was that they removed the need for trained programmers and that experts could develop systems themselves. In reality, this was seldom if ever true. While the rules for an expert system were more comprehensible than typical computer code, they still had a formal syntax where a misplaced comma or other character could cause havoc, as with any other computer language. Also, as expert systems moved from prototypes in the lab to deployment in the business world, issues of integration and maintenance became far more critical. Inevitably, demands arose to integrate with, and take advantage of, large legacy databases and systems. Accomplishing this required the same skills as any other type of system integration.

Disadvantages

The most common disadvantage cited for expert systems in the academic literature is the knowledge acquisition problem. Obtaining the time of domain experts for any software application is always difficult, but for expert systems it was especially difficult because the experts were by definition highly valued and in constant demand by the organization. As a result of this problem, a great deal of research in the later years of expert systems was focused on tools for knowledge acquisition, to help automate the process of designing, debugging, and maintaining rules defined by experts. However, when looking at the life-cycle of expert systems in actual use, other problems – essentially the same problems as those of any other large system – seem at least as critical as knowledge acquisition: integration, access to large databases, and performance.

Performance could be especially problematic because early expert systems were built using tools (such as earlier Lisp versions) that interpreted code expressions without first compiling them. This provided a powerful development environment, but with the drawback that it was virtually impossible to match the efficiency of the fastest compiled languages (such as C). System and database integration were difficult for early expert systems because the tools were mostly in languages and platforms that were neither familiar to nor welcome in most corporate IT environments – programming languages such as Lisp and Prolog, and hardware platforms such as Lisp machines and personal computers. As a result, much effort in the later stages of expert system tool development was focused on integrating with legacy environments such as COBOL and large database systems, and on porting to more standard platforms. These issues were resolved mainly by the client-server paradigm shift, as PCs were gradually accepted in the IT environment as a legitimate platform for serious business system development and as affordable minicomputer servers provided the processing power needed for AI applications.

Another major challenge of expert systems emerges when the size of the knowledge base increases, which causes the processing complexity to increase. For instance, when an expert system with 100 million rules was envisioned as the ultimate expert system, it became obvious that such a system would be too complex and would face too many computational problems. An inference engine would have to be able to process huge numbers of rules to reach a decision.

How to verify that decision rules are consistent with each other is also a challenge when there are too many rules. Usually such a problem leads to a satisfiability (SAT) formulation, a well-known NP-complete problem (Boolean satisfiability). If we assume only binary variables, say n of them, the corresponding search space is of size 2^n; for example, 30 binary variables already yield about a billion candidate assignments. Thus, the search space grows exponentially.

There are also questions on how to prioritize the use of the rules in order to operate more efficiently, or how to resolve ambiguities (for instance, if there are too many else-if sub-structures within a single rule) and so on.

Other problems are related to the overfitting and overgeneralization effects when using known facts and trying to generalize to other cases not described explicitly in the knowledge base. Such problems exist with methods that employ machine learning approaches too.

Another problem related to the knowledge base is how to make updates to its knowledge quickly and effectively. Deciding how to add a new piece of knowledge (i.e., where to place it among many rules) is also challenging. Modern approaches that rely on machine learning methods are easier in this regard.

Because of the above challenges, it became clear that new approaches to AI were required instead of rule-based technologies. These new approaches are based on the use of machine learning techniques, along with the use of feedback mechanisms.

The key challenges for expert systems in medicine (if one considers computer-aided diagnostic systems to be modern expert systems), and perhaps in other application domains, include issues related to big data, existing regulations, healthcare practice, various algorithmic issues, and system assessment.

Applications

Hayes-Roth divides expert systems applications into 10 categories illustrated in the following table. The example applications were not in the original Hayes-Roth table, and some of them arose well afterward. Any application that is not footnoted is described in the Hayes-Roth book. Also, while these categories provide an intuitive framework to describe the space of expert systems applications, they are not rigid categories, and in some cases an application may show traits of more than one category.

Category | Problem addressed | Examples
Interpretation | Inferring situation descriptions from sensor data | Hearsay (speech recognition), PROSPECTOR
Prediction | Inferring likely consequences of given situations | Preterm Birth Risk Assessment
Diagnosis | Inferring system malfunctions from observables | CADUCEUS, MYCIN, PUFF, Mistral, Eydenet, Kaleidos
Design | Configuring objects under constraints | Dendral, Mortgage Loan Advisor, R1 (DEC VAX Configuration), SID (DEC VAX 9000 CPU)
Planning | Designing actions | Mission Planning for Autonomous Underwater Vehicle
Monitoring | Comparing observations to plan vulnerabilities | REACTOR
Debugging | Providing incremental solutions for complex problems | SAINT, MATHLAB, MACSYMA
Repair | Executing a plan to administer a prescribed remedy | Toxic Spill Crisis Management
Instruction | Diagnosing, assessing, and repairing student behavior | SMH.PAL, Intelligent Clinical Training, STEAMER
Control | Interpreting, predicting, repairing, and monitoring system behaviors | Real Time Process Control, Space Shuttle Mission Control

Hearsay was an early attempt at solving voice recognition through an expert systems approach. For the most part this category of expert systems was not all that successful. Hearsay and all interpretation systems are essentially pattern recognition systems, looking for patterns in noisy data; in the case of Hearsay, recognizing phonemes in an audio stream. Other early examples were analyzing sonar data to detect Russian submarines. These kinds of systems proved much more amenable to a neural network AI solution than a rule-based approach.

CADUCEUS and MYCIN were medical diagnosis systems. The user describes their symptoms to the computer as they would to a doctor and the computer returns a medical diagnosis.

Dendral was a tool to study hypothesis formation in the identification of organic molecules. The general problem it solved—designing a solution given a set of constraints—was one of the most successful areas for early expert systems applied to business domains such as salespeople configuring Digital Equipment Corporation (DEC) VAX computers and mortgage loan application development.

SMH.PAL is an expert system for the assessment of students with multiple disabilities.

Mistral is an expert system to monitor dam safety, developed in the 1990s by Ismes (Italy). It gets data from an automatic monitoring system and performs a diagnosis of the state of the dam. Its first copy, installed in 1992 on the Ridracoli Dam (Italy), is still operational 24/7/365. It has been installed on several dams in Italy and abroad (e.g., Itaipu Dam in Brazil), and on landslide sites under the name of Eydenet, and on monuments under the name of Kaleidos. Mistral is a registered trade mark of CESI.

Semantic network

From Wikipedia, the free encyclopedia
 
Example of a semantic network

A semantic network, or frame network, is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices, which represent concepts, and edges, which represent semantic relations between concepts, mapping or connecting semantic fields. A semantic network may be instantiated as, for example, a graph database or a concept map.

Typical standardized semantic networks are expressed as semantic triples.
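
As a purely illustrative sketch (not part of the original article; the predicate names are invented), such triples can be written down as subject–predicate–object lists, here in Lisp:

;; A minimal sketch: triples as (subject predicate object) lists.
(defparameter *triples*
  '((canary is-a     bird)
    (bird   is-a     vertebrate)
    (bird   has-part wings)
    (canary color    yellow)))

;; Everything asserted about CANARY:
;; (remove-if-not (lambda (triple) (eq (first triple) 'canary)) *triples*)
;; => ((CANARY IS-A BIRD) (CANARY COLOR YELLOW))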

Semantic networks are used in natural language processing applications such as semantic parsing and word-sense disambiguation.

History

Examples of the use of semantic networks in logic, and of directed acyclic graphs as a mnemonic tool, date back centuries, the earliest documented use being the Greek philosopher Porphyry's commentary on Aristotle's categories in the third century AD.

In computing history, "Semantic Nets" for the propositional calculus were first implemented for computers by Richard H. Richens of the Cambridge Language Research Unit in 1956 as an "interlingua" for machine translation of natural languages, although the importance of this work and of the CLRU was only belatedly realized.

Semantic networks were also independently implemented by Robert F. Simmons and Sheldon Klein, using the first-order predicate calculus as a base, after being inspired by a demonstration of Victor Yngve. As Simmons recalled, the "line of research was originated by the first President of the Association [Association for Computational Linguistics], Victor Yngve, who in 1960 had published descriptions of algorithms for using a phrase structure grammar to generate syntactically well-formed nonsense sentences. Sheldon Klein and I about 1962-1964 were fascinated by the technique and generalized it to a method for controlling the sense of what was generated by respecting the semantic dependencies of words as they occurred in text." Other researchers, most notably M. Ross Quillian and others at System Development Corporation, contributed to this work in the early 1960s as part of the SYNTHEX project. It is these SDC publications that most modern derivatives of the term "semantic network" cite as their background. Later prominent work was done by Allan M. Collins and Quillian (e.g., Collins and Quillian; Collins and Loftus; Quillian). Still later, in 2006, Hermann Helbig fully described MultiNet.

In the late 1980s, two Netherlands universities, Groningen and Twente, jointly began a project called Knowledge Graphs, which are semantic networks but with the added constraint that edges are restricted to a limited set of possible relations, to facilitate algebras on the graph. In the subsequent decades, the distinction between semantic networks and knowledge graphs was blurred. In 2012, Google gave their knowledge graph the name Knowledge Graph.

The Semantic Link Network was systematically studied as a social semantics networking method. Its basic model consists of semantic nodes, semantic links between nodes, and a semantic space that defines the semantics of nodes and links and reasoning rules on semantic links. The systematic theory and model was published in 2004. This research direction can be traced to the definition of inheritance rules for efficient model retrieval in 1998 and the Active Document Framework (ADF). Since 2003, research has developed toward social semantic networking. This work is a systematic innovation in the age of the World Wide Web and global social networking, rather than an application or simple extension of the Semantic Net (Network); its purpose and scope are different from those of the Semantic Net. The rules for reasoning and evolution and the automatic discovery of implicit links play an important role in the Semantic Link Network. Recently it has been developed to support Cyber-Physical-Social Intelligence, and it was used for creating a general summarization method. The self-organised Semantic Link Network was integrated with a multi-dimensional category space to form a semantic space that supports advanced applications with multi-dimensional abstractions and self-organised semantic links. It has been verified that the Semantic Link Network plays an important role in understanding and representation through text summarisation applications. The Semantic Link Network has been extended from cyberspace to cyber-physical-social space. Competition and symbiosis relations, as well as their roles in an evolving society, were studied under the emerging topic of Cyber-Physical-Social Intelligence.

More specialized forms of semantic networks have been created for specific uses. For example, in 2008, Fawsy Bendeck's PhD thesis formalized the Semantic Similarity Network (SSN), which contains specialized relationships and propagation algorithms to simplify semantic similarity representation and calculations.

Basics of semantic networks

A semantic network is used when one has knowledge that is best understood as a set of concepts that are related to one another.

Most semantic networks are cognitively based. They also consist of arcs and nodes which can be organized into a taxonomic hierarchy. Semantic networks contributed ideas of spreading activation, inheritance, and nodes as proto-objects.

Examples

In Lisp

The following code shows an example of a semantic network in the Lisp programming language using an association list.

(setq *database*
'((canary  (is-a bird)
           (color yellow)
           (size small))
  (penguin (is-a bird)
           (movement swim))
  (bird    (is-a vertebrate)
           (has-part wings)
           (reproduction egg-laying))))

To extract all the information about the "canary" type, one would use the assoc function with a key of "canary".
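
As a brief illustration (the LOOKUP helper below is only a sketch and is not part of the original example), the assoc call returns the canary's property list, and a property that is not stored locally can be found by following is-a links up the hierarchy:

;; Direct properties of CANARY:
;; (cdr (assoc 'canary *database*))
;; => ((IS-A BIRD) (COLOR YELLOW) (SIZE SMALL))

(defun lookup (type property)
  "Return the value of PROPERTY for TYPE, following IS-A links upward
through *DATABASE* when TYPE has no local value."
  (let* ((entry  (cdr (assoc type *database*)))
         (value  (assoc property entry))
         (parent (assoc 'is-a entry)))
    (cond (value  (second value))
          (parent (lookup (second parent) property))
          (t nil))))

;; (lookup 'canary 'has-part) => WINGS   ; inherited from BIRD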

WordNet

An example of a semantic network is WordNet, a lexical database of English. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. Some of the most common semantic relations defined are meronymy (A is a meronym of B if A is part of B), holonymy (B is a holonym of A if B contains A), hyponymy (or troponymy) (A is subordinate of B; A is kind of B), hypernymy (A is superordinate of B), synonymy (A denotes the same as B) and antonymy (A denotes the opposite of B).

WordNet properties have been studied from a network theory perspective and compared to other semantic networks created from Roget's Thesaurus and word association tasks. From this perspective, all three have a small-world structure.

Other examples

It is also possible to represent logical descriptions using semantic networks such as the existential graphs of Charles Sanders Peirce or the related conceptual graphs of John F. Sowa. These have expressive power equal to or exceeding standard first-order predicate logic. Unlike WordNet or other lexical or browsing networks, semantic networks using these representations can be used for reliable automated logical deduction. Some automated reasoners exploit the graph-theoretic features of the networks during processing.

Other examples of semantic networks are Gellish models. Gellish English with its Gellish English dictionary, is a formal language that is defined as a network of relations between concepts and names of concepts. Gellish English is a formal subset of natural English, just as Gellish Dutch is a formal subset of Dutch, whereas multiple languages share the same concepts. Other Gellish networks consist of knowledge models and information models that are expressed in the Gellish language. A Gellish network is a network of (binary) relations between things. Each relation in the network is an expression of a fact that is classified by a relation type. Each relation type itself is a concept that is defined in the Gellish language dictionary. Each related thing is either a concept or an individual thing that is classified by a concept. The definitions of concepts are created in the form of definition models (definition networks) that together form a Gellish Dictionary. A Gellish network can be documented in a Gellish database and is computer interpretable.

SciCrunch is a collaboratively edited knowledge base for scientific resources. It provides unambiguous identifiers (Research Resource IDentifiers or RRIDs) for software, lab tools etc. and it also provides options to create links between RRIDs and from communities.

Another example of semantic networks, based on category theory, is ologs. Here each type is an object, representing a set of things, and each arrow is a morphism, representing a function. Commutative diagrams also are prescribed to constrain the semantics.

In the social sciences people sometimes use the term semantic network to refer to co-occurrence networks. The basic idea is that words that co-occur in a unit of text, e.g. a sentence, are semantically related to one another. Ties based on co-occurrence can then be used to construct semantic networks.

Software tools

There are also elaborate types of semantic networks connected with corresponding sets of software tools used for lexical knowledge engineering, like the Semantic Network Processing System (SNePS) of Stuart C. Shapiro or the MultiNet paradigm of Hermann Helbig, especially suited for the semantic representation of natural language expressions and used in several NLP applications.

Semantic networks are used in specialized information retrieval tasks, such as plagiarism detection. They provide information on hierarchical relations in order to employ semantic compression to reduce language diversity and enable the system to match word meanings, independently from sets of words used.

The Knowledge Graph proposed by Google in 2012 is an application of semantic networks to search engines.

Modeling multi-relational data like semantic networks in low-dimensional spaces through forms of embedding has benefits in expressing entity relationships as well as extracting relations from mediums like text. There are many approaches to learning these embeddings, notably using Bayesian clustering frameworks or energy-based frameworks, and more recently, TransE (NIPS 2013). Applications of embedding knowledge base data include Social network analysis and Relationship extraction.

 

Semantic technology

From Wikipedia, the free encyclopedia
 
Simplistic example of the sort of semantic net used in Semantic Web technology

The ultimate goal of semantic technology is to help machines understand data. To enable the encoding of semantics with the data, well-known technologies are RDF (Resource Description Framework) and OWL (Web Ontology Language). These technologies formally represent the meaning involved in information. For example, an ontology can describe concepts, relationships between things, and categories of things. Embedding these semantics with the data offers significant advantages such as reasoning over data and dealing with heterogeneous data sources.

Overview

In software, semantic technology encodes meanings separately from data and content files, and separately from application code. This enables machines as well as people to understand, share and reason with them at execution time. With semantic technologies, adding, changing and implementing new relationships or interconnecting programs in a different way can be just as simple as changing the external model that these programs share.

With traditional information technology, on the other hand, meanings and relationships must be predefined and "hard wired" into data formats and the application program code at design time. This means that when something changes, for example when previously unexchanged information needs to be exchanged or two programs need to interoperate in a new way, humans must get involved.

Off-line, the parties must define and communicate between them the knowledge needed to make the change, then recode the data structures and program logic to accommodate it, and then apply these changes to the database and the application. Then, and only then, can they implement the change.

Semantic technologies are "meaning-centered". They include (but are not limited to) topics such as:

  • encode/decode of semantic representation,
  • knowledge graph embedding relationships,
  • auto-recognition of topics and concepts,
  • information and meaning extraction,
  • semantic data integration, and
  • taxonomies/classification.

Given a question, semantic technologies can directly search topics, concepts, and associations that span a vast number of sources.

Semantic technologies provide an abstraction layer above existing IT technologies that enables the bridging and interconnection of data, content, and processes. In addition, from the portal perspective, semantic technologies can be thought of as a new level of depth that provides far more intelligent, capable, relevant, and responsive interaction than with information technologies alone.

 

Marriage in Islam

From Wikipedia, the free encyclopedia ...