A Medley of Potpourri

Monday, September 28, 2020

Intelligence amplification

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Intelligence_amplification

Intelligence amplification (IA) (also referred to as cognitive augmentation, machine augmented intelligence and enhanced intelligence) refers to the effective use of information technology in augmenting human intelligence. The idea was first proposed in the 1950s and 1960s by cybernetics and early computer pioneers.

IA is sometimes contrasted with AI (artificial intelligence), that is, the project of building a human-like intelligence in the form of an autonomous technological system such as a computer or robot. AI has encountered many fundamental obstacles, practical as well as theoretical, which for IA seem moot, as it needs technology merely as an extra support for an autonomous intelligence that has already proven to function. Moreover, IA has a long history of success, since all forms of information technology, from the abacus to writing to the Internet, have been developed basically to extend the information processing capabilities of the human mind (see extended mind and distributed cognition).

Major contributions

William Ross Ashby: Intelligence Amplification

The term intelligence amplification (IA) has enjoyed a wide currency since William Ross Ashby wrote of "amplifying intelligence" in his Introduction to Cybernetics (1956). Related ideas were explicitly proposed as an alternative to artificial intelligence by Hao Wang, from the early days of automatic theorem proving.

..."problem solving" is largely, perhaps entirely, a matter of appropriate selection. Take, for instance, any popular book of problems and puzzles. Almost every one can be reduced to the form: out of a certain set, indicate one element.... It is, in fact, difficult to think of a problem, either playful or serious, that does not ultimately require an appropriate selection as necessary and sufficient for its solution.
It is also clear that many of the tests used for measuring "intelligence" are scored essentially according to the candidate's power of appropriate selection. ... Thus it is not impossible that what is commonly referred to as "intellectual power" may be equivalent to "power of appropriate selection". Indeed, if a talking Black Box were to show high power of appropriate selection in such matters—so that, when given difficult problems it persistently gave correct answers—we could hardly deny that it was showing the 'behavioral' equivalent of "high intelligence".
If this is so, and as we know that power of selection can be amplified, it seems to follow that intellectual power, like physical power, can be amplified. Let no one say that it cannot be done, for the gene-patterns do it every time they form a brain that grows up to be something better than the gene-pattern could have specified in detail. What is new is that we can now do it synthetically, consciously, deliberately.

Ashby, W.R., An Introduction to Cybernetics, Chapman and Hall, London, UK, 1956. Reprinted, Methuen and Company, London, UK, 1964.
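Ashby's claim that problem solving reduces to "appropriate selection" out of a set can be made concrete with a generate-and-test loop. The sketch below only illustrates that framing and does not come from Ashby's book. The helper `select` and the puzzle are hypothetical examples. A person supplies the candidate set and the test, and the machine supplies tireless selection.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def select(candidates: Iterable[T], is_solution: Callable[[T], bool]) -> Optional[T]:
    """Problem solving as 'appropriate selection': return the first
    candidate from the set that satisfies the test, or None if none does."""
    for candidate in candidates:
        if is_solution(candidate):
            return candidate
    return None


# A classic puzzle, recast as selection from the set {1, ..., 1000}:
# find the smallest number that leaves remainder 1 when divided by
# 2, 3, 4, 5 and 6, and is exactly divisible by 7.
def fits_puzzle(n: int) -> bool:
    return all(n % d == 1 for d in (2, 3, 4, 5, 6)) and n % 7 == 0


answer = select(range(1, 1001), fits_puzzle)
print(answer)  # 301
```

Enlarging the candidate set, or testing candidates faster, increases this kind of selective power without making the test itself any cleverer.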

J. C. R. Licklider: Man-Computer Symbiosis

"Man-Computer Symbiosis" is a key speculative paper published in 1960 by psychologist and computer scientist J.C.R. Licklider. It envisions that mutually interdependent, "living together", tightly coupled human brains and computing machines would complement each other's strengths to a high degree:

Man-computer symbiosis is a subclass of man-machine systems. There are many man-machine systems. At present, however, there are no man-computer symbioses. The purposes of this paper are to present the concept and, hopefully, to foster the development of man-computer symbiosis by analyzing some problems of interaction between men and computing machines, calling attention to applicable principles of man-machine engineering, and pointing out a few questions to which research answers are needed. The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly, and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today.
Licklider, J.C.R., "Man-Computer Symbiosis", IRE Transactions on Human Factors in Electronics, vol. HFE-1, pp. 4–11, Mar 1960.

In Licklider's vision, many of the pure artificial intelligence systems envisioned at the time by over-optimistic researchers would prove unnecessary. (Some historians also see this paper as marking the genesis of ideas about computer networks that later blossomed into the Internet.)

Douglas Engelbart: Augmenting Human Intellect

Licklider's research was similar in spirit to that of his DARPA contemporary and protégé Douglas Engelbart. Both held a view of how computers could be used that was at odds with the then-prevalent view, which saw them principally as devices for computation. Both were also key proponents of the way computers are used today, as generic adjuncts to humans.

Engelbart reasoned that the state of our current technology controls our ability to manipulate information, and that this ability in turn controls our ability to develop new, improved technologies. He therefore set himself the revolutionary task of developing computer-based technologies for manipulating information directly. He also set out to improve individual and group processes for knowledge work.

Engelbart's philosophy and research agenda are most clearly and directly expressed in the 1962 research report Augmenting Human Intellect: A Conceptual Framework. The concept of network augmented intelligence is attributed to Engelbart on the basis of this pioneering work.

"Increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems.
Increased capability in this respect is taken to mean a mixture of the following: more-rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions, better solutions, and the possibility of finding solutions to problems that before seemed insolvable. And by complex situations we include the professional problems of diplomats, executives, social scientists, life scientists, physical scientists, attorneys, designers--whether the problem situation exists for twenty minutes or twenty years.
We do not speak of isolated clever tricks that help in particular situations. We refer to a way of life in an integrated domain where hunches, cut-and-try, intangibles, and the human feel for a situation usefully co-exist with powerful concepts, streamlined terminology and notation, sophisticated methods, and high-powered electronic aids."

Engelbart, D.C., Augmenting Human Intellect: A Conceptual Framework, Summary Report AFOSR-3233, Stanford Research Institute, Menlo Park, CA, October 1962.

Engelbart subsequently implemented these concepts in his Augmented Human Intellect Research Center at SRI International, developing essentially an intelligence amplifying system of tools (NLS) and co-evolving organizational methods, in full operational use by the mid-1960s within the lab. As intended, his R&D team experienced increasing degrees of intelligence amplification, as both rigorous users and rapid-prototype developers of the system. For a sampling of research results, see their 1968 Mother of All Demos.

Later contributions

Howard Rheingold worked at Xerox PARC in the 1980s and was introduced to both Bob Taylor and Douglas Engelbart; Rheingold wrote about "mind amplifiers" in his 1985 book, Tools for Thought.

Arnav Kapur, working at MIT, wrote about human-AI coalescence. This is the idea that AI can be integrated into the human condition as part of the "human self", acting as a tertiary layer to the human brain that augments human cognition. He demonstrates this using a peripheral nerve-computer interface, AlterEgo, which enables a human user to silently and internally converse with a personal AI.

Shan Carter and Michael Nielsen introduce the concept of artificial intelligence augmentation (AIA): the use of AI systems to help develop new methods for intelligence augmentation. They contrast cognitive outsourcing (AI as an oracle, able to solve some large class of problems with better-than-human performance) with cognitive transformation (changing the operations and representations we use to think). A calculator is an example of the former; a spreadsheet of the latter.

Artificial general intelligence

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Artificial_general_intelligence

Artificial general intelligence (AGI) is the hypothetical intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can. It is a primary goal of some artificial intelligence research and a common topic in science fiction and futures studies. AGI can also be referred to as strong AI, full AI, or general intelligent action. Some academic sources reserve the term "strong AI" for machines that can experience consciousness. Today's AI is speculated to be decades away from AGI.

In contrast to strong AI, weak AI (also called narrow AI) is not intended to have general human cognitive abilities. Rather, weak AI is limited to the use of software to study or accomplish specific problem-solving or reasoning tasks.

As of 2017, over forty organizations are actively researching AGI.

Requirements

Various criteria for intelligence have been proposed (most famously the Turing test) but to date, there is no definition that satisfies everyone. However, there is wide agreement among artificial intelligence researchers that intelligence is required to do the following:

reason, use strategy, solve puzzles, and make judgments under uncertainty;
represent knowledge, including commonsense knowledge;
plan;
learn;
communicate in natural language;
and integrate all these skills towards common goals.

Other important capabilities include the ability to sense (e.g. see) and the ability to act (e.g. move and manipulate objects) in the world where intelligent behaviour is to be observed. This would include an ability to detect and respond to hazard. Many interdisciplinary approaches to intelligence (e.g. cognitive science, computational intelligence and decision making) tend to emphasise the need to consider additional traits such as imagination (taken as the ability to form mental images and concepts that were not programmed in) and autonomy. Computer-based systems that exhibit many of these capabilities do exist (e.g. see computational creativity, automated reasoning, decision support system, robot, evolutionary computation, intelligent agent), but not yet at human levels.

Tests for confirming human-level AGI

The following tests to confirm human-level AGI have been considered:

The Turing Test (Turing): A machine and a human both converse sight unseen with a second human, who must evaluate which of the two is the machine. The machine passes the test if it can fool the evaluator a significant fraction of the time. Note: Turing does not prescribe what should qualify as intelligence, only that knowing that it is a machine should disqualify it.
The Coffee Test (Wozniak): A machine is required to enter an average American home and figure out how to make coffee: find the coffee machine, find the coffee, add water, find a mug, and brew the coffee by pushing the proper buttons.
The Robot College Student Test (Goertzel): A machine enrolls in a university, taking and passing the same classes that humans would, and obtaining a degree.
The Employment Test (Nilsson): A machine works an economically important job, performing at least as well as humans in the same job.
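The Turing Test's pass criterion, fooling the evaluator "a significant fraction of the time", can be expressed as a simple scoring rule. This is a minimal sketch that assumes a two-party setup labelled "A" and "B". It also uses a hypothetical pass threshold of 0.3, because the description above does not fix a number. The `Trial` type, `fool_rate`, `passes`, and the threshold are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One imitation-game session: which hidden party the evaluator
    labelled 'machine', and which party actually was the machine."""
    evaluator_guess: str  # "A" or "B"
    actual_machine: str   # "A" or "B"


def fool_rate(trials: list[Trial]) -> float:
    """Fraction of sessions in which the evaluator picked the wrong party."""
    if not trials:
        raise ValueError("need at least one trial")
    fooled = sum(t.evaluator_guess != t.actual_machine for t in trials)
    return fooled / len(trials)


def passes(trials: list[Trial], threshold: float = 0.3) -> bool:
    """Hypothetical pass rule: the machine 'passes' if it fools the
    evaluator in at least `threshold` of the sessions. The 0.3 default
    is an illustrative assumption, not part of the test's definition."""
    return fool_rate(trials) >= threshold


sessions = [
    Trial("A", "A"), Trial("B", "A"), Trial("A", "A"),
    Trial("B", "B"), Trial("A", "B"),
]
print(fool_rate(sessions))  # 0.4
print(passes(sessions))     # True
```

With two hidden parties, a rate near 0.5 means the evaluator is doing no better than guessing. Any fixed threshold below that is a policy choice rather than something the test itself specifies.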

Problems requiring AGI to solve

The most difficult problems for computers are informally known as "AI-complete" or "AI-hard", implying that solving them is equivalent to the general aptitude of human intelligence, or strong AI, beyond the capabilities of a purpose-specific algorithm.

AI-complete problems are hypothesised to include general computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real world problem.

AI-complete problems cannot be solved with current computer technology alone, and also require human computation. This property could be useful, for example, to test for the presence of humans, as CAPTCHAs aim to do; and for computer security to repel brute-force attacks.

History

Classical AI

Modern AI research began in the mid-1950s. The first generation of AI researchers were convinced that artificial general intelligence was possible and that it would exist in just a few decades. AI pioneer Herbert A. Simon wrote in 1965: "machines will be capable, within twenty years, of doing any work a man can do." Their predictions were the inspiration for Stanley Kubrick and Arthur C. Clarke's character HAL 9000, who embodied what AI researchers believed they could create by the year 2001.

AI pioneer Marvin Minsky was a consultant on the project of making HAL 9000 as realistic as possible according to the consensus predictions of the time; Crevier quotes him as having said on the subject in 1967, "Within a generation ... the problem of creating 'artificial intelligence' will substantially be solved," although Minsky states that he was misquoted.

However, in the early 1970s, it became obvious that researchers had grossly underestimated the difficulty of the project. Funding agencies became skeptical of AGI and put researchers under increasing pressure to produce useful "applied AI". As the 1980s began, Japan's Fifth Generation Computer Project revived interest in AGI, setting out a ten-year timeline that included AGI goals like "carry on a casual conversation". In response to this and the success of expert systems, both industry and government pumped money back into the field. However, confidence in AI spectacularly collapsed in the late 1980s, and the goals of the Fifth Generation Computer Project were never fulfilled.

For the second time in 20 years, AI researchers who had predicted the imminent achievement of AGI had been shown to be fundamentally mistaken. By the 1990s, AI researchers had gained a reputation for making vain promises. They became reluctant to make predictions at all, and they avoided any mention of "human level" artificial intelligence for fear of being labeled "wild-eyed dreamer[s]."

Narrow AI research

In the 1990s and early 21st century, mainstream AI achieved far greater commercial success and academic respectability by focusing on specific sub-problems where it could produce verifiable results and commercial applications, such as artificial neural networks and statistical machine learning. These "applied AI" systems are now used extensively throughout the technology industry, and research in this vein is very heavily funded in both academia and industry. Development in this field is currently considered an emerging trend, and a mature stage is expected to arrive in more than 10 years.

Most mainstream AI researchers hope that strong AI can be developed by combining the programs that solve various sub-problems. Hans Moravec wrote in 1988:

"I am confident that this bottom-up route to artificial intelligence will one day meet the traditional top-down route more than half way, ready to provide the real world competence and the commonsense knowledge that has been so frustratingly elusive in reasoning programs. Fully intelligent machines will result when the metaphorical golden spike is driven uniting the two efforts."

However, even this fundamental philosophy has been disputed; for example, Stevan Harnad of Princeton concluded his 1990 paper on the Symbol Grounding Hypothesis by stating:

"The expectation has often been voiced that "top-down" (symbolic) approaches to modeling cognition will somehow meet "bottom-up" (sensory) approaches somewhere in between. If the grounding considerations in this paper are valid, then this expectation is hopelessly modular and there is really only one viable route from sense to symbols: from the ground up. A free-floating symbolic level like the software level of a computer will never be reached by this route (or vice versa) – nor is it clear why we should even try to reach such a level, since it looks as if getting there would just amount to uprooting our symbols from their intrinsic meanings (thereby merely reducing ourselves to the functional equivalent of a programmable computer)."

Modern artificial general intelligence research

The term "artificial general intelligence" was used as early as 1997, by Mark Gubrud in a discussion of the implications of fully automated military production and operations. The term was re-introduced and popularized by Shane Legg and Ben Goertzel around 2002. The research objective is much older. For example, Doug Lenat's Cyc project (begun in 1984) and Allen Newell's Soar project are regarded as within the scope of AGI. AGI research activity in 2006 was described by Pei Wang and Ben Goertzel as "producing publications and preliminary results". The first summer school in AGI was organized in Xiamen, China in 2009 by Xiamen University's Artificial Brain Laboratory and OpenCog. The first university course was given in 2010 and 2011 at Plovdiv University, Bulgaria by Todor Arnaudov. MIT presented a course in AGI in 2018, organized by Lex Fridman and featuring a number of guest lecturers.

As yet, most AI researchers have devoted little attention to AGI, with some claiming that intelligence is too complex to be completely replicated in the near term. Nevertheless, a small number of computer scientists are active in AGI research, and many of this group contribute to a series of AGI conferences. The research is extremely diverse and often pioneering in nature. In the introduction to his book, Goertzel says that estimates of the time needed before a truly flexible AGI is built vary from 10 years to over a century. However, the consensus in the AGI research community seems to be that the timeline discussed by Ray Kurzweil in The Singularity is Near (i.e. between 2015 and 2045) is plausible.

However, mainstream AI researchers have given a wide range of opinions on whether progress will be this rapid. A 2012 meta-analysis of 95 such opinions found a bias towards predicting that the onset of AGI would occur within 16–26 years for modern and historical predictions alike. It was later found that the dataset listed some experts as non-experts and vice versa.

Organizations explicitly pursuing AGI include the Swiss AI lab IDSIA, Nnaisense, Vicarious, Maluuba, the OpenCog Foundation, Adaptive AI, LIDA, and Numenta and the associated Redwood Neuroscience Institute. In addition, organizations such as the Machine Intelligence Research Institute and OpenAI have been founded to influence the development path of AGI. Finally, projects such as the Human Brain Project have the goal of building a functioning simulation of the human brain. A 2017 survey of AGI categorized forty-five known "active R&D projects" that explicitly or implicitly (through published research) research AGI, with the largest three being DeepMind, the Human Brain Project, and OpenAI.

In 2017, researchers Feng Liu, Yong Shi and Ying Liu conducted intelligence tests on publicly available and freely accessible weak AI such as Google AI, Apple's Siri and others. At most, these AI reached an IQ value of about 47, which corresponds approximately to a six-year-old child in first grade. An average adult scores about 100. Similar tests had been carried out in 2014, when the IQ score reached a maximum value of 27.

In 2019, video game programmer and aerospace engineer John Carmack announced plans to research AGI.

In 2020, OpenAI developed GPT-3, a language model capable of performing many diverse tasks without specific training. According to Gary Grossman in a VentureBeat article, while there is consensus that GPT-3 is not an example of AGI, it is considered by some to be too advanced to classify as a narrow AI system.

Processing power needed to simulate a brain

Whole brain emulation

A much-discussed approach to achieving general intelligent action is whole brain emulation. A low-level brain model is built by scanning and mapping a biological brain in detail and copying its state into a computer system or another computational device. The computer runs a simulation model so faithful to the original that it will behave in essentially the same way as the original brain, or for all practical purposes, indistinguishably. Whole brain emulation is discussed in computational neuroscience and neuroinformatics, in the context of brain simulation for medical research purposes. It is discussed in artificial intelligence research as an approach to strong AI. Neuroimaging technologies that could deliver the necessary detailed understanding are improving rapidly. Futurist Ray Kurzweil, in the book The Singularity Is Near, predicts that a map of sufficient quality will become available on a similar timescale to the required computing power.

Early estimates

Figure: Estimates of how much processing power is needed to emulate a human brain at various levels (from Ray Kurzweil, and from Anders Sandberg and Nick Bostrom), along with the fastest supercomputer from TOP500 mapped by year. Note the logarithmic scale and exponential trendline, which assumes that computational capacity doubles every 1.1 years. Kurzweil believes that mind uploading will be possible at the level of neural simulation, while the Sandberg and Bostrom report is less certain about where consciousness arises.
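The trendline assumption in the caption, that computational capacity doubles every 1.1 years, can be written as an exponential. It can also be inverted to ask how long a given increase takes. This is a minimal sketch. The normalised baseline of 1.0 is a hypothetical placeholder and is not a figure read from the chart.

```python
import math


def projected_capacity(base_capacity: float, years_elapsed: float,
                       doubling_time: float = 1.1) -> float:
    """Capacity after `years_elapsed` years if it doubles every
    `doubling_time` years (the trendline assumption in the caption)."""
    return base_capacity * 2 ** (years_elapsed / doubling_time)


def years_to_reach(base_capacity: float, target_capacity: float,
                   doubling_time: float = 1.1) -> float:
    """Invert the trendline: years until `target_capacity` is reached."""
    return doubling_time * math.log2(target_capacity / base_capacity)


# 11 years at a 1.1-year doubling time is 10 doublings: a factor of 2**10.
print(round(projected_capacity(1.0, 11.0)))  # 1024
# A tenfold increase takes 1.1 * log2(10) years, i.e. about 3.65 years.
print(round(years_to_reach(1.0, 10.0), 2))   # 3.65
```

On a logarithmic axis this trend is a straight line. Each additional factor of ten costs the same fixed number of years for as long as the doubling time holds.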

For low-level brain simulation, an extremely powerful computer would be required. The human brain has a huge number of synapses. Each of the 10¹¹ (one hundred billion) neurons has on average 7,000 synaptic connections (synapses) to other neurons. It has been estimated that the brain of a three-year-old child has about 10¹⁵ synapses (1 quadrillion). This number declines with age, stabilizing by adulthood. Estimates vary for an adult, ranging from 10¹⁴ to 5×10¹⁴ synapses (100 to 500 trillion). An estimate of the brain's processing power, based on a simple switch model for neuron activity, is around 10¹⁴ (100 trillion) synaptic updates per second (SUPS). In 1997, Kurzweil looked at various estimates for the hardware required to equal the human brain and adopted a figure of 10¹⁶ computations per second (cps). (For comparison, if a "computation" were equivalent to one "floating point operation", a measure used to rate current supercomputers, then 10¹⁶ "computations" would be equivalent to 10 petaFLOPS, achieved in 2011.) He used this figure to predict that the necessary hardware would be available sometime between 2015 and 2025, if the exponential growth in computer power at the time of writing continued.
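The figures in the paragraph on low-level brain simulation can be checked with a few lines of arithmetic. This sketch uses only the numbers quoted in the text. The variable names are illustrative.

```python
NEURONS = 1e11               # "10^11 (one hundred billion) neurons"
SYNAPSES_PER_NEURON = 7_000  # average synaptic connections per neuron
SWITCH_MODEL_SUPS = 1e14     # simple switch model, synaptic updates/s
KURZWEIL_CPS = 1e16          # Kurzweil's adopted figure, computations/s
PETA = 1e15

# Naive product of the two averages quoted in the text.
total_synapses = NEURONS * SYNAPSES_PER_NEURON
print(f"{total_synapses:.0e}")  # 7e+14

# Kurzweil's figure is two orders of magnitude above the switch-model estimate.
print(KURZWEIL_CPS / SWITCH_MODEL_SUPS)  # 100.0

# If one "computation" is treated as one floating-point operation:
print(KURZWEIL_CPS / PETA, "petaFLOPS")  # 10.0 petaFLOPS
```

The naive product of 7×10¹⁴ synapses sits somewhat above the 10¹⁴ to 5×10¹⁴ adult range quoted in the same paragraph. This is a reminder that these are rough averages drawn from different sources.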

Modelling the neurons in more detail

The artificial neuron model assumed by Kurzweil and used in many current artificial neural network implementations is simple compared with biological neurons. A brain simulation would likely have to capture the detailed cellular behaviour of biological neurons, which is presently understood only in the broadest of outlines. The overhead introduced by full modeling of the biological, chemical, and physical details of neural behaviour (especially on a molecular scale) would require computational powers several orders of magnitude larger than Kurzweil's estimate. In addition, the estimates do not account for glial cells, which are at least as numerous as neurons and may outnumber them by as much as 10:1. Glial cells are now known to play a role in cognitive processes.
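To make "simple compared with biological neurons" concrete, the sketch below contrasts two textbook simplifications. The first is a stateless threshold unit of the kind common in artificial neural networks. The second is a leaky integrate-and-fire (LIF) neuron, which adds a membrane potential that evolves over time. Neither is claimed to be the model Kurzweil or any brain-simulation project used. The LIF model is itself far simpler than a real neuron, with no dendrites, ion-channel kinetics or glia. All parameter values are illustrative assumptions.

```python
def threshold_unit(inputs: list[float], weights: list[float], bias: float) -> int:
    """Point-neuron 'switch': weighted sum, then all-or-nothing output.
    No internal state and no notion of time."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0


def lif_spike_times(input_current: list[float], dt: float = 1e-3,
                    tau: float = 20e-3, v_rest: float = 0.0,
                    v_threshold: float = 1.0, v_reset: float = 0.0,
                    resistance: float = 1.0) -> list[int]:
    """Leaky integrate-and-fire neuron: the membrane potential v integrates
    input over time and leaks back toward rest. A spike is recorded, and v
    is reset, whenever v reaches threshold. Returns spike step indices."""
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        # Forward-Euler step of tau * dv/dt = -(v - v_rest) + R * I
        v += dt / tau * (-(v - v_rest) + resistance * current)
        if v >= v_threshold:
            spikes.append(step)
            v = v_reset
    return spikes


print(threshold_unit([1.0, 0.5], [0.8, -0.4], -0.3))  # 1
print(lif_spike_times([0.5] * 50))  # [] (settles below threshold)
print(lif_spike_times([2.0] * 50))  # [13, 27, 41]
```

Even this small step from a switch to a stateful neuron means the simulation must track time, state and spike timing. That cost compounds as more biological detail is added.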

Current research

Some research projects are investigating brain simulation using more sophisticated neural models, implemented on conventional computing architectures. The Artificial Intelligence System project implemented non-real-time simulations of a "brain" (with 10¹¹ neurons) in 2005. It took 50 days on a cluster of 27 processors to simulate 1 second of a model. The Blue Brain project used one of the fastest supercomputer architectures in the world, IBM's Blue Gene platform, to create a real-time simulation of a single rat neocortical column in 2006. The column consisted of approximately 10,000 neurons and 10⁸ synapses. A longer-term goal is to build a detailed, functional simulation of the physiological processes in the human brain. "It is not impossible to build a human brain and we can do it in 10 years," Henry Markram, director of the Blue Brain Project, said in 2009 at the TED conference in Oxford. There have also been controversial claims to have simulated a cat brain. Neuro-silicon interfaces have been proposed as an alternative implementation strategy that may scale better.
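The Artificial Intelligence System figures imply a large slowdown relative to real time. This back-of-the-envelope sketch uses only the quoted numbers: 50 days, 27 processors and 1 simulated second. The second calculation rests on a deliberately idealised assumption of perfect linear scaling, which real clusters do not achieve.

```python
SECONDS_PER_DAY = 24 * 60 * 60

wall_clock_seconds = 50 * SECONDS_PER_DAY  # "50 days" of computation
simulated_seconds = 1                      # "to simulate 1 second"
processors = 27                            # "a cluster of 27 processors"

slowdown = wall_clock_seconds / simulated_seconds
print(f"{slowdown:,.0f}x slower than real time")  # 4,320,000x slower than real time

# Idealised assumption: perfect linear scaling with processor count.
# Under it, running in real time would need this many such processors:
print(f"{processors * slowdown:,.0f}")  # 116,640,000
```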

Hans Moravec addressed the above arguments ("brains are more complicated", "neurons have to be modeled in more detail") in his 1997 paper "When will computer hardware match the human brain?". He measured the ability of existing software to simulate the functionality of neural tissue, specifically the retina. His results do not depend on the number of glial cells, nor on what kinds of processing neurons perform where.

The actual complexity of modeling biological neurons has been explored in the OpenWorm project. The project aimed at a complete simulation of a worm that has only 302 neurons in its neural network (among about 1,000 cells in total). The animal's neural network had been well documented before the project started. However, although the task seemed simple at the beginning, the models based on a generic neural network did not work. Current efforts focus on precise emulation of biological neurons (partly at the molecular level), but the result cannot yet be called a total success. Even if the number of issues to be solved in a human-brain-scale model is not proportional to the number of neurons, the amount of work along this path is evidently enormous.

Criticisms of simulation-based approaches

A fundamental criticism of the simulated brain approach derives from embodied cognition where human embodiment is taken as an essential aspect of human intelligence. Many researchers believe that embodiment is necessary to ground meaning. If this view is correct, any fully functional brain model will need to encompass more than just the neurons (i.e., a robotic body). Goertzel proposes virtual embodiment (like in Second Life), but it is not yet known whether this would be sufficient.

Desktop computers using microprocessors capable of more than 10⁹ cps (Kurzweil's non-standard unit "computations per second", see above) have been available since 2005. According to the brain power estimates used by Kurzweil (and Moravec), such a computer should be capable of supporting a simulation of a bee brain, but despite some interest no such simulation exists. There are at least four reasons for this:

The neuron model seems to be oversimplified (see above).
There is insufficient understanding of higher cognitive processes to establish accurately what the brain's neural activity, observed using techniques such as functional magnetic resonance imaging, correlates with.
Even if our understanding of cognition advances sufficiently, early simulation programs are likely to be very inefficient and will, therefore, need considerably more hardware.
The brain of an organism, while critical, may not be an appropriate boundary for a cognitive model. To simulate a bee brain, it may be necessary to simulate the body, and the environment. The Extended Mind thesis formalizes the philosophical concept, and research into cephalopods has demonstrated clear examples of a decentralized system.

In addition, the scale of the human brain is not currently well-constrained. One estimate puts the human brain at about 100 billion neurons and 100 trillion synapses. Another estimate is 86 billion neurons of which 16.3 billion are in the cerebral cortex and 69 billion in the cerebellum. Glial cell synapses are currently unquantified but are known to be extremely numerous.
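The two estimates of the scale of the human brain can be compared directly. This sketch uses only the figures quoted in that paragraph, in billions of neurons.

```python
total = 86.0       # billion neurons (second estimate)
cortex = 16.3      # billion, cerebral cortex
cerebellum = 69.0  # billion, cerebellum
first_estimate = 100.0  # billion neurons (first estimate)

rest = total - cortex - cerebellum
print(round(rest, 1))                          # 0.7  (billion outside both regions)
print(f"{(cortex + cerebellum) / total:.1%}")  # 99.2%
print(first_estimate - total)                  # 14.0 (billion; gap between estimates)
```

By the second estimate, the cerebral cortex and cerebellum together hold almost all of the brain's neurons. The two totals differ by about 14 billion neurons, which illustrates how loosely the brain's scale is constrained.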

Strong AI and consciousness

In 1980, philosopher John Searle coined the term "strong AI" as part of his Chinese room argument. He wanted to distinguish between two different hypotheses about artificial intelligence:

An artificial intelligence system can think and have a mind. (The word "mind" has a specific meaning for philosophers, as used in "the mind body problem" or "the philosophy of mind".)
An artificial intelligence system can (only) act like it thinks and has a mind.

The first one is called "the strong AI hypothesis" and the second "the weak AI hypothesis", because the first one makes the stronger statement. It assumes that something special has happened to the machine that goes beyond all the abilities we can test. Searle referred to the "strong AI hypothesis" as "strong AI". This usage is also common in academic AI research and textbooks.

The weak AI hypothesis is equivalent to the hypothesis that artificial general intelligence is possible. According to Russell and Norvig, "Most AI researchers take the weak AI hypothesis for granted, and don't care about the strong AI hypothesis."

In contrast to Searle, Ray Kurzweil uses the term "strong AI" to describe any artificial intelligence system that acts like it has a mind, regardless of whether a philosopher would be able to determine if it actually has a mind or not. In science fiction, AGI is associated with traits such as consciousness, sentience, sapience, and self-awareness observed in living beings. However, according to Searle, it is an open question whether general intelligence is sufficient for consciousness. "Strong AI" (as defined above by Kurzweil) should not be confused with Searle's "strong AI hypothesis." The strong AI hypothesis is the claim that a computer which behaves as intelligently as a person must also necessarily have a mind and consciousness. AGI refers only to the amount of intelligence that the machine displays, with or without a mind.

Consciousness

There are other aspects of the human mind besides intelligence that are relevant to the concept of strong AI which play a major role in science fiction and the ethics of artificial intelligence:

consciousness: To have subjective experience and thought.
self-awareness: To be aware of oneself as a separate individual, especially to be aware of one's own thoughts.
sentience: The ability to "feel" perceptions or emotions subjectively.
sapience: The capacity for wisdom.

These traits have a moral dimension, because a machine with this form of strong AI may have rights, analogous to the rights of non-human animals. As such, preliminary work has been conducted on approaches to integrating full ethical agents with existing legal and social frameworks. These approaches have focused on the legal position and rights of 'strong' AI.

However, Bill Joy, among others, argues a machine with these traits may be a threat to human life or dignity. It remains to be shown whether any of these traits are necessary for strong AI. The role of consciousness is not clear, and currently there is no agreed test for its presence. If a machine is built with a device that simulates the neural correlates of consciousness, would it automatically have self-awareness? It is also possible that some of these properties, such as sentience, naturally emerge from a fully intelligent machine, or that it becomes natural to ascribe these properties to machines once they begin to act in a way that is clearly intelligent. For example, intelligent action may be sufficient for sentience, rather than the other way around.

Artificial consciousness research

Although the role of consciousness in strong AI/AGI is debatable, many AGI researchers regard research that investigates possibilities for implementing consciousness as vital. In an early effort Igor Aleksander argued that the principles for creating a conscious machine already existed but that it would take forty years to train such a machine to understand language.

Possible explanations for the slow progress of AI research

Since the launch of AI research in 1956, progress in the field has slowed over time, stalling the aim of creating machines capable of intelligent action at the human level. A possible explanation for this delay is that computers lack sufficient memory or processing power. In addition, the complexity inherent in AI research may also limit its progress.

While most AI researchers believe strong AI can be achieved in the future, some, such as Hubert Dreyfus and Roger Penrose, deny the possibility of achieving strong AI. John McCarthy was one of many computer scientists who believed human-level AI will be accomplished, but that a date cannot accurately be predicted.

Conceptual limitations are another possible reason for the slowness in AI research. AI researchers may need to modify the conceptual framework of their discipline in order to provide a stronger foundation for the quest to achieve strong AI. As William Clocksin wrote in 2003: "the framework starts from Weizenbaum's observation that intelligence manifests itself only relative to specific social and cultural contexts".

Furthermore, AI researchers have been able to create computers that can perform jobs that are complicated for people, such as mathematics, but they have struggled to develop a computer capable of carrying out tasks that are simple for humans, such as walking (Moravec's paradox). A problem described by David Gelernter is that some people assume thinking and reasoning are equivalent. However, the question of whether thoughts can be separated from the one who creates them has intrigued AI researchers.

The problems encountered in AI research over the past decades have further impeded its progress. Failed predictions by AI researchers and the lack of a complete understanding of human behavior have helped diminish the central idea of human-level AI. Although AI research has brought both improvement and disappointment, most investigators remain optimistic about potentially achieving the goal of AI in the 21st century.

Other possible reasons have been proposed for the slow progress toward strong AI. The intricacy of scientific problems and the need to fully understand the human brain through psychology and neurophysiology have limited many researchers' efforts to emulate the function of the human brain in computer hardware. Many researchers tend to underestimate the uncertainty involved in predictions about the future of AI, but without taking those issues seriously, people can overlook solutions to problematic questions.

Clocksin says that a conceptual limitation that may impede the progress of AI research is that researchers may be using the wrong techniques for programming and for implementing hardware. When AI researchers first began to aim for the goal of artificial intelligence, a main interest was human reasoning. Researchers hoped to establish computational models of human knowledge through reasoning and to find out how to design a computer to carry out a specific cognitive task.

Abstraction, which researchers tend to redefine when working in a particular research context, lets them concentrate on just a few concepts. The most productive use of abstraction in AI research comes from planning and problem solving. Although the aim of abstraction is to increase the speed of computation, its role has raised questions about the involvement of abstraction operators.

A possible reason for the slowness in AI relates to the acknowledgement by many AI researchers that heuristics is an area with a significant gap between computer performance and human performance. The specific functions programmed into a computer may be able to account for many of the requirements that allow it to match human intelligence. These explanations are not necessarily the fundamental causes of the delay in achieving strong AI, but they are widely accepted by many researchers.

Many AI researchers debate whether machines should be created with emotions. There are no emotions in typical models of AI, and some researchers say programming emotions into machines allows them to have a mind of their own. Emotion sums up the experiences of humans because it allows them to remember those experiences. David Gelernter writes, "No computer will be creative unless it can simulate all the nuances of human emotion." This concern about emotion has posed problems for AI researchers, and it connects to the concept of strong AI as research into it progresses.

Controversies and dangers

Feasibility

As of August 2020, AGI remains speculative as no such system has been demonstrated yet. Opinions vary on whether and when artificial general intelligence will arrive. At one extreme, AI pioneer Herbert A. Simon speculated in 1965: "machines will be capable, within twenty years, of doing any work a man can do". However, this prediction failed to come true. Microsoft co-founder Paul Allen believed that such intelligence is unlikely in the 21st century because it would require "unforeseeable and fundamentally unpredictable breakthroughs" and a "scientifically deep understanding of cognition". Writing in The Guardian, roboticist Alan Winfield claimed the gulf between modern computing and human-level artificial intelligence is as wide as the gulf between current space flight and practical faster-than-light spaceflight.

AI experts' views on the feasibility of AGI wax and wane, and may have seen a resurgence in the 2010s. Four polls conducted in 2012 and 2013 suggested that the median guess among experts for when they would be 50% confident AGI would arrive was 2040 to 2050, depending on the poll, with the mean being 2081. Of the experts, 16.5% answered with "never" when asked the same question but with a 90% confidence instead. Further considerations on current AGI progress can be found in the sections Tests for confirming human-level AGI and IQ-tests AGI.

Potential threat to human existence

The thesis that AI poses an existential risk, and that this risk needs much more attention than it currently gets, has been endorsed by many public figures; perhaps the most famous are Elon Musk, Bill Gates, and Stephen Hawking. The most notable AI researcher to endorse the thesis is Stuart J. Russell. Endorsers of the thesis sometimes express bafflement at skeptics: Gates states he does not "understand why some people are not concerned", and Hawking criticized widespread indifference in his 2014 editorial:

"So, facing possible futures of incalculable benefits and risks, the experts are surely doing everything possible to ensure the best outcome, right? Wrong. If a superior alien civilisation sent us a message saying, 'We'll arrive in a few decades,' would we just reply, 'OK, call us when you get here–we'll leave the lights on?' Probably not–but this is more or less what is happening with AI."

Many of the scholars who are concerned about existential risk believe that the best way forward would be to conduct (possibly massive) research into solving the difficult "control problem" to answer the question: what types of safeguards, algorithms, or architectures can programmers implement to maximize the probability that their recursively-improving AI would continue to behave in a friendly, rather than destructive, manner after it reaches superintelligence?

The thesis that AI can pose existential risk also has many strong detractors. Skeptics sometimes charge that the thesis is crypto-religious, with an irrational belief in the possibility of superintelligence replacing an irrational belief in an omnipotent God; at an extreme, Jaron Lanier argues that the whole concept that current machines are in any way intelligent is "an illusion" and a "stupendous con" by the wealthy.

Much of the existing criticism argues that AGI is unlikely in the short term. Computer scientist Gordon Bell argues that the human race will destroy itself before it reaches the technological singularity. Gordon Moore, the original proponent of Moore's Law, declares that "I am a skeptic. I don't believe [a technological singularity] is likely to happen, at least for a long time. And I don't know why I feel that way." Baidu Vice President Andrew Ng states AI existential risk is "like worrying about overpopulation on Mars when we have not even set foot on the planet yet."

Evidence-based medicine

From Wikipedia, the free encyclopedia


Evidence-based medicine (EBM) is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients." The aim of EBM is to integrate the experience of the clinician, the values of the patient, and the best available scientific information to guide decision-making about clinical management. The term was originally used to describe an approach to teaching the practice of medicine and improving decisions by individual physicians about individual patients.

Background, history and definition

Medicine has a long history of scientific inquiry about the prevention, diagnosis, and treatment of human disease.

The concept of a controlled clinical trial was first described in 1662 by Jan Baptist van Helmont in reference to the practice of bloodletting. Wrote Van Helmont:

Let us take out of the Hospitals, out of the Camps, or from elsewhere, 200, or 500 poor People, that have fevers or Pleuritis. Let us divide them in Halfes, let us cast lots, that one halfe of them may fall to my share, and the others to yours; I will cure them without blood-letting and sensible evacuation; but you do, as ye know ... we shall see how many Funerals both of us shall have...

The first published report describing the conduct and results of a controlled clinical trial was by James Lind, a Scottish naval surgeon who conducted research on scurvy during his time aboard HMS Salisbury in the Channel Fleet, while patrolling the Bay of Biscay. Lind divided the sailors participating in his experiment into six groups, so that the effects of various treatments could be fairly compared. Lind found improvement in symptoms and signs of scurvy among the group of men treated with lemons or oranges. He published a treatise describing the results of this experiment in 1753.

An early critique of statistical methods in medicine was published in 1835.

The term "Evidence-based medicine" was introduced in 1990 by Gordon Guyatt of McMaster University.

Clinical decision making

Alvan Feinstein's publication of Clinical Judgment in 1967 focused attention on the role of clinical reasoning and identified biases that can affect it. In 1972, Archie Cochrane published Effectiveness and Efficiency, which described the lack of controlled trials supporting many practices that had previously been assumed to be effective. In 1973, John Wennberg began to document wide variations in how physicians practiced. Through the 1980s, David M. Eddy described errors in clinical reasoning and gaps in evidence. In the mid-1980s, Alvan Feinstein, David Sackett and others published textbooks on clinical epidemiology, which translated epidemiological methods to physician decision making. Toward the end of the 1980s, a group at RAND showed that large proportions of procedures performed by physicians were considered inappropriate even by the standards of their own experts.

Evidence-based guidelines and policies

David M. Eddy began to use the term "evidence-based" in 1987 in workshops and a manual commissioned by the Council of Medical Specialty Societies to teach formal methods for designing clinical practice guidelines. The manual was eventually published by the American College of Physicians. Eddy first published the term "evidence-based" in March 1990, in an article in the Journal of the American Medical Association that laid out the principles of evidence-based guidelines and population-level policies, which Eddy described as "explicitly describing the available evidence that pertains to a policy and tying the policy to evidence instead of standard-of-care practices or the beliefs of experts. The pertinent evidence must be identified, described, and analyzed. The policymakers must determine whether the policy is justified by the evidence. A rationale must be written." He discussed "evidence-based" policies in several other papers published in JAMA in the spring of 1990. Those papers were part of a series of 28 published in JAMA between 1990 and 1997 on formal methods for designing population-level guidelines and policies.

Medical education

The term "evidence-based medicine" was introduced slightly later, in the context of medical education. This branch of evidence-based medicine has its roots in clinical epidemiology. In the autumn of 1990, Gordon Guyatt used it in an unpublished description of a program at McMaster University for prospective or new medical students. Guyatt and others first published the term two years later (1992) to describe a new approach to teaching the practice of medicine.

In 1996, David Sackett and colleagues clarified the definition of this tributary of evidence-based medicine as "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. ... [It] means integrating individual clinical expertise with the best available external clinical evidence from systematic research." This branch of evidence-based medicine aims to make individual decision making more structured and objective by better reflecting the evidence from research. Population-based data are applied to the care of an individual patient, while respecting the fact that practitioners have clinical expertise reflected in effective and efficient diagnosis and thoughtful identification and compassionate use of individual patients' predicaments, rights, and preferences.

Between 1993 and 2000, the Evidence-based Medicine Working Group at McMaster University published the methods to a broad physician audience in a series of 25 "Users' Guides to the Medical Literature" in JAMA. In 1995 Rosenberg and Donald defined individual level evidence-based medicine as "the process of finding, appraising, and using contemporaneous research findings as the basis for medical decisions." In 2010, Greenhalgh used a definition that emphasized quantitative methods: "the use of mathematical estimates of the risk of benefit and harm, derived from high-quality research on population samples, to inform clinical decision-making in the diagnosis, investigation or management of individual patients."

The two original definitions highlight important differences in how evidence-based medicine is applied to populations versus individuals. When designing guidelines applied to large groups of people in settings where there is relatively little opportunity for modification by individual physicians, evidence-based policymaking stresses that there should be good evidence to document a test's or treatment's effectiveness. In the setting of individual decision-making, practitioners can be given greater latitude in how they interpret research and combine it with their clinical judgment. In 2005, Eddy offered an umbrella definition for the two branches of EBM: "Evidence-based medicine is a set of principles and methods intended to ensure that to the greatest extent possible, medical decisions, guidelines, and other types of policies are based on and consistent with good evidence of effectiveness and benefit."

Progress

On the evidence-based guidelines and policies side, explicit insistence on evidence of effectiveness was introduced by the American Cancer Society in 1980. The U.S. Preventive Services Task Force (USPSTF) began issuing guidelines for preventive interventions based on evidence-based principles in 1984. In 1985, the Blue Cross Blue Shield Association applied strict evidence-based criteria for covering new technologies. Beginning in 1987, specialty societies such as the American College of Physicians, and voluntary health organizations such as the American Heart Association, wrote many evidence-based guidelines. In 1991, Kaiser Permanente, a managed care organization in the US, began an evidence-based guidelines program. In 1991, Richard Smith wrote an editorial in the British Medical Journal and introduced the ideas of evidence-based policies in the UK. In 1993, the Cochrane Collaboration created a network of 13 countries to produce systematic reviews and guidelines. In 1997, the US Agency for Healthcare Research and Quality (AHRQ, then known as the Agency for Health Care Policy and Research, or AHCPR) established Evidence-based Practice Centers (EPCs) to produce evidence reports and technology assessments to support the development of guidelines. In the same year, a National Guideline Clearinghouse that followed the principles of evidence-based policies was created by AHRQ, the AMA, and the American Association of Health Plans (now America's Health Insurance Plans). In 1999, the National Institute for Clinical Excellence (NICE) was created in the UK.

On the medical education side, programs to teach evidence-based medicine have been created in medical schools in Canada, the US, the UK, Australia, and other countries. A 2009 study of UK programs found that more than half of UK medical schools offered some training in evidence-based medicine, although there was considerable variation in the methods and content, and EBM teaching was restricted by lack of curriculum time, trained tutors and teaching materials. Many programs have been developed to help individual physicians gain better access to evidence. For example, UpToDate was created in the early 1990s. The Cochrane Collaboration began publishing evidence reviews in 1993.

BMJ Publishing Group launched a 6-monthly periodical in 1995 called Clinical Evidence that provided brief summaries of the current state of evidence about important clinical questions for clinicians.

Current practice

By 2000, "evidence-based medicine" had become an umbrella term for the emphasis on evidence in both population-level and individual-level decisions. In subsequent years, use of the term "evidence-based" extended to other levels of the health care system. An example is "evidence-based health services", which seek to increase the competence of health service decision makers and the practice of evidence-based medicine at the organizational or institutional level.

The multiple tributaries of evidence-based medicine share an emphasis on the importance of incorporating evidence from formal research in medical policies and decisions. However, they differ on the extent to which they require good evidence of effectiveness before promoting a guideline or payment policy; hence the distinction sometimes made between evidence-based medicine and science-based medicine, which also takes into account factors such as prior plausibility and compatibility with established science (as when medical organizations promote controversial treatments such as acupuncture). They also differ on the extent to which it is feasible to incorporate individual-level information in decisions. Thus, evidence-based guidelines and policies may not readily 'hybridise' with experience-based practices oriented towards ethical clinical judgement, and can lead to contradictions, contest, and unintended crises. The most effective 'knowledge leaders' (managers and clinical leaders) use a broad range of management knowledge in their decision making, rather than just formal evidence. Evidence-based guidelines may provide the basis for governmentality in health care, and consequently play a central role in the governance of contemporary health care systems.

Methods

Steps

The steps for designing explicit, evidence-based guidelines were described in the late 1980s:

Formulate the question (population, intervention, comparison intervention, outcomes, time horizon, setting)
Search the literature to identify studies that inform the question
Interpret each study to determine precisely what it says about the question
If several studies address the question, synthesize their results (meta-analysis)
Summarize the evidence in "evidence tables"
Compare the benefits, harms and costs in a "balance sheet"
Draw a conclusion about the preferred practice
Write the guideline
Write the rationale for the guideline
Have others review each of the previous steps
Implement the guideline
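The meta-analysis step above can be illustrated with a minimal sketch. It assumes each study reports an effect estimate on an additive scale (for example a log odds ratio) together with its standard error, and it uses a simple fixed-effect, inverse-variance model; real syntheses often need a random-effects model when the studies disagree. The `Study` class and `fixed_effect_meta` function are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """One study's effect estimate (e.g. a log odds ratio) and its standard error."""
    name: str
    effect: float
    se: float


def fixed_effect_meta(studies):
    """Pool study estimates with inverse-variance weights (fixed-effect model).

    Returns the pooled estimate, its standard error and a 95% confidence interval.
    """
    if not studies:
        raise ValueError("need at least one study")
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / (s.se ** 2) for s in studies]
    total = sum(weights)
    pooled = sum(w * s.effect for w, s in zip(weights, studies)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = 1.96  # normal quantile for a two-sided 95% interval
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)
```

For two hypothetical studies with log odds ratios of -0.2 (standard error 0.1) and -0.4 (standard error 0.2), the more precise study receives four times the weight, and the pooled estimate is -0.24.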

For the purposes of medical education and individual-level decision making, five steps of EBM in practice were described in 1992, and the experience of delegates attending the 2003 Conference of Evidence-Based Health Care Teachers and Developers was summarized into five steps and published in 2005. This five-step process can broadly be categorized as:

Translation of uncertainty into an answerable question, including critical questioning, study design and levels of evidence
Systematic retrieval of the best evidence available
Critical appraisal of evidence for internal validity that can be broken down into aspects regarding:
- Systematic errors as a result of selection bias, information bias and confounding
- Quantitative aspects of diagnosis and treatment
- The effect size and aspects regarding its precision
- Clinical importance of results
- External validity or generalizability
Application of results in practice
Evaluation of performance

Evidence reviews

Systematic reviews of published research studies are a major part of the evaluation of particular treatments. The Cochrane Collaboration is one of the best-known organisations that conducts systematic reviews. Like other producers of systematic reviews, it requires authors to provide a detailed and repeatable plan of their literature search and evaluations of the evidence. Once all the best evidence is assessed, the treatment is categorized as (1) likely to be beneficial, (2) likely to be harmful, or (3) evidence did not support either benefit or harm.

A 2007 analysis of 1,016 systematic reviews from all 50 Cochrane Collaboration Review Groups found that 44% of the reviews concluded that the intervention was likely to be beneficial, 7% concluded that the intervention was likely to be harmful, and 49% concluded that evidence did not support either benefit or harm. 96% recommended further research. In 2017, a study assessed the role of systematic reviews produced by the Cochrane Collaboration in informing US private payers' policy-making; it showed that although the medical policy documents of major US private payers were informed by Cochrane systematic reviews, there was still scope to encourage further use.

Assessing the quality of evidence

Evidence-based medicine categorizes different types of clinical evidence and rates or grades them according to how free they are from the various biases that beset medical research. For example, the strongest evidence for therapeutic interventions is provided by a systematic review of randomized, well-blinded, placebo-controlled trials with allocation concealment and complete follow-up involving a homogeneous patient population and medical condition. In contrast, patient testimonials, case reports, and even expert opinion have little value as proof because of the placebo effect, the biases inherent in observation and reporting of cases, difficulties in ascertaining who is an expert, and more. Some critics have argued that expert opinion "does not belong in the rankings of the quality of empirical evidence because it does not represent a form of empirical evidence" and continue that "expert opinion would seem to be a separate, complex type of knowledge that would not fit into hierarchies otherwise limited to empirical evidence alone".

Several organizations have developed grading systems for assessing the quality of evidence. For example, in 1989 the U.S. Preventive Services Task Force (USPSTF) put forth the following:

Level I: Evidence obtained from at least one properly designed randomized controlled trial.
Level II-1: Evidence obtained from well-designed controlled trials without randomization.
Level II-2: Evidence obtained from well-designed cohort studies or case-control studies, preferably from more than one center or research group.
Level II-3: Evidence obtained from multiple time series designs with or without the intervention. Dramatic results in uncontrolled trials might also be regarded as this type of evidence.
Level III: Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.

Another example is the Oxford (UK) CEBM Levels of Evidence. First released in September 2000, the Oxford CEBM Levels of Evidence provides 'levels' of evidence for claims about prognosis, diagnosis, treatment benefits, treatment harms, and screening, which most grading schemes do not address. The original CEBM Levels were designed for Evidence-Based On Call to make the process of finding evidence feasible and its results explicit. In 2011, an international team redesigned the Oxford CEBM Levels to make it more understandable and to take into account recent developments in evidence ranking schemes. The Oxford CEBM Levels of Evidence have been used by patients and clinicians, and also to develop clinical guidelines, including recommendations for the optimal use of phototherapy and topical therapy in psoriasis and guidelines for the use of the BCLC staging system for diagnosing and monitoring hepatocellular carcinoma in Canada.

In 2000, the GRADE (short for Grading of Recommendations Assessment, Development and Evaluation) working group developed a system that takes into account more dimensions than just the quality of medical research. It requires users of GRADE who are performing an assessment of the quality of evidence, usually as part of a systematic review, to consider the impact of different factors on their confidence in the results. Authors of GRADE tables grade the quality of evidence into four levels, on the basis of their confidence that the observed effect (a numerical value) is close to the true effect. The confidence value is based on judgements assigned in five different domains in a structured manner. The GRADE working group defines 'quality of evidence' and 'strength of recommendations' (which is based on the quality) as two different concepts that are commonly confused with each other.

Systematic reviews may include randomized controlled trials that have low risk of bias, or observational studies that have high risk of bias. In the case of randomized controlled trials, the quality of evidence starts high but can be downgraded in five different domains.

Risk of bias: Is a judgement made on the basis of the chance that bias in included studies has influenced the estimate of effect.
Imprecision: Is a judgement made on the basis of the chance that the observed estimate of effect could change completely.
Indirectness: Is a judgement made on the basis of the differences in characteristics of how the study was conducted and how the results are actually going to be applied.
Inconsistency: Is a judgement made on the basis of the variability of results across the included studies.
Publication bias: Is a judgement made on the basis of the question whether all the research evidence has been taken into account.

In the case of observational studies per GRADE, the quality of evidence starts off lower and may be upgraded in three domains in addition to being subject to downgrading.

Large effect: This is when methodologically strong studies show that the observed effect is so large that it is unlikely to change completely.
Plausible confounding would change the effect: This is when, despite the presence of a possible confounding factor that is expected to reduce the observed effect, the effect estimate still shows a significant effect.
Dose response gradient: This is when the intervention used becomes more effective with increasing dose. This suggests that a further increase will likely bring about more effect.

Meaning of the levels of quality of evidence as per GRADE:

High Quality Evidence: The authors are very confident that the estimate that is presented lies very close to the true value. One could interpret it as "there is very low probability of further research completely changing the presented conclusions."
Moderate Quality Evidence: The authors are confident that the presented estimate lies close to the true value, but it is also possible that it may be substantially different. One could also interpret it as: further research may completely change the conclusions.
Low Quality Evidence: The authors are not confident in the effect estimate and the true value may be substantially different. One could interpret it as "further research is likely to change the presented conclusions completely."
Very Low Quality Evidence: The authors do not have any confidence in the estimate and it is likely that the true value is substantially different from it. One could interpret it as "new research will most probably change the presented conclusions completely."
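The GRADE rating logic described above (a starting level set by study design, then downgrades in five domains and, for observational studies, upgrades in three) can be sketched as a small function. This is a simplified, hypothetical model: the domain identifiers, the rule that each domain moves the rating by 0, 1 or 2 levels, and the clamping at "very low" and "high" are assumptions for illustration, and real GRADE assessments are structured judgements rather than arithmetic.

```python
# Quality levels in ascending order, as named in the GRADE description above.
LEVELS = ("very low", "low", "moderate", "high")

# Hypothetical identifiers for the five downgrading and three upgrading domains.
DOWNGRADE_DOMAINS = {"risk_of_bias", "imprecision", "indirectness",
                     "inconsistency", "publication_bias"}
UPGRADE_DOMAINS = {"large_effect", "plausible_confounding", "dose_response"}


def grade_quality(design, downgrades=None, upgrades=None):
    """Return a GRADE-style quality label for a body of evidence.

    design: "rct" (starts at high) or "observational" (starts at low).
    downgrades / upgrades: mapping of domain name -> levels moved (0, 1 or 2).
    Upgrades are applied only to observational evidence.
    """
    downgrades = downgrades or {}
    upgrades = upgrades or {}
    unknown = (set(downgrades) - DOWNGRADE_DOMAINS) | (set(upgrades) - UPGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown domain(s): {sorted(unknown)}")
    if any(step not in (0, 1, 2) for step in [*downgrades.values(), *upgrades.values()]):
        raise ValueError("each domain may move the rating by 0, 1 or 2 levels")

    starting_levels = {"rct": 3, "observational": 1}
    if design not in starting_levels:
        raise ValueError("design must be 'rct' or 'observational'")
    level = starting_levels[design]

    level -= sum(downgrades.values())
    if design == "observational":
        level += sum(upgrades.values())

    # Clamp to the four available levels.
    level = max(0, min(level, len(LEVELS) - 1))
    return LEVELS[level]
```

For example, `grade_quality("rct", {"imprecision": 1})` yields "moderate", while observational evidence with a large effect, `grade_quality("observational", upgrades={"large_effect": 1})`, is also raised to "moderate".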

Categories of recommendations

In guidelines and other publications, recommendation for a clinical service is classified by the balance of risk versus benefit and the level of evidence on which this information is based. The U.S. Preventive Services Task Force uses:

Level A: Good scientific evidence suggests that the benefits of the clinical service substantially outweigh the potential risks. Clinicians should discuss the service with eligible patients.
Level B: At least fair scientific evidence suggests that the benefits of the clinical service outweigh the potential risks. Clinicians should discuss the service with eligible patients.
Level C: At least fair scientific evidence suggests that there are benefits provided by the clinical service, but the balance between benefits and risks is too close for making general recommendations. Clinicians need not offer it unless there are individual considerations.
Level D: At least fair scientific evidence suggests that the risks of the clinical service outweigh potential benefits. Clinicians should not routinely offer the service to asymptomatic patients.
Level I: Scientific evidence is lacking, of poor quality, or conflicting, such that the risk versus benefit balance cannot be assessed. Clinicians should help patients understand the uncertainty surrounding the clinical service.

GRADE guideline panelists may make strong or weak recommendations on the basis of further criteria. Some of the important criteria are the balance between desirable and undesirable effects (not considering cost), the quality of the evidence, values and preferences and costs (resource utilization).

Despite the differences between systems, the purposes are the same: to guide users of clinical research information on which studies are likely to be most valid. However, the individual studies still require careful critical appraisal.

Statistical measures

Evidence-based medicine attempts to express clinical benefits of tests and treatments using mathematical methods. Tools used by practitioners of evidence-based medicine include:

Likelihood ratio
The pre-test odds of a particular diagnosis, multiplied by the likelihood ratio, determines the post-test odds. (Odds can be calculated from, and then converted to, the [more familiar] probability.) This reflects Bayes' theorem. The differences in likelihood ratio between clinical tests can be used to prioritize clinical tests according to their usefulness in a given clinical situation.
AUC-ROC. The area under the receiver operating characteristic curve (AUC-ROC) reflects the relationship between sensitivity and specificity for a given test. High-quality tests will have an AUC-ROC approaching 1, and high-quality publications about clinical tests will provide information about the AUC-ROC. Cutoff values for positive and negative tests can influence specificity and sensitivity, but they do not affect AUC-ROC.
Number needed to treat (NNT)/Number needed to harm (NNH). Number needed to treat and number needed to harm express the effectiveness and safety, respectively, of interventions in a clinically meaningful way. NNT is the number of people who need to be treated in order to achieve the desired outcome (e.g. survival from cancer) in one additional patient. For example, if a treatment increases the chance of survival by 5 percentage points, then 20 people need to be treated for 1 additional patient to survive because of the treatment (NNT = 1/0.05 = 20). The concept can also be applied to diagnostic tests. For example, if 1,339 women aged 50–59 have to be invited for breast cancer screening over a ten-year period in order to prevent one woman from dying of breast cancer, then the NNT for being invited to breast cancer screening is 1,339.
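The three measures above come down to simple arithmetic. The following is a minimal Python sketch, assuming a binary diagnosis and a binary desired outcome; the function names and the example numbers are hypothetical and illustrative, not taken from any clinical source. It converts a probability to odds and back in order to apply a likelihood ratio, computes AUC-ROC in its cutoff-free pairwise form (the Mann-Whitney interpretation), and computes NNT as the reciprocal of the absolute difference in outcome rates.

```python
from typing import Sequence


def prob_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) into odds."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    post_test_odds = prob_to_odds(pre_test_prob) * likelihood_ratio
    return odds_to_prob(post_test_odds)


def auc_from_scores(diseased: Sequence[float], healthy: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen diseased patient
    scores higher than a randomly chosen healthy patient (ties count 1/2).
    No cutoff appears anywhere, which is why moving the cutoff changes
    sensitivity and specificity but not the AUC."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


def number_needed_to_treat(rate_treated: float, rate_control: float) -> float:
    """NNT = 1 / absolute difference in the rate of a desired outcome
    (e.g. survival) between the treated and the control group."""
    difference = rate_treated - rate_control
    if difference <= 0:
        raise ValueError("treatment shows no benefit; NNT is undefined")
    return 1.0 / difference


# Usage with illustrative (made-up) numbers:
print(post_test_probability(0.20, 4.0))       # 0.5 (pre-test odds 0.25 x LR 4 = post-test odds 1.0)
print(auc_from_scores([3, 4, 5], [1, 2, 3]))  # 8.5 of 9 pairs ranked correctly, about 0.944
print(number_needed_to_treat(0.60, 0.55))     # approximately 20 for a 5-point survival gain
```

A pre-test probability of 20% corresponds to odds of 0.25; a likelihood ratio of 4 raises this to odds of 1, which is a post-test probability of 50%. The pairwise AUC formulation was chosen because it makes the independence from any cutoff explicit.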

Quality of clinical trials

Evidence-based medicine attempts to objectively evaluate the quality of clinical research by critically assessing techniques reported by researchers in their publications.

Trial design considerations. High-quality studies have clearly defined eligibility criteria and have minimal missing data.
Generalizability considerations. Studies may only be applicable to narrowly defined patient populations and may not be generalizable to other clinical contexts.
Follow-up. Whether the follow-up period is long enough for the defined outcomes to occur can influence prospective study outcomes and the statistical power of a study to detect differences between a treatment arm and a control arm.
Power. A mathematical calculation can determine whether the number of patients is sufficient to detect a difference between treatment arms. A negative study may reflect a lack of benefit, or simply too few patients to detect a difference.
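The power calculation mentioned above can be sketched for the simplest case. This is a minimal Python sketch, assuming a binary outcome, two equally sized arms, a two-sided test, and the common normal-approximation formula with unpooled variance. The defaults of alpha = 0.05 and 80% power are conventional choices rather than values from this article, and the function names are hypothetical. Real trials usually rely on dedicated statistical software and more exact methods.

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def patients_per_arm(p_control: float, p_treated: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of patients needed in EACH arm to detect the
    difference between two event proportions (normal approximation)."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = STANDARD_NORMAL.inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    delta = p_treated - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


def approximate_power(p_control: float, p_treated: float,
                      n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of the same two-sided test for a given arm size.
    The negligible chance of rejecting in the wrong direction is ignored."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    standard_error = sqrt(variance / n_per_arm)
    return STANDARD_NORMAL.cdf(abs(p_treated - p_control) / standard_error - z_alpha)


# Illustrative numbers: survival of 55% without and 60% with treatment.
n = patients_per_arm(0.55, 0.60)
print(n)                                   # roughly 1,500 patients per arm
print(approximate_power(0.55, 0.60, 100))  # a 100-patient-per-arm trial is badly underpowered
```

Because the required size grows with the inverse square of the difference being sought, halving the expected effect roughly quadruples the number of patients needed. This is one reason a negative result from a small trial is hard to interpret.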

Limitations and criticism

Although evidence-based medicine is regarded as the gold standard of clinical practice, there are a number of limitations and criticisms of its use. Two widely cited categorization schemes for the various published critiques of EBM are the three-fold division of Straus and McAlister ("limitations universal to the practice of medicine, limitations unique to evidence-based medicine and misperceptions of evidence-based-medicine") and the five-point categorization of Cohen, Stavri and Hersh (EBM is a poor philosophic basis for medicine, defines evidence too narrowly, is not evidence-based, is limited in usefulness when applied to individual patients, or reduces the autonomy of the doctor/patient relationship).

In no particular order, some published objections include:

The theoretical ideal of EBM (that every narrow clinical question, of which hundreds of thousands can exist, would be answered by meta-analysis and systematic reviews of multiple RCTs) faces the limitation that research (especially the RCTs themselves) is expensive; thus, in reality, for the foreseeable future, there will always be much more demand for EBM than supply, and the best humanity can do is to triage the application of scarce resources.
Research produced by EBM, such as from randomized controlled trials (RCTs), may not be relevant for all treatment situations. Research tends to focus on specific populations, but individual persons can vary substantially from population norms. Since certain population segments have historically been under-researched (such as racial minorities and people with co-morbid diseases), evidence from RCTs may not be generalizable to those populations. Thus EBM applies to groups of people, but this should not preclude clinicians from using their personal experience in deciding how to treat each patient. One author advises that "the knowledge gained from clinical research does not directly answer the primary clinical question of what is best for the patient at hand" and suggests that evidence-based medicine should not discount the value of clinical experience. Another author stated that "the practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."
Research can be influenced by biases such as publication bias and conflict of interest in academic publishing. For example, studies with conflicts due to industry funding are more likely to favor their product.
There is a lag between when the RCT is conducted and when its results are published.
There is a lag between when results are published and when these are properly applied.
Hypocognition (the absence of a simple, consolidated mental framework that new information can be placed into) can hinder the application of EBM.
Values: while patient values are considered in the original definition of EBM, the importance of values is not commonly emphasized in EBM training, a potential problem under current study.

A 2018 study, "Why all randomised controlled trials produce biased results", assessed the 10 most-cited RCTs and argued that trials face a wide range of biases and constraints: trials are feasible only for a small set of questions amenable to randomisation, can generally assess only the average treatment effect in a sample, and are limited in how far their results can be extrapolated to other contexts, among many other problems outlined in the study.

Application of evidence in clinical settings

Despite the emphasis on evidence-based medicine, unsafe or ineffective medical practices continue to be applied, because of patient demand for tests or treatments, because of failure to access information about the evidence, or because of the rapid pace of change in the scientific evidence. For example, between 2003 and 2017, the evidence shifted on hundreds of medical practices, ranging from whether hormone replacement therapy was safe to whether babies should be given certain vitamins to whether antidepressant drugs are effective in people with Alzheimer's disease. Even when the evidence is unequivocally against a treatment, it usually takes ten years for other treatments to be adopted. In other cases, significant change can require a generation of physicians to retire or die, and be replaced by physicians who were trained with more recent evidence.

Physicians may also reject evidence which conflicts with their anecdotal experience or because of cognitive biases – for example, a vivid memory of a rare but shocking outcome (the availability heuristic), such as a patient dying after refusing treatment. They may overtreat to "do something" or to address a patient's emotional needs. They may worry about malpractice charges based on a discrepancy between what the patient expects and what the evidence recommends. They may also overtreat or provide ineffective treatments because the treatment feels biologically plausible.

Education

Training in evidence-based medicine is offered across the continuum of medical education.

The Berlin questionnaire and the Fresno Test are validated instruments for assessing the effectiveness of education in evidence-based medicine. These questionnaires have been used in diverse settings.

A Campbell systematic review that included 24 trials examined the effectiveness of e-learning in improving evidence-based health care knowledge and practice. It found that e-learning, compared to no learning, improves evidence-based health care knowledge and skills but not attitudes and behaviour. There was no difference in outcomes when e-learning was compared with face-to-face learning. Combining e-learning with face-to-face learning (blended learning) has a positive impact on evidence-based knowledge, skills, attitude and behaviour. Related to e-learning, some medical students have edited Wikipedia to improve their EBM skills.

1947–1948 civil war in Mandatory Palestine