Search This Blog

Thursday, May 23, 2024

Metascience

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Metascience

Metascience
(also known as meta-research) is the use of scientific methodology to study science itself. Metascience seeks to increase the quality of scientific research while reducing inefficiency. It is also known as "research on research" and "the science of science", as it uses research methods to study how research is done and find where improvements can be made. Metascience concerns itself with all fields of research and has been described as "a bird's eye view of science". In the words of John Ioannidis, "Science is the best thing that has happened to human beings ... but we can do it better."

In 1966, an early meta-research paper examined the statistical methods of 295 papers published in ten high-profile medical journals. It found that "in almost 73% of the reports read ... conclusions were drawn when the justification for these conclusions was invalid." Meta-research in the following decades found many methodological flaws, inefficiencies, and poor practices in research across numerous scientific fields. Many scientific studies could not be reproduced, particularly in medicine and the soft sciences. The term "replication crisis" was coined in the early 2010s as part of a growing awareness of the problem.

Measures have been implemented to address the issues revealed by metascience. These measures include the pre-registration of scientific studies and clinical trials as well as the founding of organizations such as CONSORT and the EQUATOR Network that issue guidelines for methodology and reporting. There are continuing efforts to reduce the misuse of statistics, to eliminate perverse incentives from academia, to improve the peer review process, to systematically collect data about the scholarly publication system, to combat bias in scientific literature, and to increase the overall quality and efficiency of the scientific process. As such, metascience is a big part of methods underlying the Open Science Movement.

History

John Ioannidis (2005), "Why Most Published Research Findings Are False"

In 1966, an early meta-research paper examined the statistical methods of 295 papers published in ten high-profile medical journals. It found that, "in almost 73% of the reports read ... conclusions were drawn when the justification for these conclusions was invalid." A paper in 1976 called for funding for meta-research: "Because the very nature of research on research, particularly if it is prospective, requires long periods of time, we recommend that independent, highly competent groups be established with ample, long term support to conduct and support retrospective and prospective research on the nature of scientific discovery". In 2005, John Ioannidis published a paper titled "Why Most Published Research Findings Are False", which argued that a majority of papers in the medical field produce conclusions that are wrong. The paper went on to become the most downloaded paper in the Public Library of Science and is considered foundational to the field of metascience. In a related study with Jeremy Howick and Despina Koletsi, Ioannidis showed that only a minority of medical interventions are supported by 'high quality' evidence according to The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. Later meta-research identified widespread difficulty in replicating results in many scientific fields, including psychology and medicine. This problem was termed "the replication crisis". Metascience has grown as a reaction to the replication crisis and to concerns about waste in research.

Many prominent publishers are interested in meta-research and in improving the quality of their publications. Top journals such as Science, The Lancet, and Nature, provide ongoing coverage of meta-research and problems with reproducibility. In 2012 PLOS ONE launched a Reproducibility Initiative. In 2015 Biomed Central introduced a minimum-standards-of-reporting checklist to four titles.

The first international conference in the broad area of meta-research was the Research Waste/EQUATOR conference held in Edinburgh in 2015; the first international conference on peer review was the Peer Review Congress held in 1989. In 2016, Research Integrity and Peer Review was launched. The journal's opening editorial called for "research that will increase our understanding and suggest potential solutions to issues related to peer review, study reporting, and research and publication ethics".

Fields and topics of meta-research

An exemplary visualization of a conception of scientific knowledge generation structured by layers, with the "Institution of Science" being the subject of metascience

Metascience can be categorized into five major areas of interest: Methods, Reporting, Reproducibility, Evaluation, and Incentives. These correspond, respectively, with how to perform, communicate, verify, evaluate, and reward research.

Methods

Metascience seeks to identify poor research practices, including biases in research, poor study design, abuse of statistics, and to find methods to reduce these practices. Meta-research has identified numerous biases in scientific literature. Of particular note is the widespread misuse of p-values and abuse of statistical significance.

Scientific data science

Scientific data science is the use of data science to analyse research papers. It encompasses both qualitative and quantitative methods. Research in scientific data science includes fraud detection and citation network analysis.

Journalology

Journalology, also known as publication science, is the scholarly study of all aspects of the academic publishing process. The field seeks to improve the quality of scholarly research by implementing evidence-based practices in academic publishing. The term "journalology" was coined by Stephen Lock, the former editor-in-chief of The BMJ. The first Peer Review Congress, held in 1989 in Chicago, Illinois, is considered a pivotal moment in the founding of journalology as a distinct field. The field of journalology has been influential in pushing for study pre-registration in science, particularly in clinical trials. Clinical-trial registration is now expected in most countries.

Reporting

Meta-research has identified poor practices in reporting, explaining, disseminating and popularizing research, particularly within the social and health sciences. Poor reporting makes it difficult to accurately interpret the results of scientific studies, to replicate studies, and to identify biases and conflicts of interest in the authors. Solutions include the implementation of reporting standards, and greater transparency in scientific studies (including better requirements for disclosure of conflicts of interest). There is an attempt to standardize reporting of data and methodology through the creation of guidelines by reporting agencies such as CONSORT and the larger EQUATOR Network.

Reproducibility

Barriers to conducting replications of experiment in cancer research, The Reproducibility Project: Cancer Biology

The replication crisis is an ongoing methodological crisis in which it has been found that many scientific studies are difficult or impossible to replicate. While the crisis has its roots in the meta-research of the mid- to late 20th century, the phrase "replication crisis" was not coined until the early 2010s as part of a growing awareness of the problem. The replication crisis has been closely studied in psychology (especially social psychology) and medicine, including cancer research.  Replication is an essential part of the scientific process, and the widespread failure of replication puts into question the reliability of affected fields.

Moreover, replication of research (or failure to replicate) is considered less influential than original research, and is less likely to be published in many fields. This discourages the reporting of, and even attempts to replicate, studies.

Evaluation and incentives

Metascience seeks to create a scientific foundation for peer review. Meta-research evaluates peer review systems including pre-publication peer review, post-publication peer review, and open peer review. It also seeks to develop better research funding criteria.

Metascience seeks to promote better research through better incentive systems. This includes studying the accuracy, effectiveness, costs, and benefits of different approaches to ranking and evaluating research and those who perform it. Critics argue that perverse incentives have created a publish-or-perish environment in academia which promotes the production of junk science, low quality research, and false positives. According to Brian Nosek, "The problem that we face is that the incentive system is focused almost entirely on getting research published, rather than on getting research right." Proponents of reform seek to structure the incentive system to favor higher-quality results. For example, by quality being judged on the basis of narrative expert evaluations ("rather than [only or mainly] indices"), institutional evaluation criteria, guaranteeing of transparency, and professional standards.

Contributorship

Studies proposed machine-readable standards and (a taxonomy of) badges for science publication management systems that hones in on contributorship – who has contributed what and how much of the research labor – rather that using traditional concept of plain authorship – who was involved in any way creation of a publication. A study pointed out one of the problems associated with the ongoing neglect of contribution nuanciation – it found that "the number of publications has ceased to be a good metric as a result of longer author lists, shorter papers, and surging publication numbers".

Assessment factors

Factors other than a submission's merits can substantially influence peer reviewers' evaluations. Such factors may however also be important such as the use of track-records about the veracity of a researchers' prior publications and its alignment with public interests. Nevertheless, evaluation systems – include those of peer-review – may substantially lack mechanisms and criteria that are oriented or well-performingly oriented towards merit, real-world positive impact, progress and public usefulness rather than analytical indicators such as number of citations or altmetrics even when such can be used as partial indicators of such ends. Rethinking of the academic reward structure "to offer more formal recognition for intermediate products, such as data" could have positive impacts and reduce data withholding.

Recognition of training

A commentary noted that academic rankings don't consider where (country and institute) the respective researchers were trained.

Scientometrics

Scientometrics concerns itself with measuring bibliographic data in scientific publications. Major research issues include the measurement of the impact of research papers and academic journals, the understanding of scientific citations, and the use of such measurements in policy and management contexts. Studies suggest that "metrics used to measure academic success, such as the number of publications, citation number, and impact factor, have not changed for decades" and have to some degrees "ceased" to be good measures, leading to issues such as "overproduction, unnecessary fragmentations, overselling, predatory journals (pay and publish), clever plagiarism, and deliberate obfuscation of scientific results so as to sell and oversell".

Novel tools in this area include systems to quantify how much the cited-node informs the citing-node. This can be used to convert unweighted citation networks to a weighted one and then for importance assessment, deriving "impact metrics for the various entities involved, like the publications, authors etc" as well as, among other tools, for search engine- and recommendation systems.

Science governance

Science funding and science governance can also be explored and informed by metascience.

Incentives

Various interventions such as prioritization can be important. For instance, the concept of differential technological development refers to deliberately developing technologies – e.g. control-, safety- and policy-technologies versus risky biotechnologies – at different precautionary paces to decrease risks, mainly global catastrophic risk, by influencing the sequence in which technologies are developed. Relying only on the established form of legislation and incentives to ensure the right outcomes may not be adequate as these may often be too slow or inappropriate.

Other incentives to govern science and related processes, including via metascience-based reforms, may include ensuring accountability to the public (in terms of e.g. accessibility of, especially publicly-funded, research or of it addressing various research topics of public interest in serious manners), increasing the qualified productive scientific workforce, improving the efficiency of science to improve problem-solving in general, and facilitating that unambiguous societal needs based on solid scientific evidence – such as about human physiology – are adequately prioritized and addressed. Such interventions, incentives and intervention-designs can be subjects of metascience.

Science funding and awards
Cluster network of scientific publications in relation to Nobel prizes
Funding for climate research in the natural and technical sciences versus the social sciences and humanities

Scientific awards are one category of science incentives. Metascience can explore existing and hypothetical systems of science awards. For instance, it found that work honored by Nobel prizes clusters in only a few scientific fields with only 36/71 having received at least one Nobel prize of the 114/849 domains science could be divided into according to their DC2 and DC3 classification systems. Five of the 114 domains were shown to make up over half of the Nobel prizes awarded 1995–2017 (particle physics [14%], cell biology [12.1%], atomic physics [10.9%], neuroscience [10.1%], molecular chemistry [5.3%]).

A study found that delegation of responsibility by policy-makers – a centralized authority-based top-down approach – for knowledge production and appropriate funding to science with science subsequently somehow delivering "reliable and useful knowledge to society" is too simple.

Measurements show that allocation of bio-medical resources can be more strongly correlated to previous allocations and research than to burden of diseases.

A study suggests that "[i]f peer review is maintained as the primary mechanism of arbitration in the competitive selection of research reports and funding, then the scientific community needs to make sure it is not arbitrary".

Studies indicate there to is a need to "reconsider how we measure success" (see #Factors of success and progress).

Funding data

Funding information from grant databases and funding acknowledgment sections can be sources of data for scientometrics studies, e.g. for investigating or recognition of the impact of funding entities on the development of science and technology.

Research questions and coordination
Risk governance

Science communication and public use

It has been argued that "science has two fundamental attributes that underpin its value as a global public good: that knowledge claims and the evidence on which they are based are made openly available to scrutiny, and that the results of scientific research are communicated promptly and efficiently". Metascientific research is exploring topics of science communication such as media coverage of science, science journalism and online communication of results by science educators and scientists. A study found that the "main incentive academics are offered for using social media is amplification" and that it should be "moving towards an institutional culture that focuses more on how these [or such] platforms can facilitate real engagement with research". Science communication may also involve the communication of societal needs, concerns and requests to scientists.

Alternative metrics tools

Alternative metrics tools can be used not only for help in assessment (performance and impact) and findability, but also aggregate many of the public discussions about a scientific paper in social media such as reddit, citations on Wikipedia, and reports about the study in the news media which can then in turn be analyzed in metascience or provided and used by related tools. In terms of assessment and findability, altmetrics rate publications' performance or impact by the interactions they receive through social media or other online platforms, which can for example be used for sorting recent studies by measured impact, including before other studies are citing them. The specific procedures of established altmetrics are not transparent and the used algorithms can not be customized or altered by the user as open source software can. A study has described various limitations of altmetrics and points "toward avenues for continued research and development". They are also limited in their use as a primary tool for researchers to find received constructive feedback. (see above)

Societal implications and applications

It has been suggested that it may benefit science if "intellectual exchange—particularly regarding the societal implications and applications of science and technology—are better appreciated and incentivized in the future".

Knowledge integration

Primary studies "without context, comparison or summary are ultimately of limited value" and various types of research syntheses and summaries integrate primary studies. Progress in key social-ecological challenges of the global environmental agenda is "hampered by a lack of integration and synthesis of existing scientific evidence", with a "fast-increasing volume of data", compartmentalized information and generally unmet evidence synthesis challenges. According to Khalil, researchers are facing the problem of too many papers – e.g. in March 2014 more than 8,000 papers were submitted to arXiv – and to "keep up with the huge amount of literature, researchers use reference manager software, they make summaries and notes, and they rely on review papers to provide an overview of a particular topic". He notes that review papers are usually (only)" for topics in which many papers were written already, and they can get outdated quickly" and suggests "wiki-review papers" that get continuously updated with new studies on a topic and summarize many studies' results and suggest future research. A study suggests that if a scientific publication is being cited in a Wikipedia article this could potentially be considered as an indicator of some form of impact for this publication, for example as this may, over time, indicate that the reference has contributed to a high-level of summary of the given topic.

Science journalism

Science journalists play an important role in the scientific ecosystem and in science communication to the public and need to "know how to use, relevant information when deciding whether to trust a research finding, and whether and how to report on it", vetting the findings that get transmitted to the public.

Science education

Some studies investigate science education, e.g. the teaching about selected scientific controversies and historical discovery process of major scientific conclusions, and common scientific misconceptions. Education can also be a topic more generally such as how to improve the quality of scientific outputs and reduce the time needed before scientific work or how to enlarge and retain various scientific workforces.

Science misconceptions and anti-science attitudes

Many students have misconceptions about what science is and how it works. Anti-science attitudes and beliefs are also a subject of research. Hotez suggests antiscience "has emerged as a dominant and highly lethal force, and one that threatens global security", and that there is a need for "new infrastructure" that mitigates it.

Evolution of sciences

Scientific practice

Number of authors of research articles in six journals through time
Trends of diversity of work cited, mean number of self-citations, and mean age of cited work may indicate papers are using "narrower portions of existing knowledge".

Metascience can investigate how scientific processes evolve over time. A study found that teams are growing in size, "increasing by an average of 17% per decade". (see labor advantage below)

ArXiv's yearly submission rate growth over 30 years

It was found that prevalent forms of non-open access publication and prices charged for many conventional journals – even for publicly funded papers – are unwarranted, unnecessary – or suboptimal – and detrimental barriers to scientific progress. Open access can save considerable amounts of financial resources, which could be used otherwise, and level the playing field for researchers in developing countries. There are substantial expenses for subscriptions, gaining access to specific studies, and for article processing charges. Paywall: The Business of Scholarship is a documentary on such issues.

Another topic are the established styles of scientific communication (e.g. long text-form studies and reviews) and the scientific publishing practices – there are concerns about a "glacial pace" of conventional publishing. The use of preprint-servers to publish study-drafts early is increasing and open peer review, new tools to screen studies, and improved matching of submitted manuscripts to reviewers are among the proposals to speed up publication.

Science overall and intrafield developments

A visualization of scientific outputs by field in OpenAlex.  A study can be part of multiple fields and lower numbers of papers is not necessarily detrimental for fields.
Change of number of scientific papers by field according to OpenAlex
Number of PubMed search results for "coronavirus" by year from 1949 to 2020

Studies have various kinds of metadata which can be utilized, complemented and made accessible in useful ways. OpenAlex is a free online index of over 200 million scientific documents that integrates and provides metadata such as sources, citations, author information, scientific fields and research topics. Its API and open source website can be used for metascience, scientometrics and novel tools that query this semantic web of papers. Another project under development, Scholia, uses metadata of scientific publications for various visualizations and aggregation features such as providing a simple user interface summarizing literature about a specific feature of the SARS-CoV-2 virus using Wikidata's "main subject" property.

Subject-level resolutions

Beyond metadata explicitly assigned to studies by humans, natural language processing and AI can be used to assign research publications to topics – one study investigating the impact of science awards used such to associate a paper's text (not just keywords) with the linguistic content of Wikipedia's scientific topics pages ("pages are created and updated by scientists and users through crowdsourcing"), creating meaningful and plausible classifications of high-fidelity scientific topics for further analysis or navigability.

Growth or stagnation of science overall
Rough trend of scholarly publications about biomarkers according to Scholia; biomarker-related publications may not follow closely the number of viable biomarkers.
The CD index for papers published in Nature, PNAS, and Science and Nobel-Prize-winning papers
The CD index may indicate a "decline of disruptive science and technology".

Metascience research is investigating the growth of science overall, using e.g. data on the number of publications in bibliographic databases. A study found segments with different growth rates appear related to phases of "economic (e.g., industrialization)" – money is considered as necessary input to the science system – "and/or political developments (e.g., Second World War)". It also confirmed a recent exponential growth in the volume of scientific literature and calculated an average doubling period of 17.3 years.

However, others have pointed out that is difficult to measure scientific progress in meaningful ways, partly because it's hard to accurately evaluate how important any given scientific discovery is. A variety of perspectives of the trajectories of science overall (impact, number of major discoveries, etc) have been described in books and articles, including that science is becoming harder (per dollar or hour spent), that if science "slowing today, it is because science has remained too focused on established fields", that papers and patents are increasingly less likely to be "disruptive" in terms of breaking with the past as measured by the "CD index", and that there is a great stagnation – possibly as part of a larger trend – whereby e.g. "things haven't changed nearly as much since the 1970s" when excluding the computer and the Internet.

Better understanding of potential slowdowns according to some measures could be a major opportunity to improve humanity's future. For example, emphasis on citations in the measurement of scientific productivity, information overloads, reliance on a narrower set of existing knowledge (which may include narrow specialization and related contemporary practices) based on three "use of previous knowledge"-indicators, and risk-avoidant funding structures may have "toward incremental science and away from exploratory projects that are more likely to fail". The study that introduced the "CD index" suggests the overall number of papers has risen while the total of "highly disruptive" papers as measured by the index hasn't (notably, the 1998 discovery of the accelerating expansion of the universe has a CD index of 0). Their results also suggest scientists and inventors "may be struggling to keep up with the pace of knowledge expansion." Various ways of measuring "novelty" of studies, novelty metrics, have been proposed to balance a potential anti-novelty bias – such as textual analysis or measuring whether it makes first-time-ever combinations of referenced journals, taking into account the difficulty. Other approaches include pro-actively funding risky projects. (see above)

Topic mapping

Science maps could show main interrelated topics within a certain scientific domain, their change over time, and their key actors (researchers, institutions, journals). They may help find factors determine the emergence of new scientific fields and the development of interdisciplinary areas and could be relevant for science policy purposes. (see above) Theories of scientific change could guide "the exploration and interpretation of visualized intellectual structures and dynamic patterns". The maps can show the intellectual, social or conceptual structure of a research field. Beyond visual maps, expert survey-based studies and similar approaches could identify understudied or neglected societally important areas, topic-level problems (such as stigma or dogma), or potential misprioritizations. Examples of such are studies about policy in relation to public health and the social science of climate change mitigation where it has been estimated that only 0.12% of all funding for climate-related research is spent on such despite the most urgent puzzle at the current juncture being working out how to mitigate climate change, whereas the natural science of climate change is already well established.

There are also studies that map a scientific field or a topic such as the study of the use of research evidence in policy and practice, partly using surveys.

Controversies, current debates and disagreement

Percent of all citances in each field that contain signals of disagreement

Some research is investigating scientific controversy or controversies, and may identify currently ongoing major debates (e.g. open questions), and disagreement between scientists or studies. One study suggests the level of disagreement was highest in the social sciences and humanities (0.61%), followed by biomedical and health sciences (0.41%), life and earth sciences (0.29%); physical sciences and engineering (0.15%), and mathematics and computer science (0.06%). Such research may also show, where the disagreements are, especially if they cluster, including visually such as with cluster diagrams.

Challenges of interpretation of pooled results

Studies about a specific research question or research topic are often reviewed in the form of higher-level overviews in which results from various studies are integrated, compared, critically analyzed and interpreted. Examples of such works are scientific reviews and meta-analyses. These and related practices face various challenges and are a subject of metascience.

Various issues with included or available studies such as, for example, heterogeneity of methods used may lead to faulty conclusions of the meta-analysis.

Knowledge integration and living documents

Various problems require swift integration of new and existing science-based knowledge. Especially setting where there are a large number of loosely related projects and initiatives benefit from a common ground or "commons".

Evidence synthesis can be applied to important and, notably, both relatively urgent and certain global challenges: "climate change, energy transitions, biodiversity loss, antimicrobial resistance, poverty eradication and so on". It was suggested that a better system would keep summaries of research evidence up to date via living systematic reviews – e.g. as living documents. While the number of scientific papers and data (or information and online knowledge) has risen substantially, the number of published academic systematic reviews has risen from "around 6,000 in 2011 to more than 45,000 in 2021". An evidence-based approach is important for progress in science, policy, medical and other practices. For example, meta-analyses can quantify what is known and identify what is not yet known and place "truly innovative and highly interdisciplinary ideas" into the context of established knowledge which may enhance their impact. (see above)

Factors of success and progress

It has been hypothesized that a deeper understanding of factors behind successful science could "enhance prospects of science as a whole to more effectively address societal problems".

Novel ideas and disruptive scholarship

Two metascientists reported that "structures fostering disruptive scholarship and focusing attention on novel ideas" could be important as in a growing scientific field citation flows disproportionately consolidate to already well-cited papers, possibly slowing and inhibiting canonical progress. A study concluded that to enhance impact of truly innovative and highly interdisciplinary novel ideas, they should be placed in the context of established knowledge.torship, partnerships and social factors

Other researchers reported that the most successful – in terms of "likelihood of prizewinning, National Academy of Science (NAS) induction, or superstardom" – protégés studied under mentors who published research for which they were conferred a prize after the protégés' mentorship. Studying original topics rather than these mentors' research-topics was also positively associated with success. Highly productive partnerships are also a topic of research – e.g. "super-ties" of frequent co-authorship of two individuals who can complement skills, likely also the result of other factors such as mutual trust, conviction, commitment and fun.

Study of successful scientists and processes, general skills and activities

The emergence or origin of ideas by successful scientists is also a topic of research, for example reviewing existing ideas on how Mendel made his discoveries, – or more generally, the process of discovery by scientists. Science is a "multifaceted process of appropriation, copying, extending, or combining ideas and inventions" [and other types of knowledge or information], and not an isolated process. There are also few studies investigating scientists' habits, common modes of thinking, reading habits, use of information sources, digital literacy skills, and workflows.

Labor advantage

A study theorized that in many disciplines, larger scientific productivity or success by elite universities can be explained by their larger pool of available funded laborers. The study found that university prestige was only associated with higher productivity for faculty with group members, not for faculty publishing alone or the group members themselves. This is presented as evidence that the outsize productivity of elite researchers is not from a more rigorous selection of talent by top universities, but from labor advantages accrued through greater access to funding and the attraction of prestige to graduate and postdoctoral researchers.

Ultimate impacts

Success in science (as indicated in tenure review processes) is often measured in terms of metrics like citations, not in terms of the eventual or potential impact on lives and society, which awards (see above) sometimes do. Problems with such metrics are roughly outlined elsewhere in this article and include that reviews replace citations to primary studies. There are also proposals for changes to the academic incentives systems that increase the recognition of societal impact in the research process.

Progress studies

A proposed field of "Progress Studies" could investigate how scientists (or funders or evaluators of scientists) should be acting, "figuring out interventions" and study progress itself. The field was explicitly proposed in a 2019 essay and described as an applied science that prescribes action.

As and for acceleration of progress

A study suggests that improving the way science is done could accelerate the rate of scientific discovery and its applications which could be useful for finding urgent solutions to humanity's problems, improve humanity's conditions, and enhance understanding of nature. Metascientific studies can seek to identify aspects of science that need improvement, and develop ways to improve them. If science is accepted as the fundamental engine of economic growth and social progress, this could raise "the question of what we – as a society – can do to accelerate science, and to direct science toward solving society's most important problems." However, one of the authors clarified that a one-size-fits-all approach is not thought to be right answer – for example, in funding, DARPA models, curiosity-driven methods, allowing "a single reviewer to champion a project even if his or her peers do not agree", and various other approaches all have their uses. Nevertheless, evaluation of them can help build knowledge of what works or works best.

Reforms

Meta-research identifying flaws in scientific practice has inspired reforms in science. These reforms seek to address and fix problems in scientific practice which lead to low-quality or inefficient research.

A 2015 study lists "fragmented" efforts in meta-research.

Pre-registration

The practice of registering a scientific study before it is conducted is called pre-registration. It arose as a means to address the replication crisis. Pregistration requires the submission of a registered report, which is then accepted for publication or rejected by a journal based on theoretical justification, experimental design, and the proposed statistical analysis. Pre-registration of studies serves to prevent publication bias (e.g. not publishing negative results), reduce data dredging, and increase replicability.

Reporting standards

Studies showing poor consistency and quality of reporting have demonstrated the need for reporting standards and guidelines in science, which has led to the rise of organisations that produce such standards, such as CONSORT (Consolidated Standards of Reporting Trials) and the EQUATOR Network.

The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network is an international initiative aimed at promoting transparent and accurate reporting of health research studies to enhance the value and reliability of medical research literature. The EQUATOR Network was established with the goals of raising awareness of the importance of good reporting of research, assisting in the development, dissemination and implementation of reporting guidelines for different types of study designs, monitoring the status of the quality of reporting of research studies in the health sciences literature, and conducting research relating to issues that impact the quality of reporting of health research studies. The Network acts as an "umbrella" organisation, bringing together developers of reporting guidelines, medical journal editors and peer reviewers, research funding bodies, and other key stakeholders with a mutual interest in improving the quality of research publications and research itself.

Applications

Information and communications technologies

Metascience is used in the creation and improvement of technical systems (ICTs) and standards of science evaluation, incentivation, communication, commissioning, funding, regulation, production, management, use and publication. Such can be called "applied metascience" and may seek to explore ways to increase quantity, quality and positive impact of research. One example for such is the development of alternative metrics.

Study screening and feedback

Various websites or tools also identify inappropriate studies and/or enable feedback such as PubPeer, Cochrane's Risk of Bias Tool and RetractionWatch. Medical and academic disputes are as ancient as antiquity and a study calls for research into "constructive and obsessive criticism" and into policies to "help strengthen social media into a vibrant forum for discussion, and not merely an arena for gladiator matches". Feedback to studies can be found via altmetrics which is often integrated at the website of the study – most often as an embedded Altmetrics badge – but may often be incomplete, such as only showing social media discussions that link to the study directly but not those that link to news reports about the study. (see above)

Tools used, modified, extended or investigated

Tools may get developed with metaresearch or can be used or investigated by such. Notable examples may include:

  • The tool scite.ai aims to track and link citations of papers as 'Supporting', 'Mentioning' or 'Contrasting' the study.
  • The Scite Reference Check bot is an extension of scite.ai that scans new article PDFs "for references to retracted papers, and posts both the citing and retracted papers on Twitter" and also "flags when new studies cite older ones that have issued corrections, errata, withdrawals, or expressions of concern". Studies have suggested as few as 4% of citations to retracted papers clearly recognize the retraction.
  • Search engines like Google Scholar are used to find studies and the notification service Google Alerts enables notifications for new studies matching specified search terms. Scholarly communication infrastructure includes search databases.
  • Shadow library Sci-hub is a topic of metascience
  • Personal knowledge management systems for research-, knowledge- and task management, such as saving information in organized ways with multi-document text editors for future use Such systems could be described as part of, along with e.g. Web browser (tabs-addons etc) and search software, "mind-machine partnerships" that could be investigated by metascience for how they could improve science.
  • Scholia – efforts to open scholarly publication metadata and use it via Wikidata. (see above)
  • Various software enables common metascientific practices such as bibliometric analysis.

Development

According to a study "a simple way to check how often studies have been repeated, and whether or not the original findings are confirmed" is needed due to reproducibility issues in science. A study suggests a tool for screening studies for early warning signs for research fraud.

Medicine

Clinical research in medicine is often of low quality, and many studies cannot be replicated. An estimated 85% of research funding is wasted. Additionally, the presence of bias affects research quality. The pharmaceutical industry exerts substantial influence on the design and execution of medical research. Conflicts of interest are common among authors of medical literature and among editors of medical journals. While almost all medical journals require their authors to disclose conflicts of interest, editors are not required to do so. Financial conflicts of interest have been linked to higher rates of positive study results. In antidepressant trials, pharmaceutical sponsorship is the best predictor of trial outcome.

Blinding is another focus of meta-research, as error caused by poor blinding is a source of experimental bias. Blinding is not well reported in medical literature, and widespread misunderstanding of the subject has resulted in poor implementation of blinding in clinical trials. Furthermore, failure of blinding is rarely measured or reported. Research showing the failure of blinding in antidepressant trials has led some scientists to argue that antidepressants are no better than placebo. In light of meta-research showing failures of blinding, CONSORT standards recommend that all clinical trials assess and report the quality of blinding.

Studies have shown that systematic reviews of existing research evidence are sub-optimally used in planning a new research or summarizing the results. Cumulative meta-analyses of studies evaluating the effectiveness of medical interventions have shown that many clinical trials could have been avoided if a systematic review of existing evidence was done prior to conducting a new trial. For example, Lau et al. analyzed 33 clinical trials (involving 36974 patients) evaluating the effectiveness of intravenous streptokinase for acute myocardial infarction. Their cumulative meta-analysis demonstrated that 25 of 33 trials could have been avoided if a systematic review was conducted prior to conducting a new trial. In other words, randomizing 34542 patients was potentially unnecessary. One study analyzed 1523 clinical trials included in 227 meta-analyses and concluded that "less than one quarter of relevant prior studies" were cited. They also confirmed earlier findings that most clinical trial reports do not present systematic review to justify the research or summarize the results.

Many treatments used in modern medicine have been proven to be ineffective, or even harmful. A 2007 study by John Ioannidis found that it took an average of ten years for the medical community to stop referencing popular practices after their efficacy was unequivocally disproven.

Psychology

Metascience has revealed significant problems in psychological research. The field suffers from high bias, low reproducibility, and widespread misuse of statistics. The replication crisis affects psychology more strongly than any other field; as many as two-thirds of highly publicized findings may be impossible to replicate. Meta-research finds that 80-95% of psychological studies support their initial hypotheses, which strongly implies the existence of publication bias.

The replication crisis has led to renewed efforts to re-test important findings. In response to concerns about publication bias and p-hacking, more than 140 psychology journals have adopted result-blind peer review, in which studies are pre-registered and published without regard for their outcome. An analysis of these reforms estimated that 61 percent of result-blind studies produce null results, in contrast with 5 to 20 percent in earlier research. This analysis shows that result-blind peer review substantially reduces publication bias.

Psychologists routinely confuse statistical significance with practical importance, enthusiastically reporting great certainty in unimportant facts. Some psychologists have responded with an increased use of effect size statistics, rather than sole reliance on the p values.

Physics

Richard Feynman noted that estimates of physical constants were closer to published values than would be expected by chance. This was believed to be the result of confirmation bias: results that agreed with existing literature were more likely to be believed, and therefore published. Physicists now implement blinding to prevent this kind of bias.

Computer Science

Web measurement studies are essential for understanding the workings of the modern Web, particularly in the fields of security and privacy. However, these studies often require custom-built or modified crawling setups, leading to a plethora of analysis tools for similar tasks. In a paper by Nurullah Demir et al., the authors surveyed 117 recent research papers to derive best practices for Web-based measurement studies and establish criteria for reproducibility and replicability. They found that experimental setups and other critical information for reproducing and replicating results are often missing. In a large-scale Web measurement study on 4.5 million pages with 24 different measurement setups, the authors demonstrated the impact of slight differences in experimental setups on the overall results, emphasizing the need for accurate and comprehensive documentation.

Organizations and institutes

There are several organizations and universities across the globe which work on meta-research – these include the Meta-Research Innovation Center at Berlin, the Meta-Research Innovation Center at Stanford, the Meta-Research Center at Tilburg University, the Meta-research & Evidence Synthesis Unit, The George Institute for Global Health at India and Center for Open Science. Organizations that develop tools for metascience include OurResearch, Center for Scientific Integrity and altmetrics companies. There is an annual Metascience Conference hosted by the Association for Interdisciplinary Meta-Research and Open Science (AIMOS) and biannual conference hosted by the Centre for Open Science.

Evidence-based medicine

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Evidence-based_medicine

Evidence-based medicine (EBM) is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients." The aim of EBM is to integrate the experience of the clinician, the values of the patient, and the best available scientific information to guide decision-making about clinical management. The term was originally used to describe an approach to teaching the practice of medicine and improving decisions by individual physicians about individual patients.

The EBM Pyramid is a tool that helps in visualizing the hierarchy of evidence in medicine, from least authoritative, like expert opinions, to most authoritative, like systematic reviews.

Background, history, and definition

Medicine has a long history of scientific inquiry about the prevention, diagnosis, and treatment of human disease. In the 11th century AD, Avicenna, a Persian physician and philosopher, developed an approach to EBM that was mostly similar to current ideas and practises.

The concept of a controlled clinical trial was first described in 1662 by Jan Baptist van Helmont in reference to the practice of bloodletting. Wrote Van Helmont:

Let us take out of the Hospitals, out of the Camps, or from elsewhere, 200, or 500 poor People, that have fevers or Pleuritis. Let us divide them in Halfes, let us cast lots, that one halfe of them may fall to my share, and the others to yours; I will cure them without blood-letting and sensible evacuation; but you do, as ye know ... we shall see how many Funerals both of us shall have...

The first published report describing the conduct and results of a controlled clinical trial was by James Lind, a Scottish naval surgeon who conducted research on scurvy during his time aboard HMS Salisbury in the Channel Fleet, while patrolling the Bay of Biscay. Lind divided the sailors participating in his experiment into six groups, so that the effects of various treatments could be fairly compared. Lind found improvement in symptoms and signs of scurvy among the group of men treated with lemons or oranges. He published a treatise describing the results of this experiment in 1753.

An early critique of statistical methods in medicine was published in 1835.

The term 'evidence-based medicine' was introduced in 1990 by Gordon Guyatt of McMaster University.

Clinical decision-making

Alvan Feinstein's publication of Clinical Judgment in 1967 focused attention on the role of clinical reasoning and identified biases that can affect it. In 1972, Archie Cochrane published Effectiveness and Efficiency, which described the lack of controlled trials supporting many practices that had previously been assumed to be effective. In 1973, John Wennberg began to document wide variations in how physicians practiced. Through the 1980s, David M. Eddy described errors in clinical reasoning and gaps in evidence. In the mid-1980s, Alvin Feinstein, David Sackett and others published textbooks on clinical epidemiology, which translated epidemiological methods to physician decision-making. Toward the end of the 1980s, a group at RAND showed that large proportions of procedures performed by physicians were considered inappropriate even by the standards of their own experts.

Evidence-based guidelines and policies

David M. Eddy first began to use the term 'evidence-based' in 1987 in workshops and a manual commissioned by the Council of Medical Specialty Societies to teach formal methods for designing clinical practice guidelines. The manual was eventually published by the American College of Physicians. Eddy first published the term 'evidence-based' in March 1990, in an article in the Journal of the American Medical Association (JAMA) that laid out the principles of evidence-based guidelines and population-level policies, which Eddy described as "explicitly describing the available evidence that pertains to a policy and tying the policy to evidence instead of standard-of-care practices or the beliefs of experts. The pertinent evidence must be identified, described, and analyzed. The policymakers must determine whether the policy is justified by the evidence. A rationale must be written." He discussed evidence-based policies in several other papers published in JAMA in the spring of 1990. Those papers were part of a series of 28 published in JAMA between 1990 and 1997 on formal methods for designing population-level guidelines and policies.

Medical education

The term 'evidence-based medicine' was introduced slightly later, in the context of medical education. In the autumn of 1990, Gordon Guyatt used it in an unpublished description of a program at McMaster University for prospective or new medical students. Guyatt and others first published the term two years later (1992) to describe a new approach to teaching the practice of medicine.

In 1996, David Sackett and colleagues clarified the definition of this tributary of evidence-based medicine as "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients. ... [It] means integrating individual clinical expertise with the best available external clinical evidence from systematic research." This branch of evidence-based medicine aims to make individual decision making more structured and objective by better reflecting the evidence from research. Population-based data are applied to the care of an individual patient, while respecting the fact that practitioners have clinical expertise reflected in effective and efficient diagnosis and thoughtful identification and compassionate use of individual patients' predicaments, rights, and preferences.

Between 1993 and 2000, the Evidence-Based Medicine Working Group at McMaster University published the methods to a broad physician audience in a series of 25 "Users' Guides to the Medical Literature" in JAMA. In 1995 Rosenberg and Donald defined individual-level, evidence-based medicine as "the process of finding, appraising, and using contemporaneous research findings as the basis for medical decisions." In 2010, Greenhalgh used a definition that emphasized quantitative methods: "the use of mathematical estimates of the risk of benefit and harm, derived from high-quality research on population samples, to inform clinical decision-making in the diagnosis, investigation or management of individual patients."

The two original definitions highlight important differences in how evidence-based medicine is applied to populations versus individuals. When designing guidelines applied to large groups of people in settings with relatively little opportunity for modification by individual physicians, evidence-based policymaking emphasizes that good evidence should exist to document a test's or treatment's effectiveness. In the setting of individual decision-making, practitioners can be given greater latitude in how they interpret research and combine it with their clinical judgment. In 2005, Eddy offered an umbrella definition for the two branches of EBM: "Evidence-based medicine is a set of principles and methods intended to ensure that to the greatest extent possible, medical decisions, guidelines, and other types of policies are based on and consistent with good evidence of effectiveness and benefit."

Progress

In the area of evidence-based guidelines and policies, the explicit insistence on evidence of effectiveness was introduced by the American Cancer Society in 1980. The U.S. Preventive Services Task Force (USPSTF) began issuing guidelines for preventive interventions based on evidence-based principles in 1984. In 1985, the Blue Cross Blue Shield Association applied strict evidence-based criteria for covering new technologies. Beginning in 1987, specialty societies such as the American College of Physicians, and voluntary health organizations such as the American Heart Association, wrote many evidence-based guidelines. In 1991, Kaiser Permanente, a managed care organization in the US, began an evidence-based guidelines program. In 1991, Richard Smith wrote an editorial in the British Medical Journal and introduced the ideas of evidence-based policies in the UK. In 1993, the Cochrane Collaboration created a network of 13 countries to produce systematic reviews and guidelines. In 1997, the US Agency for Healthcare Research and Quality (AHRQ, then known as the Agency for Health Care Policy and Research, or AHCPR) established Evidence-based Practice Centers (EPCs) to produce evidence reports and technology assessments to support the development of guidelines. In the same year, a National Guideline Clearinghouse that followed the principles of evidence-based policies was created by AHRQ, the AMA, and the American Association of Health Plans (now America's Health Insurance Plans). In 1999, the National Institute for Clinical Excellence (NICE) was created in the UK.

In the area of medical education, medical schools in Canada, the US, the UK, Australia, and other countries now offer programs that teach evidence-based medicine. A 2009 study of UK programs found that more than half of UK medical schools offered some training in evidence-based medicine, although the methods and content varied considerably, and EBM teaching was restricted by lack of curriculum time, trained tutors and teaching materials. Many programs have been developed to help individual physicians gain better access to evidence. For example, UpToDate was created in the early 1990s. The Cochrane Collaboration began publishing evidence reviews in 1993. In 1995, BMJ Publishing Group launched Clinical Evidence, a 6-monthly periodical that provided brief summaries of the current state of evidence about important clinical questions for clinicians.

Current practice

By 2000, use of the term evidence-based had extended to other levels of the health care system. An example is evidence-based health services, which seek to increase the competence of health service decision makers and the practice of evidence-based medicine at the organizational or institutional level.

The multiple tributaries of evidence-based medicine share an emphasis on the importance of incorporating evidence from formal research in medical policies and decisions. However, because they differ on the extent to which they require good evidence of effectiveness before promoting a guideline or payment policy, a distinction is sometimes made between evidence-based medicine and science-based medicine, which also takes into account factors such as prior plausibility and compatibility with established science (as when medical organizations promote controversial treatments such as acupuncture). Differences also exist regarding the extent to which it is feasible to incorporate individual-level information in decisions. Thus, evidence-based guidelines and policies may not readily "hybridise" with experience-based practices orientated towards ethical clinical judgement, and can lead to contradictions, contest, and unintended crises. The most effective "knowledge leaders" (managers and clinical leaders) use a broad range of management knowledge in their decision making, rather than just formal evidence. Evidence-based guidelines may provide the basis for governmentality in health care, and consequently play a central role in the governance of contemporary health care systems.

Methods

Steps

The steps for designing explicit, evidence-based guidelines were described in the late 1980s: formulate the question (population, intervention, comparison intervention, outcomes, time horizon, setting); search the literature to identify studies that inform the question; interpret each study to determine precisely what it says about the question; if several studies address the question, synthesize their results (meta-analysis); summarize the evidence in evidence tables; compare the benefits, harms and costs in a balance sheet; draw a conclusion about the preferred practice; write the guideline; write the rationale for the guideline; have others review each of the previous steps; implement the guideline.

For the purposes of medical education and individual-level decision making, five steps of EBM in practice were described in 1992 and the experience of delegates attending the 2003 Conference of Evidence-Based Health Care Teachers and Developers was summarized into five steps and published in 2005. This five-step process can broadly be categorized as follows:

  1. Translation of uncertainty to an answerable question; includes critical questioning, study design and levels of evidence
  2. Systematic retrieval of the best evidence available
  3. Critical appraisal of evidence for internal validity that can be broken down into aspects regarding:
    • Systematic errors as a result of selection bias, information bias and confounding
    • Quantitative aspects of diagnosis and treatment
    • The effect size and aspects regarding its precision
    • Clinical importance of results
    • External validity or generalizability
  4. Application of results in practice
  5. Evaluation of performance

Evidence reviews

Systematic reviews of published research studies are a major part of the evaluation of particular treatments. The Cochrane Collaboration is one of the best-known organisations that conducts systematic reviews. Like other producers of systematic reviews, it requires authors to provide a detailed study protocol as well as a reproducible plan of their literature search and evaluations of the evidence. After the best evidence is assessed, treatment is categorized as (1) likely to be beneficial, (2) likely to be harmful, or (3) without evidence to support either benefit or harm.

A 2007 analysis of 1,016 systematic reviews from all 50 Cochrane Collaboration Review Groups found that 44% of the reviews concluded that the intervention was likely to be beneficial, 7% concluded that the intervention was likely to be harmful, and 49% concluded that evidence did not support either benefit or harm. 96% recommended further research. In 2017, a study assessed the role of systematic reviews produced by Cochrane Collaboration to inform US private payers' policymaking; it showed that although the medical policy documents of major US private payers were informed by Cochrane systematic reviews, there was still scope to encourage the further use.

Assessing the quality of evidence

Evidence-based medicine categorizes different types of clinical evidence and rates or grades them according to the strength of their freedom from the various biases that beset medical research. For example, the strongest evidence for therapeutic interventions is provided by systematic review of randomized, well-blinded, placebo-controlled trials with allocation concealment and complete follow-up involving a homogeneous patient population and medical condition. In contrast, patient testimonials, case reports, and even expert opinion have little value as proof because of the placebo effect, the biases inherent in observation and reporting of cases, and difficulties in ascertaining who is an expert (however, some critics have argued that expert opinion "does not belong in the rankings of the quality of empirical evidence because it does not represent a form of empirical evidence" and continue that "expert opinion would seem to be a separate, complex type of knowledge that would not fit into hierarchies otherwise limited to empirical evidence alone.").

Several organizations have developed grading systems for assessing the quality of evidence. For example, in 1989 the U.S. Preventive Services Task Force (USPSTF) put forth the following system:

  • Level I: Evidence obtained from at least one properly designed randomized controlled trial.
  • Level II-1: Evidence obtained from well-designed controlled trials without randomization.
  • Level II-2: Evidence obtained from well-designed cohort studies or case-control studies, preferably from more than one center or research group.
  • Level II-3: Evidence obtained from multiple time series designs with or without the intervention. Dramatic results in uncontrolled trials might also be regarded as this type of evidence.
  • Level III: Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.

Another example are the Oxford CEBM Levels of Evidence published by the Centre for Evidence-Based Medicine. First released in September 2000, the Levels of Evidence provide a way to rank evidence for claims about prognosis, diagnosis, treatment benefits, treatment harms, and screening, which most grading schemes do not address. The original CEBM Levels were Evidence-Based On Call to make the process of finding evidence feasible and its results explicit. In 2011, an international team redesigned the Oxford CEBM Levels to make them more understandable and to take into account recent developments in evidence ranking schemes. The Oxford CEBM Levels of Evidence have been used by patients and clinicians, as well as by experts to develop clinical guidelines, such as recommendations for the optimal use of phototherapy and topical therapy in psoriasis and guidelines for the use of the BCLC staging system for diagnosing and monitoring hepatocellular carcinoma in Canada.

In 2000, a system was developed by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group. The GRADE system takes into account more dimensions than just the quality of medical research. It requires users who are performing an assessment of the quality of evidence, usually as part of a systematic review, to consider the impact of different factors on their confidence in the results. Authors of GRADE tables assign one of four levels to evaluate the quality of evidence, on the basis of their confidence that the observed effect (a numeric value) is close to the true effect. The confidence value is based on judgments assigned in five different domains in a structured manner. The GRADE working group defines 'quality of evidence' and 'strength of recommendations' based on the quality as two different concepts that are commonly confused with each other.

Systematic reviews may include randomized controlled trials that have low risk of bias, or observational studies that have high risk of bias. In the case of randomized controlled trials, the quality of evidence is high but can be downgraded in five different domains.

  • Risk of bias: A judgment made on the basis of the chance that bias in included studies has influenced the estimate of effect.
  • Imprecision: A judgment made on the basis of the chance that the observed estimate of effect could change completely.
  • Indirectness: A judgment made on the basis of the differences in characteristics of how the study was conducted and how the results are actually going to be applied.
  • Inconsistency: A judgment made on the basis of the variability of results across the included studies.
  • Publication bias: A judgment made on the basis of the question whether all the research evidence has been taken to account.

In the case of observational studies per GRADE, the quality of evidence starts off lower and may be upgraded in three domains in addition to being subject to downgrading.

  • Large effect: Methodologically strong studies show that the observed effect is so large that the probability of it changing completely is less likely.
  • Plausible confounding would change the effect: Despite the presence of a possible confounding factor that is expected to reduce the observed effect, the effect estimate still shows significant effect.
  • Dose response gradient: The intervention used becomes more effective with increasing dose. This suggests that a further increase will likely bring about more effect.

Meaning of the levels of quality of evidence as per GRADE:

  • High Quality Evidence: The authors are very confident that the presented estimate lies very close to the true value. In other words, the probability is very low that further research will completely change the presented conclusions.
  • Moderate Quality Evidence: The authors are confident that the presented estimate lies close to the true value, but it is also possible that it may be substantially different. In other words, further research may completely change the conclusions.
  • Low Quality Evidence: The authors are not confident in the effect estimate, and the true value may be substantially different. In other words, further research is likely to change the presented conclusions completely.
  • Very Low Quality Evidence: The authors do not have any confidence in the estimate and it is likely that the true value is substantially different from it. In other words, new research will probably change the presented conclusions completely.

Categories of recommendations

In guidelines and other publications, recommendation for a clinical service is classified by the balance of risk versus benefit and the level of evidence on which this information is based. The U.S. Preventive Services Task Force uses the following system:

  • Level A: Good scientific evidence suggests that the benefits of the clinical service substantially outweigh the potential risks. Clinicians should discuss the service with eligible patients.
  • Level B: At least fair scientific evidence suggests that the benefits of the clinical service outweighs the potential risks. Clinicians should discuss the service with eligible patients.
  • Level C: At least fair scientific evidence suggests that the clinical service provides benefits, but the balance between benefits and risks is too close for general recommendations. Clinicians need not offer it unless individual considerations apply.
  • Level D: At least fair scientific evidence suggests that the risks of the clinical service outweigh potential benefits. Clinicians should not routinely offer the service to asymptomatic patients.
  • Level I: Scientific evidence is lacking, of poor quality, or conflicting, such that the risk versus benefit balance cannot be assessed. Clinicians should help patients understand the uncertainty surrounding the clinical service.

GRADE guideline panelists may make strong or weak recommendations on the basis of further criteria. Some of the important criteria are the balance between desirable and undesirable effects (not considering cost), the quality of the evidence, values and preferences and costs (resource utilization).

Despite the differences between systems, the purposes are the same: to guide users of clinical research information on which studies are likely to be most valid. However, the individual studies still require careful critical appraisal.

Statistical measures

Evidence-based medicine attempts to express clinical benefits of tests and treatments using mathematical methods. Tools used by practitioners of evidence-based medicine include:

  • Likelihood ratio The pre-test odds of a particular diagnosis, multiplied by the likelihood ratio, determines the post-test odds. (Odds can be calculated from, and then converted to, the [more familiar] probability.) This reflects Bayes' theorem. The differences in likelihood ratio between clinical tests can be used to prioritize clinical tests according to their usefulness in a given clinical situation.
  • AUC-ROC The area under the receiver operating characteristic curve (AUC-ROC) reflects the relationship between sensitivity and specificity for a given test. High-quality tests will have an AUC-ROC approaching 1, and high-quality publications about clinical tests will provide information about the AUC-ROC. Cutoff values for positive and negative tests can influence specificity and sensitivity, but they do not affect AUC-ROC.
  • Number needed to treat (NNT)/Number needed to harm (NNH). NNT and NNH are ways of expressing the effectiveness and safety, respectively, of interventions in a way that is clinically meaningful. NNT is the number of people who need to be treated in order to achieve the desired outcome (e.g. survival from cancer) in one patient. For example, if a treatment increases the chance of survival by 5%, then 20 people need to be treated in order for 1 additional patient to survive because of the treatment. The concept can also be applied to diagnostic tests. For example, if 1,339 women age 50–59 need to be invited for breast cancer screening over a ten-year period in order to prevent one woman from dying of breast cancer, then the NNT for being invited to breast cancer screening is 1339.

Quality of clinical trials

Evidence-based medicine attempts to objectively evaluate the quality of clinical research by critically assessing techniques reported by researchers in their publications.

  • Trial design considerations: High-quality studies have clearly defined eligibility criteria and have minimal missing data.
  • Generalizability considerations: Studies may only be applicable to narrowly defined patient populations and may not be generalizable to other clinical contexts.
  • Follow-up: Sufficient time for defined outcomes to occur can influence the prospective study outcomes and the statistical power of a study to detect differences between a treatment and control arm.
  • Power: A mathematical calculation can determine whether the number of patients is sufficient to detect a difference between treatment arms. A negative study may reflect a lack of benefit, or simply a lack of sufficient quantities of patients to detect a difference.

Limitations and criticism

There are a number of limitations and criticisms of evidence-based medicine. Two widely cited categorization schemes for the various published critiques of EBM include the three-fold division of Straus and McAlister ("limitations universal to the practice of medicine, limitations unique to evidence-based medicine and misperceptions of evidence-based-medicine") and the five-point categorization of Cohen, Stavri and Hersh (EBM is a poor philosophic basis for medicine, defines evidence too narrowly, is not evidence-based, is limited in usefulness when applied to individual patients, or reduces the autonomy of the doctor/patient relationship).

In no particular order, some published objections include:

  • Research produced by EBM, such as from randomized controlled trials (RCTs), may not be relevant for all treatment situations. Research tends to focus on specific populations, but individual persons can vary substantially from population norms. Because certain population segments have been historically under-researched (due to reasons such as race, gender, age, and co-morbid diseases), evidence from RCTs may not be generalizable to those populations. Thus, EBM applies to groups of people, but this should not preclude clinicians from using their personal experience in deciding how to treat each patient. One author advises that "the knowledge gained from clinical research does not directly answer the primary clinical question of what is best for the patient at hand" and suggests that evidence-based medicine should not discount the value of clinical experience. Another author stated that "the practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research."
  • Use of evidence-based guidelines often fits poorly for complex, multimorbid patients. This is because the guidelines are usually based on clinical studies focused on single diseases. In reality, the recommended treatments in such circumstances may interact unfavorably with each other and often lead to polypharmacy.
  • The theoretical ideal of EBM (that every narrow clinical question, of which hundreds of thousands can exist, would be answered by meta-analysis and systematic reviews of multiple RCTs) faces the limitation that research (especially the RCTs themselves) is expensive; thus, in reality, for the foreseeable future, the demand for EBM will always be much higher than the supply, and the best humanity can do is to triage the application of scarce resources.
  • Research can be influenced by biases such as political or belief bias, publication bias and conflict of interest in academic publishing. For example, studies with conflicts due to industry funding are more likely to favor their product. It has been argued that contemporary evidence based medicine is an illusion, since evidence based medicine has been corrupted by corporate interests, failed regulation, and commercialisation of academia.
  • Systematic Reviews methodologies are capable of bias and abuse in respect of (i) choice of inclusion criteria (ii) choice of outcome measures, comparisons and analyses (iii) the subjectivity inevitable in Risk of Bias assessments, even when codified procedures and criteria are observed. An example of all these problems can be seen in a Cochrane Review, as analyzed by Edmund J. Fordham, et al. in their relevant review.
  • A lag exists between when the RCT is conducted and when its results are published.
  • A lag exists between when results are published and when they are properly applied.
  • Hypocognition (the absence of a simple, consolidated mental framework into which new information can be placed) can hinder the application of EBM.
  • Values: while patient values are considered in the original definition of EBM, the importance of values is not commonly emphasized in EBM training, a potential problem under current study.

A 2018 study, "Why all randomised controlled trials produce biased results", assessed the 10 most cited RCTs and argued that trials face a wide range of biases and constraints, from trials only being able to study a small set of questions amenable to randomisation and generally only being able to assess the average treatment effect of a sample, to limitations in extrapolating results to another context, among many others outlined in the study.

Application of evidence in clinical settings

Despite the emphasis on evidence-based medicine, unsafe or ineffective medical practices continue to be applied, because of patient demand for tests or treatments, because of failure to access information about the evidence, or because of the rapid pace of change in the scientific evidence. For example, between 2003 and 2017, the evidence shifted on hundreds of medical practices, including whether hormone replacement therapy was safe, whether babies should be given certain vitamins, and whether antidepressant drugs are effective in people with Alzheimer's disease. Even when the evidence unequivocally shows that a treatment is either not safe or not effective, it may take many years for other treatments to be adopted.

There are many factors that contribute to lack of uptake or implementation of evidence-based recommendations. These include lack of awareness at the individual clinician or patient (micro) level, lack of institutional support at the organisation level (meso) level or higher at the policy (macro) level. In other cases, significant change can require a generation of physicians to retire or die and be replaced by physicians who were trained with more recent evidence.

Physicians may also reject evidence that conflicts with their anecdotal experience or because of cognitive biases – for example, a vivid memory of a rare but shocking outcome (the availability heuristic), such as a patient dying after refusing treatment. They may overtreat to "do something" or to address a patient's emotional needs. They may worry about malpractice charges based on a discrepancy between what the patient expects and what the evidence recommends. They may also overtreat or provide ineffective treatments because the treatment feels biologically plausible.

It is the responsibility of those developing clinical guidelines to include an implementation plan to facilitate uptake. The implementation process will include an implementation plan, analysis of the context, identifying barriers and facilitators and designing the strategies to address them.

Education

Training in evidence based medicine is offered across the continuum of medical education. Educational competencies have been created for the education of health care professionals.

The Berlin questionnaire and the Fresno Test are validated instruments for assessing the effectiveness of education in evidence-based medicine. These questionnaires have been used in diverse settings.

A Campbell systematic review that included 24 trials examined the effectiveness of e-learning in improving evidence-based health care knowledge and practice. It was found that e-learning, compared to no learning, improves evidence-based health care knowledge and skills but not attitudes and behaviour. No difference in outcomes is present when comparing e-learning with face-to-face learning. Combining e-learning and face-to-face learning (blended learning) has a positive impact on evidence-based knowledge, skills, attitude and behavior. As a form of e-learning, some medical school students engage in editing Wikipedia to increase their EBM skills, and some students construct EBM materials to develop their skills in communicating medical knowledge.

Delayed-choice quantum eraser

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Delayed-choice_quantum_eraser A delayed-cho...