A Medley of Potpourri

Thursday, May 24, 2018

Publication bias

From Wikipedia, the free encyclopedia

Publication bias is a type of bias that occurs in published academic research. It occurs when the outcome of an experiment or research study influences the decision whether to publish or otherwise distribute it. Publication bias matters because literature reviews regarding support for a hypothesis can be biased if the original literature is contaminated by publication bias.^[1] Publishing only results that show a significant finding disturbs the balance of findings.^[2]

Studies with significant results can be of the same standard as studies with a null result with respect to quality of execution and design.^[3] However, statistically significant results are three times more likely to be published than papers with null results.^[4]

Multiple factors contribute to publication bias.^[1] For instance, once a scientific finding is well established, it may become newsworthy to publish reliable papers that fail to reject the null hypothesis.^[5] It has been found that the most common reason for non-publication is simply that investigators decline to submit results, leading to non-response bias. Factors cited as underlying this effect include investigators assuming they must have made a mistake, failure to support a known finding, loss of interest in the topic, or anticipation that others will be uninterested in the null results.^[3] These issues, and the problems they have triggered, have been referred to as the five diseases that threaten science: "significosis, an inordinate focus on statistically significant results; neophilia, an excessive appreciation for novelty; theorrhea, a mania for new theory; arigorium, a deficiency of rigor in theoretical and empirical work; and finally, disjunctivitis, a proclivity to produce large quantities of redundant, trivial, and incoherent works."^[6]

Attempts to identify unpublished studies often prove difficult or are unsatisfactory.^[1] In an effort to combat this problem, some journals require that studies submitted for publication are pre-registered (registering a study prior to collection of data and analysis) with organizations like the Center for Open Science.

Other proposed strategies to detect and control for publication bias^[1] include p-curve analysis ^[7] and disfavoring small and non-randomised studies because of their demonstrated high susceptibility to error and bias.^[3]

Definition

Publication bias occurs when the publication of research results depends not just on the quality of the research but also on the hypothesis tested, and the significance and direction of effects detected.^[8] The term was first used in 1959 by statistician Theodore Sterling to refer to fields in which "successful" research is more likely to be published. As a result, "the literature of such a field consists in substantial part of false conclusions resulting from type-I errors".^[9]

Publication bias is sometimes called the "file drawer effect," or "file drawer problem." This term suggests that results not supporting the hypotheses of researchers often go no further than the researchers' file drawers, leading to a bias in published research.^[10] The term "file drawer problem" was coined by Rosenthal in 1979.^[11]

Positive-results bias, a type of publication bias, occurs when authors are more likely to submit, or editors are more likely to accept, positive results than negative or inconclusive results.^[12] Outcome reporting bias occurs when multiple outcomes are measured and analyzed, but the reporting of these outcomes is dependent on the strength and direction of their results. A generic term coined to describe these post-hoc choices is HARKing ("Hypothesizing After the Results are Known").^[13]

Evidence

Meta-analysis of stereotype threat on girls' math scores showing asymmetry typical of publication bias. From Flore, P. C., & Wicherts, J. M. (2015)^[14]

The presence of publication bias in the literature has been most extensively studied in biomedical research. Investigators following clinical trials from the submission of their protocols to ethics committees (or regulatory authorities) until the publication of their results observed that those with positive results are more likely to be published.^[15]^[16]^[17] In addition, studies often fail to report negative results when published, as demonstrated by research comparing study protocols with published articles.^[18]^[19]

The presence of publication bias has also been investigated in meta-analyses. The largest such analysis investigated the presence of publication bias in systematic reviews of medical treatments from the Cochrane Library.^[20] The study showed that positive, statistically significant findings are 27% more likely to be included in meta-analyses of efficacy than other findings. Results showing no evidence of adverse effects have a 78% greater probability of inclusion in safety studies than statistically significant results showing adverse effects. Evidence of publication bias was found in meta-analyses published in prominent medical journals.^[21]

Impact on meta-analysis

Where publication bias is present, published studies are no longer a representative sample of the available evidence. This bias distorts the results of meta-analyses and systematic reviews. For example, evidence-based medicine is increasingly reliant on meta-analysis to assess evidence.

Meta-analyses and systematic reviews can account for publication bias by including evidence from unpublished studies and the grey literature. The presence of publication bias can also be explored by constructing a funnel plot in which the estimate of the reported effect size is plotted against a measure of precision or sample size. The premise is that the scatter of points should reflect a funnel shape, indicating that the reporting of effect sizes is not related to their statistical significance.^[22] However, when small studies are predominantly in one direction (usually the direction of larger effect sizes), asymmetry will ensue and this may be indicative of publication bias.^[23]

Because an inevitable degree of subjectivity exists in the interpretation of funnel plots, several tests have been proposed for detecting funnel plot asymmetry.^[22]^[24]^[25] These are often based on linear regression, and may adopt a multiplicative or additive dispersion parameter to adjust for the presence of between-study heterogeneity. Some approaches may even attempt to compensate for the (potential) presence of publication bias,^[20]^[26]^[27] which is particularly useful to explore the potential impact on meta-analysis results.^[28]^[29]^[30]
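One regression-based check of this kind is Egger's test, which regresses each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error); an intercept far from zero suggests funnel plot asymmetry. The following is only a minimal sketch of the point estimate, using made-up effect sizes and standard errors. A full test would also need a standard error and significance test for the intercept, and the function name here is an illustrative assumption, not taken from any cited source.

```python
from statistics import mean


def egger_intercept(effects, std_errors):
    """Return (intercept, slope) of the least-squares line
    effect/SE = intercept + slope * (1/SE)."""
    precision = [1.0 / se for se in std_errors]
    standardized = [e / se for e, se in zip(effects, std_errors)]
    x_bar, y_bar = mean(precision), mean(standardized)
    sxx = sum((x - x_bar) ** 2 for x in precision)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(precision, standardized))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: six studies, from precise (small SE) to imprecise.
std_errors = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40]

# Symmetric case: every study estimates the same effect of 0.20.
symmetric = [0.20] * len(std_errors)

# Skewed case: less precise studies report systematically larger effects.
skewed = [0.20 + 1.5 * se for se in std_errors]

sym_intercept, sym_slope = egger_intercept(symmetric, std_errors)
skew_intercept, skew_slope = egger_intercept(skewed, std_errors)
print(f"symmetric intercept: {sym_intercept:.3f}")
print(f"skewed intercept:    {skew_intercept:.3f}")
```

In the symmetric data the intercept is essentially zero, while in the skewed data it recovers the small-study inflation that was built in. The slope estimates the underlying effect in both cases.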

Compensation examples

Two meta-analyses of the efficacy of reboxetine as an antidepressant demonstrated attempts to detect publication bias in clinical trials. Based on positive trial data, reboxetine was originally approved as a treatment for depression in many countries in Europe and the UK in 2001 (though in practice it is rarely used for this indication). A 2010 meta-analysis concluded that reboxetine was ineffective and that the preponderance of positive-outcome trials reflected publication bias, mostly due to trials published by the drug manufacturer Pfizer. A subsequent meta-analysis published in 2011, based on the original data, found flaws in the 2010 analyses and suggested that the data indicated reboxetine was effective in severe depression. Examples of publication bias are given by Goldacre^[31] and Wilmshurst.^[32]

In the social sciences, a study of published papers exploring the relationship between corporate social and financial performance found that "in economics, finance, and accounting journals, the average correlations were only about half the magnitude of the findings published in Social Issues Management, Business Ethics, or Business and Society journals".^[33]

One example cited as an instance of publication bias is the refusal by The Journal of Personality and Social Psychology (the original publisher of Bem's article) to publish attempted replications of Bem's work, which claimed evidence for precognition.^[34]

An analysis^[35] comparing studies of gene-disease associations originating in China to those originating outside China found that those conducted within the country reported a stronger association and a more statistically significant result.^[36]

Risks

Ioannidis argued that "claimed research findings may often be simply accurate measures of the prevailing bias."^[37] He lists the following factors as those that make a paper with a positive result more likely to enter the literature and suppress negative-result papers:

The studies conducted in a field are smaller.
The effect sizes are smaller.
There is both a greater number and lesser preselection of tested relationships.
There is greater flexibility in designs, definitions, outcomes, and analytical modes.
There is prejudice (financial interest or otherwise).
More teams are involved in a particular scientific field and chasing statistical significance.

Other factors include experimenter bias and white hat bias.

Remedies

Publication bias can be contained through better-powered studies, enhanced research standards, and careful consideration of true and non-true relationships.^[37] Better-powered studies refer to large studies that deliver definitive results or test major concepts and lead to low-bias meta-analysis. Enhanced research standards such as the pre-registration of protocols, the registration of data collections and adherence to established protocols are other techniques. To avoid false-positive results, the experimenter must consider the chances that they are testing a true or non-true relationship. This can be undertaken by properly assessing the false positive report probability based on the statistical power of the test^[38] and reconfirming (whenever ethically acceptable) established findings of prior studies known to have minimal bias.
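As a rough illustration of how the false positive report probability depends on statistical power, it can be written in its commonly used form as α(1 − π) / (α(1 − π) + (1 − β)π), where α is the significance threshold, 1 − β is the power, and π is the prior probability that the tested relationship is true. The sketch below assumes that form; the function name and the example inputs are illustrative and not drawn from the cited source.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a statistically significant finding is a false
    positive, given the threshold alpha, the test's power, and the prior
    probability that the tested relationship is true."""
    false_positives = alpha * (1.0 - prior)  # null is true, test rejects
    true_positives = power * prior           # effect is real, test detects it
    return false_positives / (false_positives + true_positives)


# Illustrative inputs (assumptions): a well-powered test of a plausible
# hypothesis versus an underpowered test of an unlikely one.
well_powered = false_positive_report_probability(alpha=0.05, power=0.8, prior=0.5)
underpowered = false_positive_report_probability(alpha=0.05, power=0.2, prior=0.1)
print(f"well powered, plausible: {well_powered:.3f}")
print(f"underpowered, unlikely:  {underpowered:.3f}")
```

With the same threshold of 0.05, the share of significant results that are false rises from about 6% to about 69% once power and prior plausibility are both low.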

Study registration

In September 2004, editors of prominent medical journals (including the New England Journal of Medicine, The Lancet, Annals of Internal Medicine, and JAMA) announced that they would no longer publish results of drug research sponsored by pharmaceutical companies unless that research was registered in a public clinical trials registry database from the start.^[39] Furthermore, some journals (e.g. Trials) encourage publication of study protocols in their journals.^[40]

The World Health Organization (WHO) agreed that basic information about all clinical trials should be registered at the study's inception, and that this information should be publicly accessible through the WHO International Clinical Trials Registry Platform. Additionally, public availability of complete study protocols, alongside reports of trials, is becoming more common for studies.^[41]

Replication crisis

From Wikipedia, the free encyclopedia

The replication crisis (or replicability crisis or reproducibility crisis) refers to a methodological crisis in science in which scientists have found that the results of many scientific studies are difficult or impossible to replicate/reproduce on subsequent investigation, either by independent researchers or by the original researchers themselves.^[1]^[2] The crisis has long-standing roots; the phrase was coined in the early 2010s^[3] as part of a growing awareness of the problem.

Because the reproducibility of experiments is an essential part of the scientific method,^[4] the inability to replicate the studies of others has potentially grave consequences for many fields of science in which significant theories are grounded on unreproducible experimental work.

The replication crisis has been particularly widely discussed in the field of psychology (and in particular, social psychology) and in medicine, where a number of efforts have been made to re-investigate classic results, and to attempt to determine both the reliability of the results, and, if found to be unreliable, the reasons for the failure of replication.^[5]^[6]

Scope of the crisis

Overall

According to a 2016 poll of 1,500 scientists reported in the journal Nature, 70% of them had failed to reproduce at least one other scientist's experiment (50% had failed to reproduce one of their own experiments). By discipline, with the figure for the respondents' own experiments in parentheses:

chemistry: 90% (60%)
biology: 80% (60%)
physics and engineering: 70% (50%)
medicine: 70% (60%)
Earth and environment science: 60% (40%)

In 2009, 2% of scientists admitted to falsifying studies at least once and 14% admitted to personally knowing someone who did. Misconduct was reported more frequently by medical researchers than others.^[8]

In medicine

Out of 49 medical studies from 1990–2003 with more than 1000 citations, 45 claimed that the studied therapy was effective. Out of these studies, 16% were contradicted by subsequent studies, 16% had found stronger effects than did subsequent studies, 44% were replicated, and 24% remained largely unchallenged.^[9] The Food and Drug Administration found flaws in 10–20% of medical studies in 1977–90.^[10] In a paper published in 2012, Glenn Begley, a biotech consultant working at Amgen, and Lee Ellis, at the University of Texas, argued that only 11% of the pre-clinical cancer studies could be replicated.^[11]^[12]

A 2016 article by John Ioannidis, Professor of Medicine and of Health Research and Policy at Stanford University School of Medicine and a Professor of Statistics at Stanford University School of Humanities and Sciences, elaborated on "Why Most Clinical Research Is Not Useful".^[13] In the article Ioannidis laid out some of the problems and called for reform, characterizing certain points for medical research to be useful again – one example he made was the need for medicine to be "Patient Centered" (e.g. in the form of the Patient-Centered Outcomes Research Institute) instead of the current practice to mainly take care of "the needs of physicians, investigators, or sponsors". Ioannidis is known for his research focus on science itself since the 2005 paper "Why Most Published Research Findings Are False".^[14]

In psychology

Replication failures are not unique to psychology and are found in all fields of science.^[15] However, several factors have combined to put psychology at the center of controversy. Much of the focus has been on the area of social psychology,^[16] although other areas of psychology such as clinical psychology have also been implicated.

Firstly, questionable research practices (QRPs) have been identified as common in the field.^[17] Such practices, while not intentionally fraudulent, involve capitalizing on the gray area of acceptable scientific practices or exploiting flexibility in data collection, analysis, and reporting, often in an effort to obtain a desired outcome. Examples of QRPs include selective reporting or partial publication of data (reporting only some of the study conditions or collected dependent measures in a publication), optional stopping (choosing when to stop data collection, often based on statistical significance of tests), p-value rounding (rounding p-values down to .05 to suggest statistical significance), file drawer effect (nonpublication of data), post-hoc storytelling (framing exploratory analyses as confirmatory analyses), and manipulation of outliers (either removing outliers or leaving outliers in a dataset to cause a statistical test to be significant).^[17]^[18]^[19]^[20] A survey of over 2,000 psychologists indicated that a majority of respondents admitted to using at least one QRP.^[17] False positive conclusions, often resulting from the pressure to publish or the author's own confirmation bias, are an inherent hazard in the field, requiring a certain degree of skepticism on the part of readers.^[21]
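To see why optional stopping counts as a questionable practice, consider a small simulation in which there is no real effect at all. The sketch below is an illustration under stated assumptions rather than a reproduction of any cited study: observations are drawn from a standard normal distribution, a two-sided z-test with known variance is used, and the "peeking" researcher tests after every 10 observations up to 100 and stops at the first p < .05. All names and settings are hypothetical.

```python
import math
import random


def two_sided_p(sample_mean, n):
    """Two-sided p-value of a z-test of mean 0 with known variance 1."""
    z = abs(sample_mean) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2.0))


def study_is_significant(rng, peek, max_n=100, look_every=10, alpha=0.05):
    """Simulate one study under the null hypothesis (true mean is 0).
    With peek=True the data are tested at every interim look and the
    study stops at the first significant result; otherwise only once."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        at_look = (n % look_every == 0) if peek else (n == max_n)
        if at_look and two_sided_p(total / n, n) < alpha:
            return True
    return False


def false_positive_rate(peek, n_sims=2000, seed=1):
    rng = random.Random(seed)
    hits = sum(study_is_significant(rng, peek) for _ in range(n_sims))
    return hits / n_sims


rate_fixed = false_positive_rate(peek=False)
rate_peeking = false_positive_rate(peek=True)
print(f"fixed sample size: {rate_fixed:.3f}")
print(f"optional stopping: {rate_peeking:.3f}")
```

With a fixed sample size the false positive rate stays near the nominal 5%, whereas stopping at the first significant interim look pushes it well above that, even though every individual test uses the same threshold.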

Secondly, psychology, and social psychology in particular, has found itself at the center of several scandals involving outright fraudulent research, most notably the admitted data fabrication by Diederik Stapel^[22] as well as allegations against others. However, most scholars acknowledge that fraud is, perhaps, the lesser contribution to replication crises.

Thirdly, several effects in psychological science have been found to be difficult to replicate even before the current replication crisis. For example, the scientific journal Judgment and Decision Making has published several studies over the years that fail to provide support for the unconscious thought theory. Replications appear particularly difficult when research trials are pre-registered and conducted by research groups not highly invested in the theory under questioning.

These three elements together have resulted in renewed attention for replication supported by Kahneman.^[23] Scrutiny of many effects has shown that several core beliefs are hard to replicate. A recent special edition of the journal Social Psychology focused on replication studies and a number of previously held beliefs were found to be difficult to replicate.^[24] A 2012 special edition of the journal Perspectives on Psychological Science also focused on issues ranging from publication bias to null-aversion that contribute to the replication crises in psychology.^[25] In 2015, the first open empirical study of reproducibility in psychology was published, called the Reproducibility Project. Researchers from around the world collaborated to replicate 100 empirical studies from three top psychology journals. Fewer than half of the attempted replications were successful at producing statistically significant results in the expected directions, though most of the attempted replications did produce trends in the expected directions.^[26]

Scholar James Coyne has recently written that many research trials and meta-analyses are compromised by poor quality and conflicts of interest that involve both authors and professional advocacy organizations, resulting in many false positives regarding the effectiveness of certain types of psychotherapy.^[27]

The replication crisis does not necessarily mean that psychology is unscientific.^[28]^[29]^[30] Rather, this process is a healthy if sometimes acrimonious part of the scientific process in which old ideas or those that cannot withstand careful scrutiny are pruned,^[31]^[32] although this pruning process is not always effective.^[33]^[34] The consequence is that some areas of psychology once considered solid, such as social priming, have come under increased scrutiny due to failed replications.^[35] The British Independent newspaper wrote that the results of the reproducibility project show that much of the published research is just "psycho-babble".^[36]

Nobel laureate and professor emeritus in psychology Daniel Kahneman argued that the original authors should be involved in the replication effort because the published methods are often too vague.^[37] Some other scientists, like Dr. Andrew Wilson, disagree and argue that the methods should be written down in detail. An investigation of replication rates in psychology in 2012 indicated higher success rates in replication studies when there was author overlap with the original authors of a study^[38] (91.7% successful replication rates in studies with author overlap compared to 64.6% successful replication rates without author overlap).

Psychology replication rates

A report by the Open Science Collaboration in August 2015 that was coordinated by Brian Nosek estimated the reproducibility of 100 studies in psychological science from three high-ranking psychology journals.^[39] Overall, 36% of the replications yielded significant findings (p value below .05) compared to 97% of the original studies that had significant effects. The mean effect size in the replications was approximately half the magnitude of the effects reported in the original studies.

The same paper examined the reproducibility rates and effect sizes by journal (Journal of Personality and Social Psychology [JPSP], Journal of Experimental Psychology: Learning, Memory, and Cognition [JEP:LMC], Psychological Science [PSCI]) and discipline (social psychology, cognitive psychology). Study replication rates were 23% for JPSP, 48% for JEP:LMC, and 38% for PSCI. Studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%).

An analysis of the publication history in the top 100 psychology journals between 1900 and 2012 indicated that approximately 1.6% of all psychology publications were replication attempts.^[38] Articles were considered a replication attempt if the term "replication" appeared in the text. A subset of those studies (500 studies) was randomly selected for further examination and yielded a lower replication rate of 1.07% (342 of the 500 studies [68.4%] were actually replications). In the subset of 500 studies, analysis indicated that 78.9% of published replication attempts were successful. The rate of successful replication was significantly higher when at least one author of the original study was part of the replication attempt (91.7% relative to 64.6%).

A disciplinary social dilemma

Highlighting the social structure that discourages replication in psychology, Brian D. Earp and Jim A. C. Everett enumerated five points as to why replication attempts are uncommon:^[40]^[41]

"Independent, direct replications of others’ findings can be time-consuming for the replicating researcher
"[Replications] are likely to take energy and resources directly away from other projects that reflect one’s own original thinking
"[Replications] are generally harder to publish (in large part because they are viewed as being unoriginal)
"Even if [replications] are published, they are likely to be seen as 'bricklaying' exercises, rather than as major contributions to the field
"[Replications] bring less recognition and reward, and even basic career security, to their authors"^[42]

For these reasons the authors argued that psychology is facing a disciplinary social dilemma, where the interests of the discipline are at odds with the interests of the individual researcher.

"Methodological terrorism" controversy

With the replication crisis of psychology earning attention, Princeton University psychologist Susan Fiske drew controversy for calling out critics of psychology.^[43]^[44]^[45]^[46] She called these unnamed "adversaries" names such as "methodological terrorist" and "self-appointed data police", and said that criticism of psychology should only be expressed in private or through contacting the journals.^[43] Columbia University statistician and political scientist Andrew Gelman, "well-respected among the researchers driving the replication debate", responded to Fiske, saying that she had found herself willing to tolerate the "dead paradigm" of faulty statistics and had refused to retract publications even when errors were pointed out.^[43]^[47] He added that her tenure as editor has been abysmal and that a number of published papers edited by her were found to be based on extremely weak statistics; one of Fiske's own published papers had a major statistical error and "impossible" conclusions.^[43]

In marketing

Marketing is another discipline with a "desperate need" for replication.^[48] Many famous marketing studies fail to replicate, a notable example being the "too-many-choices" effect, in which a high number of product choices makes a consumer less likely to purchase.^[49] In addition to the previously mentioned arguments, replication studies in marketing are needed to examine the applicability of theories and models across countries and cultures, which is especially important because of possible influences of globalization.^[50]

In economics

A 2016 study in the journal Science found that two-thirds of 18 experimental studies from two top-tier economics journals (American Economic Review and the Quarterly Journal of Economics) successfully replicated.^[51]^[52] A 2017 study in the Economic Journal suggested that "the majority of the average effects in the empirical economics literature are exaggerated by a factor of at least 2 and at least one-third are exaggerated by a factor of 4 or more".^[53]

In sports science

A 2018 study took the field of exercise and sports science to task for insufficient replication studies, limited reporting of null results and trivial results, and insufficient research transparency.^[54] Statisticians have criticized sports science for common use of an invalid statistical method called "magnitude-based inference" that has allowed sports scientists to extract spurious results that appear to be meaningful from noisy data.^[55]

Causes of the crisis

In a work published in 2015, Glenn Begley and John Ioannidis offer five bullet points to summarize the present predicament:^[56]

Generation of new data/ publications at an unprecedented rate.
Compelling evidence that the majority of these discoveries will not stand the test of time.
Causes: failure to adhere to good scientific practice & the desperation to publish or perish.
This is a multifaceted, multistakeholder problem.
No single party is solely responsible, and no single solution will suffice.

In fact, some predictions of a possible crisis in the quality control mechanism of science can be traced back several decades, especially among scholars in science and technology studies (STS). Derek de Solla Price – considered the father of scientometrics – predicted that science could reach 'senility' as a result of its own exponential growth.^[57] Some present-day literature seems to vindicate this 'overflow' prophecy, lamenting the decay in both attention and quality.^[58]^[59]

Philosopher and historian of science Jerome R. Ravetz predicted in his 1971 book Scientific Knowledge and Its Social Problems that science – in moving from the little science made of restricted communities of scientists to big science or techno-science – would suffer major problems in its internal system of quality control. Ravetz anticipated that modern science's system of rewarding scientists for research might become dysfunctional (the present 'publish or perish' challenge), creating perverse incentives to publish any findings however dubious. For Ravetz, quality in science is maintained when there is a community of scholars linked by norms and standards, and a willingness to stand by these.

Historian Philip Mirowski more recently offered a similar diagnosis in his 2011 book Science Mart.^[60] 'Mart' is here a reference to the retail giant 'Walmart' and an allusion to the commodification of science. In Mirowski's analysis, when science becomes a commodity traded in a market, its quality collapses. Mirowski argues his case by tracing the decay of science to the decision of major corporations to close their in-house laboratories in order to outsource their work to universities, and subsequently to move their research away from universities to even cheaper contract research organizations (CROs).

The crisis of science's quality control system is affecting the use of science for policy. This is the thesis of a recent work by a group of STS scholars, who identify in 'evidence based (or informed) policy' a point of present tension.^[61]^[62] Economist Noah Smith suggests that a factor in the crisis has been the overvaluing of research in academia and undervaluing of teaching ability, especially in fields with few major recent discoveries.^[63]

Addressing the replication crisis

Replication has been referred to as "the cornerstone of science".^[64]^[65] Replication studies attempt to evaluate whether published results reflect true findings or false positives. The integrity of scientific findings and reproducibility of research are important as they form the knowledge foundation on which future studies are built.

Tackling publication bias with pre-registration of studies

A recent innovation in scientific publishing to address the replication crisis is through the use of registered reports.^[66]^[67] The registered report format requires authors to submit a description of the study methods and analyses prior to data collection. Once the method and analysis plan is vetted through peer-review, publication of the findings is provisionally guaranteed, based on whether the authors follow the proposed protocol. One goal of registered reports is to circumvent the publication bias toward significant findings that can lead to implementation of Questionable Research Practices and to encourage publication of studies with rigorous methods.

The journal Psychological Science has encouraged the preregistration of studies and the reporting of effect sizes and confidence intervals.^[68] The editor in chief also noted that the editorial staff will be asking for replication of studies with surprising findings from examinations using small sample sizes before allowing the manuscripts to be published.

Moreover, only a very small proportion of academic journals in psychology and neurosciences explicitly state in their aims and scope or instructions to authors that they welcome submissions of replication studies.^[69]^[70] This does not encourage the reporting, or even the attempting, of replication studies.

Emphasizing replication attempts in teaching

Based on coursework in experimental methods at MIT and Stanford, it has been suggested that methods courses in psychology emphasize replication attempts rather than original studies.^[71]^[72] Such an approach would help students learn scientific methodology and provide numerous independent replications of meaningful scientific findings that would test the replicability of scientific findings. Some have recommended that graduate students should be required to publish a high-quality replication attempt on a topic related to their doctoral research prior to graduation.^[41]

Reducing the p-value required for claiming significance of new results

Many publications require a p-value of p < 0.05 to claim statistical significance. The paper "Redefine statistical significance",^[73] signed by a large number of scientists and mathematicians, proposes that in "fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields."

Their rationale is that "a leading cause of non-reproducibility (is that the) statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating 'statistically significant' findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems."

Addressing the misinterpretation of p-values

Although statisticians are unanimous that p < 0.05 provides weaker evidence than is generally appreciated, there is an unfortunate lack of unanimity about what should be done about it. Some have advocated that Bayesian methods should replace p-values. This has not happened on a wide scale, partly because it is complicated, and partly because many users distrust the specification of prior distributions in the absence of hard data. A simplified version of the Bayesian argument, based on testing a point null hypothesis, was suggested by Colquhoun (2014, 2017).^[74]^[75] The logical problems of inductive inference were discussed in "The problem with p-values" (2016).^[76]

The hazards of reliance on p-values were emphasized by pointing out that even observation of p = 0.001 was not necessarily strong evidence against the null hypothesis.^[75] Despite the fact that the likelihood ratio in favour of the alternative hypothesis over the null is close to 100, if the hypothesis was implausible, with a prior probability of a real effect being 0.1, even the observation of p = 0.001 would have a false positive risk of 8 percent. It wouldn't even reach the 5 percent level.
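The arithmetic behind the 8 percent figure can be checked with Bayes' rule in odds form: the posterior odds of a real effect equal the likelihood ratio times the prior odds, and the false positive risk is the posterior probability of the null. The sketch below takes the likelihood ratio of 100 and the prior of 0.1 from the paragraph above as inputs; the function name is an illustrative assumption, and deriving the likelihood ratio from a p-value is outside its scope.

```python
def false_positive_risk(likelihood_ratio, prior):
    """Posterior probability that the null is true, given the likelihood
    ratio (alternative over null) and the prior probability of a real effect."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return 1.0 / (1.0 + posterior_odds)


# Values from the text: likelihood ratio close to 100, prior probability 0.1.
fpr_implausible = false_positive_risk(likelihood_ratio=100, prior=0.1)
# For comparison (an assumption, not from the text): a 50:50 prior.
fpr_even = false_positive_risk(likelihood_ratio=100, prior=0.5)
print(f"prior 0.1: {fpr_implausible:.3f}")
print(f"prior 0.5: {fpr_even:.3f}")
```

With prior odds of 1 to 9, a likelihood ratio of 100 gives posterior odds of about 11 to 1, which corresponds to a false positive risk of roughly 0.083, matching the 8 percent quoted above.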

It was recommended^[75] that the terms "significant" and "non-significant" should not be used. p-values and confidence intervals should still be specified, but they should be accompanied by an indication of the false positive risk. It was suggested that the best way to do this is to calculate the prior probability that one would need to believe in order to achieve a false positive risk of, say, 5%. The calculations can be done with R scripts that are provided,^[75] or, more simply, with a web calculator.^[77] This so-called reverse Bayesian approach, which was suggested by Matthews (2001),^[78] is one way to avoid the problem that the prior probability is rarely known.
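The reverse Bayesian calculation can be sketched by inverting the odds form of Bayes' theorem. This is an illustration only, not the cited R scripts or web calculator: the function name `prior_needed` is hypothetical, and the likelihood ratio of 100 is assumed from the p = 0.001 example earlier in this section rather than derived from a p-value and sample size.

```python
def prior_needed(likelihood_ratio: float, target_fpr: float) -> float:
    """Prior probability of a real effect required to reach a target false positive risk."""
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    if not 0.0 < target_fpr < 1.0:
        raise ValueError("target_fpr must lie strictly between 0 and 1")
    # From FPR = 1 / (1 + LR * prior_odds), solve for the prior odds.
    required_odds = (1.0 - target_fpr) / (target_fpr * likelihood_ratio)
    return required_odds / (1.0 + required_odds)  # convert odds back to a probability


# Assumed inputs: likelihood ratio 100, target false positive risk 5%.
prior = prior_needed(likelihood_ratio=100.0, target_fpr=0.05)
print(f"prior probability needed: {prior:.2f}")
```

Under these assumptions the required prior is about 0.16, so a reader who thinks a real effect is less plausible than that should not treat the result as reaching a 5% false positive risk.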

Encouraging use of larger sample sizes

To improve the quality of replications, larger sample sizes than those used in the original study are often needed.^[79] Larger sample sizes are needed because estimates of effect sizes in published work are often exaggerated due to publication bias and the large sampling variability associated with small sample sizes in an original study.^[80]^[81]^[82] Further, using significance thresholds usually leads to inflated effects, because, particularly with small sample sizes, only the largest effects will become significant.^[83]
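The inflation described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the cited papers: the true effect of 0.2, the known standard deviation of 1, the sample sizes and the function name `significant_estimates` are all chosen here for illustration.

```python
import random
import statistics


def significant_estimates(true_effect, n, trials, seed):
    """Simulate `trials` studies of size `n` and keep only the 'significant' ones.

    Each observation is drawn from a normal distribution with mean
    `true_effect` and standard deviation 1 (assumed known), so the
    standard error of a study's mean is 1 / sqrt(n).
    """
    rng = random.Random(seed)
    standard_error = 1.0 / n ** 0.5
    kept = []
    for _ in range(trials):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        estimate = statistics.fmean(sample)
        if abs(estimate / standard_error) > 1.96:  # two-sided z-test at the 0.05 level
            kept.append(estimate)
    return kept


TRUE_EFFECT = 0.2
small = significant_estimates(TRUE_EFFECT, n=20, trials=1000, seed=1)
large = significant_estimates(TRUE_EFFECT, n=500, trials=1000, seed=1)

print("true effect:", TRUE_EFFECT)
print("mean significant estimate, n=20: ", round(statistics.fmean(small), 2))
print("mean significant estimate, n=500:", round(statistics.fmean(large), 2))
```

With n = 20 a study can only reach significance if its estimate exceeds roughly 0.44 in magnitude, so the significant studies overstate the true effect on average. With n = 500 almost every study is significant and the average stays close to 0.2.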

Sharing raw data in online repositories

Online repositories where data, protocols, and findings can be stored and evaluated by the public seek to improve the integrity and reproducibility of research. Examples of such repositories include the Open Science Framework, Registry of Research Data Repositories, and Psychfiledrawer.org. Sites like Open Science Framework offer badges for using open science practices in an effort to incentivize scientists. However, there has been concern that those who are most likely to provide their data and code for analyses are the researchers that are likely the most sophisticated.^[84] John Ioannidis at Stanford University suggested that "the paradox may arise that the most meticulous and sophisticated and method-savvy and careful researchers may become more susceptible to criticism and reputation attacks by reanalyzers who hunt for errors, no matter how negligible these errors are".^[84]

Funding for replication studies

In July 2016 the Netherlands Organisation for Scientific Research made 3 million Euros available for replication studies. The funding is for replication based on reanalysis of existing data and replication by collecting and analysing new data. Funding is available in the areas of social sciences, health research and healthcare innovation.^[85]

In 2013 the Laura and John Arnold Foundation funded the launch of The Center for Open Science with a $5.25 million grant and by 2017 had provided an additional $10 million in funding.^[86] It also funded the launch of the Meta-Research Innovation Center at Stanford, based at Stanford University and run by John Ioannidis and Steven Goodman, to study ways to improve scientific research.^[86] In addition, it provided funding for the AllTrials initiative led in part by Ben Goldacre.^[86]

Emphasize triangulation, not just replication

Marcus R. Munafò and George Davey Smith argue, in a piece published by Nature, that research should emphasize triangulation, not just replication. They claim that,

"replication alone will get us only so far (and) might actually make matters worse... We believe that an essential protection against flawed ideas is triangulation. This is the strategic use of multiple approaches to address one question. Each approach has its own unrelated assumptions, strengths and weaknesses. Results that agree across different methodologies are less likely to be artifacts.... Maybe one reason replication has captured so much interest is the often-repeated idea that falsification is at the heart of the scientific enterprise. This idea was popularized by Karl Popper's 1950s maxim that theories can never be proved, only falsified. (Yet) an overemphasis on repeating experiments could provide an unfounded sense of certainty about findings that rely on a single approach.... philosophers of science have moved on since Popper. Better descriptions of how scientists actually work include what epistemologist Peter Lipton called in 1991 "inference to the best explanation" (instead.)"^[87]

Falsifiability

From Wikipedia, the free encyclopedia

"All swans are white" can be proven false and is hence a falsifiable statement, since evidence of black swans proves it to be false, and such evidence can be provided. Were the statement true, however, it would be difficult to prove true.

A statement, hypothesis, or theory has falsifiability (or is said to be falsifiable) if one can conceive an empirical observation or experiment which could refute it, that is, show it to be false. For example, the claim "all swans are white" is falsifiable since it could be refuted by observing a single swan that is not white. The concept is also known by the terms refutable and refutability.

The concept was introduced by the philosopher of science Karl Popper, in his exposition of scientific epistemology. He saw falsifiability as the criterion for demarcating the limits of scientific inquiry. He proposed that statements and theories that are not falsifiable are unscientific. Declaring an unfalsifiable theory to be scientific would then be pseudoscience.^[1] ^[2]

Popper excluded refutation by logical argument because he considered consistency a prerequisite so necessary that, without it, it is useless to add falsification as a further condition.^[3]

Overview

The classical view of the philosophy of science is that it is the goal of science to prove hypotheses like "All swans are white" or to induce them from observational data. Popper argued that this would require the inference of a general rule from a number of individual cases, which is inadmissible in deductive logic.^[4] However, if one finds one single swan that is not white, deductive logic admits the conclusion that the statement that all swans are white is false. Falsificationism thus strives for questioning, for falsification, of hypotheses instead of proving them.

For a statement to be questioned using observation, it needs to be at least theoretically possible that it can come into conflict with observation. A key observation of falsificationism is thus that a criterion of demarcation is needed to distinguish those statements that can come into conflict with observation and those that cannot. Popper chose falsifiability as the name of this criterion.

My proposal is based upon an asymmetry between verifiability and falsifiability; an asymmetry which results from the logical form of universal statements. For these are never derivable from singular statements, but can be contradicted by singular statements.

— Karl Popper, Popper 1959. p 19

Popper stressed that unfalsifiable statements are important in science.^[5] Contrary to intuition, unfalsifiable statements can be embedded in—and deductively entailed by—falsifiable theories. For example, while "all men are mortal" is unfalsifiable, it is a logical consequence of the falsifiable theory that "all men die 150 years after their birth at the latest". ^[6] Similarly, the ancient metaphysical and unfalsifiable idea of the existence of atoms has led to corresponding falsifiable modern theories. Popper invented the notion of metaphysical research programs to name such unfalsifiable ideas.^[7] In contrast to Positivism, which held that statements are meaningless if they cannot be verified or falsified, Popper claimed that falsifiability is merely a special case of the more general notion of criticizability, even though he admitted that empirical refutation is one of the most effective methods by which theories can be criticized. Criticizability, in contrast to falsifiability, and thus rationality, may be comprehensive (i.e., have no logical limits), though this claim is controversial, even among proponents of Popper's philosophy and critical rationalism.

Naïve falsification

Two types of statements: observational and categorical

In work beginning in the 1930s, Popper gave falsifiability a renewed emphasis as a criterion of empirical statements in science. Popper noticed that two types of statements are of particular value to scientists:^[8]

The first are statements of observations, such as "there is a white swan". Logicians call these statements singular existential statements, since they assert the existence of some particular thing. They are equivalent to a predicate calculus statement of the form: There exists an x such that x is a swan, and x is white.

The second are statements that categorize all instances of something, such as "all swans are white". Logicians call these statements universal. They are usually parsed in the form: For all x, if x is a swan, then x is white. Scientific laws are commonly supposed to be of this type. One difficult question in the methodology of science is: How does one move from observations to laws? How can one validly infer a universal statement from any number of existential statements?

Inductivist methodology supposed that one can somehow move from a series of singular existential statements to a universal statement. That is, that one can move from 'this is a white swan', 'that is a white swan', and so on, to a universal statement such as 'all swans are white'. This method is clearly deductively invalid, since it is always possible that there may be a non-white swan that has eluded observation (and, in fact, the discovery of the Australian black swan demonstrated the deductive invalidity of this particular statement).

Inductive categorical inference

Popper held that science could not be grounded on such an inferential basis. He proposed falsification as a solution to the problem of induction. Popper noticed that although a singular existential statement such as 'there is a white swan' cannot be used to affirm a universal statement, it can be used to show that one is false: the singular existential observation of a black swan serves to show that the universal statement 'all swans are white' is false—in logic this is called modus tollens. 'There is a black swan' implies 'there is a non-white swan', which, in turn, implies 'there is something that is a swan and that is not white', hence 'all swans are white' is false, because that is the same as 'there is nothing that is a swan and that is not white'.

One notices a white swan. From this one can conclude:

At least one swan is white.

From this, one may wish to conjecture:

All swans are white.

It is impractical to observe all the swans in the world to verify that they are all white.

Even so, the statement all swans are white is testable by being falsifiable. For, if in testing many swans, the researcher finds a single black swan, then the statement all swans are white would be falsified by the counterexample of the single black swan.
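The asymmetry in this example can be sketched in code. The following is only an illustration: the function name `find_counterexample` and the list of sightings are hypothetical. It shows that a search over observations can refute the universal statement but never prove it.

```python
from typing import Callable, Iterable, Optional


def find_counterexample(
    observations: Iterable[str], law: Callable[[str], bool]
) -> Optional[str]:
    """Return the first observation that violates the law, or None if none is found."""
    for observation in observations:
        if not law(observation):
            return observation
    return None


def all_swans_are_white(swan_colour: str) -> bool:
    """The universal statement, applied to a single observed swan."""
    return swan_colour == "white"


# Hypothetical sightings, recorded by colour only.
early_sightings = ["white", "white", "white"]
later_sightings = early_sightings + ["black"]

# None means "not yet refuted"; it does not mean "proven true".
print(find_counterexample(early_sightings, all_swans_are_white))
# A single black swan is enough to falsify the statement.
print(find_counterexample(later_sightings, all_swans_are_white))
```

A result of `None` only reports that no counterexample has turned up among the swans observed so far, which is why the statement remains a conjecture however many white swans are added to the list.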

Deductive falsification

Deductive falsification is different from an absence of verification. The falsification of statements occurs through modus tollens, via some observation. Suppose some universal statement U forbids some observation O:

$U \rightarrow \neg O$

Observation O, however, is made:

$O$

So by modus tollens,

$\neg U$
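The validity of this inference can be checked mechanically with a truth table. The sketch below is an illustration with hypothetical helper names (`implies`, `is_valid`). For contrast it also checks the "verifying" pattern, in which a theory predicts an observation, the observation is made, and the theory is concluded to be true. That second pattern is an addition for comparison and is not written out symbolically in the text above.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def is_valid(premises, conclusion) -> bool:
    """An argument is valid if no truth assignment makes all premises true and the conclusion false."""
    for u, o in product([False, True], repeat=2):
        if all(premise(u, o) for premise in premises) and not conclusion(u, o):
            return False
    return True


# Falsification: U forbids O, O is observed, therefore not U.
falsification_valid = is_valid(
    premises=[lambda u, o: implies(u, not o), lambda u, o: o],
    conclusion=lambda u, o: not u,
)

# Attempted verification: U predicts O, O is observed, therefore U.
verification_valid = is_valid(
    premises=[lambda u, o: implies(u, o), lambda u, o: o],
    conclusion=lambda u, o: u,
)

print("falsification valid:", falsification_valid)
print("verification valid: ", verification_valid)
```

The first argument is valid and the second is not: when U is false and O is true, both premises of the verifying pattern hold while its conclusion fails.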

Although the logic of naïve falsification is valid, it is rather limited. Nearly any statement can be made to fit the data, so long as one makes the requisite 'compensatory adjustments'. Popper drew attention to these limitations in The Logic of Scientific Discovery in response to criticism from Pierre Duhem. W. V. Quine expounded this argument in detail, calling it confirmation holism. To logically falsify a universal, one must find a true falsifying singular statement. But Popper pointed out that it is always possible to change the universal statement or the existential statement so that falsification does not occur.^[9] On hearing that a black swan has been observed in Australia, one might introduce the ad hoc hypothesis, 'all swans are white except those found in Australia'; or one might adopt another, more cynical view about some observers, 'Australian bird watchers are incompetent'.

Thus, naïve falsification ought to, but does not, supply a way of handling competing hypotheses for many subject controversies (for instance conspiracy theories and urban legends). People arguing that there is no support for such an observation may argue that there is nothing to see, that all is normal, or that the differences or appearances are too small to be statistically significant. On the other side are those who concede that an observation has occurred and that a universal statement has been falsified as a consequence. Therefore, naïve falsification does not enable scientists, who rely on objective criteria, to present a definitive falsification of universal statements.

Falsificationism

Naïve falsificationism is an unsuccessful attempt to prescribe a rationally unavoidable method for science. Sophisticated methodological falsification, on the other hand, is a prescription of a way in which scientists ought to behave as a matter of choice. The object of this is to arrive at an incremental process whereby theories become less bad.

Naïve falsification considers scientific statements individually. Scientific theories are formed from groups of these sorts of statements, and it is these groups that must be accepted or rejected by scientists. Scientific theories can always be defended by the addition of ad hoc hypotheses. As Popper put it, a decision is required on the part of the scientist to accept or reject the statements that go to make up a theory or that might falsify it. At some point, the weight of the ad hoc hypotheses and disregarded falsifying observations will become so great that it becomes unreasonable to support the base theory any longer, and a decision will be made to reject it.

In place of naïve falsification, Popper envisioned science as progressing by the successive rejection of falsified theories, rather than falsified statements. Falsified theories are to be replaced by theories that can account for the phenomena that falsified the prior theory, that is, with greater explanatory power. For example, Aristotelian mechanics explained observations of everyday situations, but was falsified by Galileo's experiments and was replaced by Newtonian mechanics, which accounted for the phenomena noted by Galileo (and others). Newtonian mechanics' reach included the observed motion of the planets and the mechanics of gases. The Youngian wave theory of light (i.e., waves carried by the luminiferous aether) replaced Newton's (and many of the Classical Greeks') particles of light but in turn was falsified by the Michelson-Morley experiment and was superseded by Maxwell's electrodynamics and Einstein's special relativity, which did account for the newly observed phenomena. Furthermore, Newtonian mechanics applied to the atomic scale was replaced with quantum mechanics when the old theory could not provide an answer to the ultraviolet catastrophe, the Gibbs paradox, or how electron orbits could exist without the particles radiating away their energy and spiraling towards the centre. Thus the new theory had to posit the existence of unintuitive concepts such as energy levels, quanta and Heisenberg's uncertainty principle.

At each stage, experimental observation made a theory untenable (i.e., falsified it) and a new theory was found that had greater explanatory power (i.e., could account for the previously unexplained phenomena), and as a result, provided greater opportunity for its own falsification.

Criterion of demarcation

Popper uses falsification as a criterion of demarcation to draw a sharp line between those theories that are scientific and those that are unscientific. It is useful to know if a statement or theory is falsifiable, if for no other reason than that it provides us with an understanding of the ways in which one might assess the theory. One might at the least be saved from attempting to falsify a non-falsifiable theory, or come to see an unfalsifiable theory as unsupportable. Popper claimed that, if a theory is falsifiable, then it is scientific.

The Popperian criterion excludes from the domain of science not unfalsifiable statements but only whole theories that contain no falsifiable statements; thus it leaves us with the Duhemian problem of what constitutes a 'whole theory' as well as the problem of what makes a statement 'meaningful'. Popper's own falsificationism, thus, is not only an alternative to verificationism, it is also an acknowledgement of the conceptual distinction that previous theories had ignored.

Verificationism

In the philosophy of science, verificationism (also known as the verifiability theory of meaning) holds that a statement must, in principle, be empirically verifiable in order that it be both meaningful and scientific. This was an essential feature of the logical positivism of the so-called Vienna Circle that included such philosophers as Moritz Schlick, Rudolf Carnap, Otto Neurath, the Berlin philosopher Hans Reichenbach, and the logical empiricism of A.J. Ayer. Popper noticed that the philosophers of the Vienna Circle had mixed two different problems, that of meaning and that of demarcation, and had proposed in verificationism a single solution to both. In opposition to this view, Popper emphasized that there are meaningful theories that are not scientific, and that, accordingly, a criterion of meaningfulness does not coincide with a criterion of demarcation.

Thus, Popper urged that verifiability be replaced with falsifiability as the criterion of demarcation. On the other hand, he strictly opposed the view that non-falsifiable statements are meaningless or otherwise inherently bad, and noted that falsificationism is only concerned with meaningful statements.^[10]

Use in courts of law

Judge William Overton used falsifiability in the McLean v. Arkansas ruling in 1982 as one of the criteria to determine that "creation science" was not scientific and should not be taught in Arkansas public schools as such (it can be taught as religion). The argument was presented by philosopher Michael Ruse, who defined the characteristics which constitute science as explanatory, testable, and tentative; the last of the three being another term for falsifiability.^[11] In his conclusion related to this criterion Judge Overton stated that "[w]hile anybody is free to approach a scientific inquiry in any fashion they choose, they cannot properly describe the methodology as scientific, if they start with the conclusion and refuse to change it regardless of the evidence developed during the course of the investigation."^[12]

The Daubert standard set forth in the United States Supreme Court decision Daubert v. Merrell Dow Pharmaceuticals, Inc. suggests that when determining whether scientific evidence is admissible, one of five factors that the U.S. federal courts should consider is "whether the theory or technique in question can be and has been tested."^[13] Some commentators have suggested that "inquiring into the existence of meaningful attempts at falsification is an appropriate and crucial consideration in admissibility determinations" but that some courts have misconstrued Daubert by accepting "the abstract possibility of falsifiability" as sufficient, rather than requiring "actual corroboration" through empirical testing.^[14]

Criticisms

Contemporary philosophers

Many contemporary philosophers of science and analytic philosophers are strongly critical of Popper's philosophy of science.^[15] Popper's mistrust of inductive reasoning has led to claims that he misrepresents scientific practice.

Bartley in 1976 claimed,^[16]

"Sir Karl Popper is not really a participant in the contemporary professional philosophical dialogue; quite the contrary, he has ruined that dialogue. If he is on the right track, then the majority of professional philosophers the world over have wasted or are wasting their intellectual careers. The gulf between Popper's way of doing philosophy and that of the bulk of contemporary professional philosophers is as great as that between astronomy and astrology."

— W. W. Bartley in Philosophia 6 1976

Rafe Champion said,^[17]

"Popper's ideas have failed to convince the majority of professional philosophers because his theory of conjectural knowledge does not even pretend to provide positively justified foundations of belief. Nobody else does better, but they keep trying, like chemists still in search of the Philosopher's Stone or physicists trying to build perpetual motion machines."

— Rafe Champion "Agreeing to Disagree: Bartley's Critique of Reason" 1985

David Miller wrote,^[18]

"'What distinguishes science from all other human endeavours is that the accounts of the world that our best, mature sciences deliver are strongly supported by evidence and this evidence gives us the strongest reason to believe them.' That anyway is what is said at the beginning of the advertisement for a recent conference on induction at a celebrated seat of learning in the UK. It shows how much critical rationalists still have to do to make known the message of Logik der Forschung concerning what empirical evidence is able to do and what it does."

— David Miller "Some hard questions for critical rationalism" 2011

Kuhn and Lakatos

Whereas Popper was concerned in the main with the logic of science, Thomas Kuhn's influential book The Structure of Scientific Revolutions examined in detail the history of science. Kuhn argued that scientists work within a conceptual paradigm that strongly influences the way in which they see data. Scientists will go to great length to defend their paradigm against falsification, by the addition of ad hoc hypotheses to existing theories. Changing a 'paradigm' is difficult, as it requires an individual scientist to break with his or her peers and defend a heterodox theory.

Some falsificationists saw Kuhn's work as a vindication, since it provided historical evidence that science progressed by rejecting inadequate theories, and that it is the decision, on the part of the scientist, to accept or reject a theory that is the crucial element of falsificationism. Foremost amongst these was Imre Lakatos.

Lakatos attempted to explain Kuhn's work by arguing that science progresses by the falsification of research programs rather than the more specific universal statements of naïve falsification. In Lakatos' approach, a scientist works within a research program that corresponds roughly with Kuhn's 'paradigm'. Whereas Popper rejected the use of ad hoc hypotheses as unscientific, Lakatos accepted their place in the development of new theories.^[19]

Feyerabend

Paul Feyerabend examined the history of science with a more critical eye, and ultimately rejected any prescriptive methodology at all. He rejected Lakatos' argument for ad hoc hypotheses, arguing that science would not have progressed without making use of any and all available methods to support new theories. He rejected any reliance on a scientific method, along with any special authority for science that might derive from such a method. Rather, he claimed that if one is keen to have a universally valid methodological rule, epistemological anarchism or "anything goes" would be the only candidate. For Feyerabend, any special status that science might have derives from the social and physical value of the results of science rather than its method.

Sokal and Bricmont

In their book Fashionable Nonsense (published in the UK as Intellectual Impostures) the physicists Alan Sokal and Jean Bricmont criticized falsifiability on the grounds that it does not accurately describe the way science really works. They argue that theories are used because of their successes, not because of the failures of other theories. Their discussion of Popper, falsifiability and the philosophy of science comes in a chapter entitled "Intermezzo," which contains an attempt to make clear their own views of what constitutes truth, in contrast with the extreme epistemological relativism of postmodernism.

Sokal and Bricmont write, "When a theory successfully withstands an attempt at falsification, a scientist will, quite naturally, consider the theory to be partially confirmed and will accord it a greater likelihood or a higher subjective probability. ... But Popper will have none of this: throughout his life he was a stubborn opponent of any idea of 'confirmation' of a theory, or even of its 'probability'. ... [but] the history of science teaches us that scientific theories come to be accepted above all because of their successes." (Sokal and Bricmont 1997, 62f)

They further argue that falsifiability cannot distinguish between astrology and astronomy, as both make technical predictions that are sometimes incorrect.

David Miller, a contemporary philosopher of critical rationalism, has attempted to defend Popper against these claims.^[20] Miller argues that astrology does not lay itself open to falsification, while astronomy does, and this is the litmus test for science.

Economics

Karl Popper argued that Marxism shifted from falsifiable to unfalsifiable.^[21]

Some economists, such as those of the Austrian School, believe that macroeconomics is empirically unfalsifiable and that thus the only appropriate means to understand economic events is by logically studying the intentions of individual economic decision-makers, based on certain fundamental truths.^[22]^[23] Prominent figures within the Austrian School of economics, Ludwig von Mises and Friedrich Hayek, were associates of Karl Popper, with whom they co-founded the Mont Pelerin Society.

Evolution

Numerous examples of potential (indirect) ways to falsify common descent have been proposed by its proponents. J.B.S. Haldane, when asked what hypothetical evidence could disprove evolution, replied "fossil rabbits in the Precambrian era".^[24] Richard Dawkins adds that any other modern animal, such as a hippo, would suffice.^[25]^[26]^[27] Karl Popper at first spoke against the testability of natural selection^[28]^[29] but recanted, "I have changed my mind about the testability and logical status of the theory of natural selection, and I am glad to have the opportunity to make a recantation."^[30]

Young-earth creationism

Much of the criticism against young-Earth creationism is based on evidence in nature that the Earth is much older than adherents believe. Confronting such evidence, some adherents make an argument (called the Omphalos hypothesis) that the world was created with the appearance of age; e.g., the sudden appearance of a mature chicken capable of laying eggs. This hypothesis is non-falsifiable since no evidence about the age of the earth (or any astronomical feature) can be shown not to be fabricated during creation.

Historicism

Theories of history or politics that allegedly predict future events have a logical form that renders them neither falsifiable nor verifiable. They claim that for every historically significant event, there exists an historical or economic law that determines the way in which events proceeded. Failure to identify the law does not mean that it does not exist, yet an event that satisfies the law does not prove the general case. Evaluation of such claims is at best difficult. On this basis, Popper "fundamentally criticized historicism in the sense of any preordained prediction of history"^[31] and argued that neither Marxism nor psychoanalysis was science,^[31] although both made such claims. Again, this does not mean that any of these types of theories is necessarily incorrect. Popper considered falsifiability a test of whether theories are scientific, not of whether propositions that they contain or support are true.

Mathematics

Many philosophers believe that mathematics is not experimentally falsifiable, and thus not a science according to the definition of Karl Popper.^[32] However, in the 1930s Gödel's incompleteness theorems proved that there does not exist a set of axioms for mathematics which is both complete and consistent.^[33] Karl Popper concluded that "most mathematical theories are, like those of physics and biology, hypothetico-deductive: pure mathematics therefore turns out to be much closer to the natural sciences whose hypotheses are conjectures, than it seemed even recently."^[34] Other thinkers, notably Imre Lakatos, have applied a version of falsificationism to mathematics itself.

Like all formal sciences, mathematics is not concerned with the validity of theories based on observations in the empirical world, but rather, mathematics is occupied with the theoretical, abstract study of such topics as quantity, structure, space and change. Methods of the mathematical sciences are, however, applied in constructing and testing scientific models dealing with observable reality. Albert Einstein wrote, "One reason why mathematics enjoys special esteem, above all other sciences, is that its laws are absolutely certain and indisputable, while those of other sciences are to some extent debatable and in constant danger of being overthrown by newly discovered facts."^[35]

Quotations

Albert Einstein is reported to have said something that can be paraphrased as: No amount of experimentation can ever prove me right; a single experiment can prove me wrong.^[36]^[37]^[38]

Popper said in Conjectures and Refutations,^[39]

"... the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability."

— Popper


Mandatory Palestine