
Sunday, April 11, 2021

Invalid science

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Invalid_science

Invalid science consists of scientific claims based on experiments that cannot be reproduced or that are contradicted by experiments that can be reproduced. Recent analyses indicate that the proportion of retracted claims in the scientific literature is steadily increasing. The number of retractions has grown tenfold over the past decade, but they still make up approximately 0.2% of the 1.4m papers published annually in scholarly journals.

The U.S. Office of Research Integrity (ORI) investigates scientific misconduct.

Incidence

Science magazine ranked first for the number of articles retracted at 70, just edging out PNAS, which retracted 69. Thirty-two of Science's retractions were due to fraud or suspected fraud, and 37 to error. A subsequent "retraction index" indicated that journals with relatively high impact factors, such as Science, Nature and Cell, had a higher rate of retractions. Fewer than 0.1% of the more than 25 million papers in PubMed, which go back to the 1940s, had been retracted.

The fraction of retracted papers due to scientific misconduct was estimated at two-thirds, according to studies of 2,047 papers published since 1977. Misconduct included fraud and plagiarism. Another one-fifth were retracted because of mistakes, and the rest were pulled for unknown or other reasons.

A separate study analyzed 432 claims of genetic links for various health risks that vary between men and women. Only one of these claims proved to be consistently reproducible. Another meta review found that of the 49 most-cited clinical research studies published between 1990 and 2003, more than 40 percent were later shown to be either totally wrong or significantly incorrect.

Biological sciences

In 2012 the biotech firm Amgen was able to reproduce just six of 53 important studies in cancer research. Earlier, a group at Bayer, a drug company, had successfully repeated only one-fourth of 67 important papers. Between 2000 and 2010 roughly 80,000 patients took part in clinical trials based on research that was later retracted because of mistakes or improprieties.

Paleontology

Nathan Myhrvold failed repeatedly to replicate the findings of several papers on dinosaur growth. Dinosaurs added a layer to their bones each year. Tyrannosaurus rex was thought to have increased in size by more than 700 kg a year, until Myhrvold showed that this figure was a factor of 2 too large. In 4 of 12 papers he examined, the original data had been lost. In three, the statistics were correct, while three had serious errors that invalidated their conclusions. Two papers mistakenly relied on data from these three. He discovered that some of the papers' graphs did not reflect the data. In one case, he found that only four of nine points on the graph came from data cited in the paper.

Major retractions

Torcetrapib was originally hyped as a drug that could block a protein that converts HDL cholesterol into LDL with the potential to "redefine cardiovascular treatment". One clinical trial showed that the drug could increase HDL and decrease LDL. Two days after Pfizer announced its plans for the drug, it ended the Phase III clinical trial due to higher rates of chest pain and heart failure and a 60 percent increase in overall mortality. Pfizer had invested more than $1 billion in developing the drug.

An in-depth review of the most highly cited biomarkers (whose presence is used to infer illness and measure treatment effects) claimed that 83 percent of supposed correlations became significantly weaker in subsequent studies. Homocysteine is an amino acid whose levels correlated with heart disease. However, a 2010 study showed that lowering homocysteine by nearly 30 percent had no effect on heart attack or stroke.

Priming

Priming studies claim that decisions can be influenced by apparently irrelevant events that a subject witnesses just before making a choice. Nobel Prize-winner Daniel Kahneman alleges that much of it is poorly founded. Researchers have been unable to replicate some of the more widely cited examples. A paper in PLoS ONE reported that nine separate experiments could not reproduce a study purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan. A further systematic replication involving 40 different labs around the world did not replicate the main finding. However, this latter systematic replication showed that participants who did not think there was a relation between thinking about a hooligan or a professor and their test performance were significantly more susceptible to the priming manipulation.

Potential causes

Competition

In the 1950s, when academic research accelerated during the cold war, the total number of scientists was a few hundred thousand. In the new century 6m-7m researchers are active. The number of research jobs has not matched this increase; every year six new PhDs compete for each academic post. Replicating other researchers' results is not perceived to be valuable. The struggle to compete encourages exaggeration of findings and biased data selection. A recent survey found that one in three researchers knows of a colleague who has at least somewhat distorted their results.

Publication bias

Major journals reject in excess of 90% of submitted manuscripts and tend to favor the most dramatic claims. The statistical measures that researchers use to test their claims allow a fraction of false claims to appear valid. Invalid claims are more likely to be dramatic (because they are false). Without replication, such errors are less likely to be caught.

Conversely, failures to prove a hypothesis are rarely even offered for publication. “Negative results” now account for only 14% of published papers, down from 30% in 1990. Yet knowledge of what is not true is as important as knowledge of what is true.

Peer review

Peer review is the primary validation technique employed by scientific publications. However, a prominent medical journal tested the system and found major failings. It supplied research with induced errors and found that most reviewers failed to spot the mistakes, even after being told of the tests.

A pseudonymous fabricated paper on the effects of a chemical derived from lichen on cancer cells was submitted to 304 journals for peer review. The paper was filled with errors of study design, analysis and interpretation; 157 lower-rated journals accepted it anyway. Another study sent an article containing eight deliberate mistakes in study design, analysis and interpretation to more than 200 of the British Medical Journal’s regular reviewers. On average, they reported fewer than two of the problems.

Peer reviewers typically do not re-analyse data from scratch, checking only that the authors’ analysis is properly conceived.

Statistics

Type I and type II errors

Scientists divide errors into type I, incorrectly asserting the truth of a hypothesis (false positive), and type II, rejecting a correct hypothesis (false negative). Statistical checks assess the probability that data which seem to support a hypothesis came about simply by chance. If that probability is less than 5%, the evidence is rated “statistically significant”. One definitional consequence is a type I error rate of one in 20.
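A minimal simulation sketch in Python (illustrative only, assuming NumPy and SciPy are available) makes that "one in 20" consequence concrete: when no real effect exists at all, a test at the 5% level still flags roughly 5% of experiments as significant.

  # Simulate many experiments in which the null hypothesis is true,
  # and count how often a two-sample t-test still reports p < 0.05.
  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  n_experiments, n_per_group = 10_000, 30
  false_positives = 0
  for _ in range(n_experiments):
      a = rng.normal(0, 1, n_per_group)  # both groups drawn from the same
      b = rng.normal(0, 1, n_per_group)  # distribution: no real effect exists
      false_positives += stats.ttest_ind(a, b).pvalue < 0.05

  print(false_positives / n_experiments)  # roughly 0.05, i.e. about 1 in 20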

Statistical power

In 2005 Stanford epidemiologist John Ioannidis showed that the idea that only one paper in 20 gives a false-positive result was incorrect. He claimed, “most published research findings are probably false.” He found three categories of problems: insufficient “statistical power” (the ability to avoid type II errors); the unlikeliness of the hypothesis; and publication bias favoring novel claims.

A statistically powerful study can identify factors that have only small effects on the data. In general, studies that run the experiment more times, on more subjects, have greater power. A power of 0.8 means that of ten true hypotheses tested, the effects of two are missed. Ioannidis found that in neuroscience the typical statistical power is 0.21; another study found that psychology studies average 0.35.
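What a power of 0.8 means can also be checked by simulation; the sketch below (Python, with a hypothetical effect size and sample sizes chosen purely for illustration) estimates the fraction of studies that detect a genuine effect, and shows how that fraction falls when the sample shrinks.

  import numpy as np
  from scipy import stats

  def estimated_power(effect, n_per_group, alpha=0.05, sims=5_000, seed=1):
      """Fraction of simulated studies that detect a true effect of the given size."""
      rng = np.random.default_rng(seed)
      hits = 0
      for _ in range(sims):
          a = rng.normal(0.0, 1.0, n_per_group)
          b = rng.normal(effect, 1.0, n_per_group)  # a genuine effect is present
          hits += stats.ttest_ind(a, b).pvalue < alpha
      return hits / sims

  print(estimated_power(0.5, 64))  # about 0.8: two of ten true effects still missed
  print(estimated_power(0.5, 20))  # far lower power: most true effects missed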

Unlikeliness is a measure of the degree of surprise in a result. Scientists prefer surprising results, leading them to test hypotheses that range from unlikely to very unlikely. Ioannidis claimed that in epidemiology some one in ten tested hypotheses should be true. In exploratory disciplines like genomics, which rely on examining voluminous data about genes and proteins, only one in a thousand should prove correct.

In a discipline in which 100 out of 1,000 hypotheses are true, studies with a power of 0.8 will find 80 and miss 20. Of the 900 incorrect hypotheses, 5%, or 45, will be accepted because of type I errors. Adding the 45 false positives to the 80 true positives gives 125 positive results, of which 36% are specious. Dropping statistical power to 0.4, which is optimistic for many fields, would still produce 45 false positives but only 40 true positives, so fewer than half of the positive results would be genuine.

Negative results are more reliable. With a statistical power of 0.8, the same discipline produces 875 negative results of which only 20 are false, giving an accuracy of over 97%. Negative results, however, account for a minority of published results, varying by discipline. A study of 4,600 papers found that the proportion of published negative results dropped from 30% to 14% between 1990 and 2007.
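The arithmetic in the two preceding paragraphs can be verified directly; a short sketch in Python, using the article's own illustrative numbers:

  # 1,000 hypotheses, of which 100 are true; alpha = 0.05, power = 0.8
  true_h, false_h = 100, 900
  alpha, power = 0.05, 0.8

  true_pos = power * true_h           # 80 real effects detected
  false_neg = (1 - power) * true_h    # 20 real effects missed
  false_pos = alpha * false_h         # 45 null hypotheses wrongly flagged as findings
  true_neg = (1 - alpha) * false_h    # 855 null hypotheses correctly left unsupported

  positives = true_pos + false_pos    # 125 "positive" results in total
  print(false_pos / positives)        # 0.36: 36% of the positives are specious
  negatives = true_neg + false_neg    # 875 negative results in total
  print(true_neg / negatives)         # ~0.977: negatives are over 97% reliable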

Subatomic physics sets an acceptable false-positive rate of one in 3.5m (known as the five-sigma standard). However, even this does not provide perfect protection. The problem invalidates some three-quarters of machine-learning studies, according to one review.
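The one-in-3.5m figure follows from the tail of the normal distribution; a quick check (Python with SciPy):

  from scipy.stats import norm

  p_five_sigma = norm.sf(5)   # one-sided tail probability beyond five standard deviations
  print(p_five_sigma)         # about 2.9e-7
  print(1 / p_five_sigma)     # about 3.5 million, i.e. a one-in-3.5m false-positive rate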

Statistical significance

Statistical significance is a measure for testing statistical correlation. It was invented by English mathematician Ronald Fisher in the 1920s. It defines a “significant” result as any data point that would be produced by chance less than 5 (or more stringently, 1) percent of the time. A significant result is widely seen as an important indicator that the correlation is not random.

While correlations track the relationship between truly independent measurements, such as smoking and cancer, they are much less effective when variables cannot be isolated, a common circumstance in biological systems. For example, studies found a high correlation between lower back pain and abnormalities in spinal discs, although it was later discovered that serious abnormalities were present in two-thirds of pain-free patients.

Minimum threshold publishers

Journals such as PLoS One use a “minimal-threshold” standard, seeking to publish as much science as possible, rather than to pick out the best work. Their peer reviewers assess only whether a paper is methodologically sound. Almost half of their submissions are still rejected on that basis.

Unpublished research

Only 22% of the clinical trials financed by the National Institutes of Health (NIH) released summary results within one year of completion, even though the NIH requires it. Fewer than half published within 30 months; a third remained unpublished after 51 months. When other scientists rely on invalid research, they may waste time on lines of research that are themselves invalid. The failure to report failures means that researchers waste money and effort exploring blind alleys already investigated by other scientists.

Fraud

In 21 surveys of academics (mostly in the biomedical sciences but also in civil engineering, chemistry and economics) carried out between 1987 and 2008, 2% admitted fabricating data, but 28% claimed to know of colleagues who engaged in questionable research practices.

Lack of access to data and software

Clinical trials are generally too costly to rerun. Access to trial data is the only practical approach to reassessment. A campaign to persuade pharmaceutical firms to make all trial data available won its first convert in February 2013 when GlaxoSmithKline became the first to agree.

Software used in a trial is generally considered to be proprietary intellectual property and is not available to replicators, further complicating matters. Journals that insist on data-sharing tend not to do the same for software.

Even well-written papers may not include sufficient detail and/or tacit knowledge (subtle skills and extemporisations not considered notable) for the replication to succeed. One cause of replication failure is insufficient control of the protocol, which can cause disputes between the original and replicating researchers.

Reform

Statistics training

Geneticists have begun more careful reviews, particularly of the use of statistical techniques. The effect was to stop a flood of specious results from genome sequencing.

Protocol registration

Registering research protocols in advance and monitoring them over the course of a study can prevent researchers from modifying the protocol midstream to highlight preferred results. Providing raw data for other researchers to inspect and test can also better hold researchers to account.

Post-publication review

Replacing peer review with post-publication evaluations can encourage researchers to think more about the long-term consequences of excessive or unsubstantiated claims. That system was adopted in physics and mathematics with good results.

Replication

Few researchers, especially junior workers, seek opportunities to replicate others' work, partly to protect relationships with senior researchers.

Reproduction benefits from access to the original study's methods and data. More than half of 238 biomedical papers published in 84 journals failed to identify all the resources (such as chemical reagents) necessary to reproduce the results. In 2008 some 60% of researchers said they would share raw data; by 2013 just 45% did. Journals have begun to demand that at least some raw data be made available, although only 143 of 351 randomly selected papers covered by some data-sharing policy actually complied.

The Reproducibility Initiative is a service allowing life scientists to pay to have their work validated by an independent lab. In October 2013 the initiative received funding to review 50 of the highest-impact cancer findings published between 2010 and 2012. Blog Syn is a website run by graduate students that is dedicated to reproducing chemical reactions reported in papers.

In 2013 replication efforts received greater attention. In May, Nature and related publications introduced an 18-point checklist for life science authors in an effort to ensure that their published research can be reproduced. Expanded "methods" sections and all data were to be available online. The Center for Open Science opened as an independent laboratory focused on replication. The journal Perspectives on Psychological Science announced a section devoted to replications. Another project announced plans to replicate 100 studies published in the first three months of 2008 in three leading psychology journals.

Major funders, including the European Research Council, the US National Science Foundation and Research Councils UK, have not changed their preference for new work over replications.

 

Publication bias

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Publication_bias

Publication bias is a type of bias that occurs in published academic research. It occurs when the outcome of an experiment or research study influences the decision whether to publish or otherwise distribute it. Publishing only results that show a significant finding disturbs the balance of findings, and inserts bias in favor of positive results. The study of publication bias is an important topic in metascience.

Studies with significant results can be of the same standard as studies with a null result with respect to quality of execution and design. However, statistically significant results are three times more likely to be published than papers with null results. A consequence of this is that researchers are unduly motivated to manipulate their practices to ensure that a statistically significant result is reported.

Multiple factors contribute to publication bias. For instance, once a scientific finding is well established, it may become newsworthy to publish reliable papers that fail to reject the null hypothesis. It has been found that the most common reason for non-publication is simply that investigators decline to submit results, leading to non-response bias. Factors cited as underlying this effect include investigators assuming they must have made a mistake, failure to support a known finding, loss of interest in the topic, or anticipation that others will be uninterested in the null results. The nature of these issues and the problems that have been triggered, have been referred to as the 5 diseases that threaten science, which include: "significosis, an inordinate focus on statistically significant results; neophilia, an excessive appreciation for novelty; theorrhea, a mania for new theory; arigorium, a deficiency of rigor in theoretical and empirical work; and finally, disjunctivitis, a proclivity to produce large quantities of redundant, trivial, and incoherent works."

Attempts to identify unpublished studies often prove difficult or are unsatisfactory. In an effort to combat this problem, some journals require that studies submitted for publication are pre-registered (registering a study prior to collection of data and analysis) with organizations like the Center for Open Science.

Other proposed strategies to detect and control for publication bias include p-curve analysis and disfavoring small and non-randomised studies because of their demonstrated high susceptibility to error and bias.

Definition

Publication bias occurs when the publication of research results depends not just on the quality of the research but also on the hypothesis tested, and the significance and direction of effects detected. The subject was first discussed in 1959 by statistician Theodore Sterling to refer to fields in which "successful" research is more likely to be published. As a result, "the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance". In the worst case, false conclusions could become canonized as true if the publication rate of negative results is too low.

Publication bias is sometimes called the file-drawer effect, or file-drawer problem. This term suggests that results not supporting the hypotheses of researchers often go no further than the researchers' file drawers, leading to a bias in published research. The term "file drawer problem" was coined by psychologist Robert Rosenthal in 1979.

Positive-results bias, a type of publication bias, occurs when authors are more likely to submit, or editors are more likely to accept, positive results than negative or inconclusive results. Outcome reporting bias occurs when multiple outcomes are measured and analyzed, but the reporting of these outcomes depends on the strength and direction of their results. A generic term coined to describe these post-hoc choices is HARKing ("Hypothesizing After the Results are Known").

Evidence

[Figure: Meta-analysis of stereotype threat on girls' math scores showing asymmetry typical of publication bias. From Flore, P. C., & Wicherts, J. M. (2015).]

There is extensive meta-research on publication bias in the biomedical field. Investigators following clinical trials from the submission of their protocols to ethics committees (or regulatory authorities) until the publication of their results observed that those with positive results are more likely to be published. In addition, studies often fail to report negative results when published, as demonstrated by research comparing study protocols with published articles.

The presence of publication bias has also been investigated in meta-analyses. The largest such analysis investigated the presence of publication bias in systematic reviews of medical treatments from the Cochrane Library. The study showed that statistically significant positive findings are 27% more likely to be included in meta-analyses of efficacy than other findings. Results showing no evidence of adverse effects have a 78% greater probability of inclusion in safety studies than statistically significant results showing adverse effects. Evidence of publication bias was also found in meta-analyses published in prominent medical journals.

Impact on meta-analysis

Where publication bias is present, published studies are no longer a representative sample of the available evidence. This bias distorts the results of meta-analyses and systematic reviews, which matters all the more because evidence-based medicine is increasingly reliant on meta-analysis to assess evidence.

Meta-analyses and systematic reviews can account for publication bias by including evidence from unpublished studies and the grey literature. The presence of publication bias can also be explored by constructing a funnel plot in which the estimate of the reported effect size is plotted against a measure of precision or sample size. The premise is that the scatter of points should reflect a funnel shape, indicating that the reporting of effect sizes is not related to their statistical significance. However, when small studies are predominately in one direction (usually the direction of larger effect sizes), asymmetry will ensue and this may be indicative of publication bias.
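A minimal sketch of how such a funnel plot is built (Python with NumPy and Matplotlib; the data are entirely synthetic and unbiased, so the scatter comes out symmetric):

  import numpy as np
  import matplotlib.pyplot as plt

  rng = np.random.default_rng(42)
  true_effect, n_studies = 0.3, 60
  se = rng.uniform(0.05, 0.5, n_studies)   # small studies have large standard errors
  effects = rng.normal(true_effect, se)    # each study's observed effect estimate

  plt.scatter(effects, se)
  plt.gca().invert_yaxis()                 # most precise studies at the top, by convention
  plt.axvline(true_effect, linestyle="--")
  plt.xlabel("Observed effect size")
  plt.ylabel("Standard error")
  plt.title("Funnel plot: with no publication bias the scatter is symmetric")
  plt.show()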

Because an inevitable degree of subjectivity exists in the interpretation of funnel plots, several tests have been proposed for detecting funnel plot asymmetry. These are often based on linear regression, and may adopt a multiplicative or additive dispersion parameter to adjust for the presence of between-study heterogeneity. Some approaches may even attempt to compensate for the (potential) presence of publication bias, which is particularly useful to explore the potential impact on meta-analysis results.
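One widely used asymmetry check of this regression-based kind is an Egger-type test; the sketch below (Python with SciPy, reusing the synthetic unbiased studies from the funnel-plot sketch above) regresses the standardized effect on precision and inspects whether the intercept is far from zero.

  import numpy as np
  from scipy import stats

  def egger_intercept(effects, se):
      """Egger-type regression: standardized effect vs. precision.
      An intercept far from zero suggests funnel-plot asymmetry."""
      z = effects / se                # standardized effects
      precision = 1.0 / se
      fit = stats.linregress(precision, z)
      return fit.intercept, fit.intercept_stderr

  rng = np.random.default_rng(42)
  se = rng.uniform(0.05, 0.5, 60)
  effects = rng.normal(0.3, se)       # unbiased synthetic studies
  intercept, stderr = egger_intercept(effects, se)
  print(intercept, stderr)            # intercept near zero: no sign of asymmetry here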

Compensation examples

Two meta-analyses of the efficacy of reboxetine as an antidepressant demonstrated attempts to detect publication bias in clinical trials. Based on positive trial data, reboxetine was originally approved as a treatment for depression in many European countries and the UK in 2001 (though in practice it is rarely used for this indication). A 2010 meta-analysis concluded that reboxetine was ineffective and that the preponderance of positive-outcome trials reflected publication bias, mostly due to trials published by the drug manufacturer Pfizer. A subsequent meta-analysis published in 2011, based on the original data, found flaws in the 2010 analyses and suggested that the data indicated reboxetine was effective in severe depression. Examples of publication bias are given by Ben Goldacre and Peter Wilmshurst.

In the social sciences, a study of published papers exploring the relationship between corporate social and financial performance found that "in economics, finance, and accounting journals, the average correlations were only about half the magnitude of the findings published in Social Issues Management, Business Ethics, or Business and Society journals".

One example cited as an instance of publication bias is the refusal of The Journal of Personality and Social Psychology (the original publisher of Daryl Bem's article claiming evidence for precognition) to publish attempted replications of Bem's work.

An analysis comparing studies of gene-disease associations originating in China to those originating outside China found that those conducted within the country reported a stronger association and a more statistically significant result.

Risks

John Ioannidis argues that "claimed research findings may often be simply accurate measures of the prevailing bias." He lists the following factors as those that make a paper with a positive result more likely to enter the literature and suppress negative-result papers:

  • The studies conducted in a field have small sample sizes.
  • The effect sizes in a field tend to be smaller.
  • There is both a greater number and lesser preselection of tested relationships.
  • There is greater flexibility in designs, definitions, outcomes, and analytical modes.
  • There are prejudices (financial interest, political, or otherwise).
  • The scientific field is hot and there are more scientific teams pursuing publication.

Other factors include experimenter bias and white hat bias.

Remedies

Publication bias can be contained through better-powered studies, enhanced research standards, and careful consideration of true and non-true relationships. Better-powered studies refer to large studies that deliver definitive results or test major concepts and lead to low-bias meta-analysis. Enhanced research standards such as the pre-registration of protocols, the registration of data collections and adherence to established protocols are other techniques. To avoid false-positive results, the experimenter must consider the chances that they are testing a true or non-true relationship. This can be undertaken by properly assessing the false positive report probability based on the statistical power of the test and reconfirming (whenever ethically acceptable) established findings of prior studies known to have minimal bias.
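The "false positive report probability" mentioned above can be written out explicitly; a sketch in Python of the standard formula combining the significance level, the study's power, and the prior probability that the tested relationship is real (the example numbers are hypothetical):

  def false_positive_report_probability(alpha, power, prior):
      """Probability that a 'significant' finding is in fact false, given the
      significance level, the statistical power, and the prior probability
      that the tested relationship is true."""
      false_pos = alpha * (1 - prior)
      true_pos = power * prior
      return false_pos / (false_pos + true_pos)

  # Well-powered confirmatory test of a plausible hypothesis:
  print(false_positive_report_probability(0.05, 0.8, prior=0.5))   # about 0.06
  # Underpowered exploratory test of a long-shot hypothesis:
  print(false_positive_report_probability(0.05, 0.2, prior=0.01))  # about 0.96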

Study registration

In September 2004, editors of prominent medical journals (including the New England Journal of Medicine, The Lancet, Annals of Internal Medicine, and JAMA) announced that they would no longer publish results of drug research sponsored by pharmaceutical companies unless that research was registered in a public clinical trials registry database from the start. Furthermore, some journals (e.g. Trials) encourage publication of study protocols in their journals.

The World Health Organization (WHO) agreed that basic information about all clinical trials should be registered at the study's inception, and that this information should be publicly accessible through the WHO International Clinical Trials Registry Platform. Additionally, public availability of complete study protocols, alongside reports of trials, is becoming more common for studies.


Grok

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Grok

Grok /ˈɡrɒk/ is a neologism coined by American writer Robert A. Heinlein for his 1961 science fiction novel Stranger in a Strange Land. While the Oxford English Dictionary summarizes the meaning of grok as "to understand intuitively or by empathy, to establish rapport with" and "to empathize or communicate sympathetically (with); also, to experience enjoyment", Heinlein's concept is far more nuanced, with critic Istvan Csicsery-Ronay Jr. observing that "the book's major theme can be seen as an extended definition of the term." The concept of grok garnered significant critical scrutiny in the years after the book's initial publication. The term and aspects of the underlying concept have become part of communities such as computer science.

Descriptions of grok in Stranger in a Strange Land

Critic David E. Wright Sr. points out that in the 1991 "uncut" edition of Stranger, the word grok "was used first without any explicit definition on page 22" and continued to be used without being explicitly defined until page 253 (emphasis in original). He notes that this first intensional definition is simply "to drink", but that this is only a metaphor "much as English 'I see' often means the same as 'I understand'". Critics have bridged this absence of explicit definition by citing passages from Stranger that illustrate the term. A selection of these passages follows:

Grok means "to understand", of course, but Dr. Mahmoud, who might be termed the leading Terran expert on Martians, explains that it also means, "to drink" and "a hundred other English words, words which we think of as antithetical concepts. 'Grok' means all of these. It means 'fear', it means 'love', it means 'hate' – proper hate, for by the Martian 'map' you cannot hate anything unless you grok it, understand it so thoroughly that you merge with it and it merges with you – then you can hate it. By hating yourself. But this implies that you love it, too, and cherish it and would not have it otherwise. Then you can hate – and (I think) Martian hate is an emotion so black that the nearest human equivalent could only be called mild distaste.

Grok means "identically equal". The human cliché "This hurts me worse than it does you" has a distinctly Martian flavor. The Martian seems to know instinctively what we learned painfully from modern physics, that observer acts with observed through the process of observation. Grok means to understand so thoroughly that the observer becomes a part of the observed – to merge, blend, intermarry, lose identity in group experience. It means almost everything that we mean by religion, philosophy, and science and it means as little to us as color does to a blind man.

The Martian Race had encountered the people of the fifth planet, grokked them completely, and had taken action; asteroid ruins were all that remained, save that the Martians continued to praise and cherish the people they had destroyed.

All that groks is God.

Etymology

Robert A. Heinlein originally coined the term grok in his 1961 novel Stranger in a Strange Land as a Martian word that could not be defined in Earthling terms, but can be associated with various literal meanings such as "water", "to drink", "life", or "to live", and had a much more profound figurative meaning that is hard for terrestrial culture to understand because of its assumption of a singular reality.

According to the book, drinking water is a central focus on Mars, where it is scarce. Martians use the merging of their bodies with water as a simple example or symbol of how two entities can combine to create a new reality greater than the sum of its parts. The water becomes part of the drinker, and the drinker part of the water. Both grok each other. Things that once had separate realities become entangled in the same experiences, goals, history, and purpose. Within the book, the statement of divine immanence verbalized among the main characters, "thou art God", is logically derived from the concept inherent in the term grok.

Heinlein describes Martian words as "guttural" and "jarring". Martian speech is described as sounding "like a bullfrog fighting a cat". Accordingly, grok is generally pronounced as a guttural gr terminated by a sharp k with very little or no vowel sound (a narrow IPA transcription might be [ɡɹ̩kʰ]). William Tenn suggests Heinlein in creating the word might have been influenced by Tenn's very similar concept of griggo, earlier introduced in Tenn's story "Venus and the Seven Sexes" (published in 1949). In his later afterword to the story, Tenn says Heinlein considered such influence "very possible".

Adoption and modern usage

In computer programmer culture

Uses of the word in the decades after the 1960s are more concentrated in computer culture, such as a 1984 appearance in InfoWorld: "There isn't any software! Only different internal states of hardware. It's all hardware! It's a shame programmers don't grok that better."

The Jargon File, which describes itself as a "Hacker's Dictionary" and has been published under that name three times, puts grok in a programming context:

When you claim to "grok" some knowledge or technique, you are asserting that you have not merely learned it in a detached instrumental way but that it has become part of you, part of your identity. For example, to say that you "know" Lisp is simply to assert that you can code in it if necessary – but to say you "grok" Lisp is to claim that you have deeply entered the world-view and spirit of the language, with the implication that it has transformed your view of programming. Contrast zen, which is a similar supernatural understanding experienced as a single brief flash.

The entry existed in the very earliest forms of the Jargon File, dating from the early 1980s. A typical tech usage, from the 2005 Linux Bible, characterizes the Unix software development philosophy as "one that can make your life a lot simpler once you grok the idea".

The book Perl Best Practices defines grok as understanding a portion of computer code in a profound way. It goes on to suggest that to re-grok code is to reload the intricacies of that portion of code into one's memory after some time has passed and all the details of it are no longer remembered. In that sense, to grok means to load everything into memory for immediate use. It is analogous to the way a processor caches memory for short term use, but the only implication by this reference was that it was something a human (or perhaps a Martian) would do.

The main web page for cURL, an open source tool and programming library, describes the function of cURL as "cURL groks URLs".

The book Cyberia covers its use in this subculture extensively:

This is all latter day usage, the original derivation was from an early text processing utility from so long ago that no one remembers but, grok was the output when it understood the file. K&R would remember.

The keystroke logging software used by the NSA for its remote intelligence gathering operations is named GROK.

One of the most powerful parsing filters used in Elasticsearch software's Logstash component is named grok.

A reference book by Carey Bunks on the use of the GNU Image Manipulation Program is titled Grokking the GIMP.

In counterculture

Tom Wolfe, in his book The Electric Kool-Aid Acid Test (1968), describes a character's thoughts during an acid trip: "He looks down, two bare legs, a torso rising up at him and like he is just noticing them for the first time ... he has never seen any of this flesh before, this stranger. He groks over that ..."

In his counterculture Volkswagen repair manual, How to Keep Your Volkswagen Alive: A Manual of Step-by-Step Procedures for the Compleat Idiot (1969), dropout aerospace engineer John Muir instructs prospective used VW buyers to "grok the car" before buying.

The word was used numerous times by Robert Anton Wilson in his works The Illuminatus! Trilogy and Schrödinger's Cat Trilogy.

The term inspired actress Mayim Bialik's women's lifestyle site, Grok Nation.

 

Neologism

From Wikipedia, the free encyclopedia

A neologism (/niːˈɒlədʒɪzəm/; from Greek νέο- néo-, "new" and λόγος lógos, "speech, utterance") is a relatively recent or isolated term, word, or phrase that may be in the process of entering common use, but that has not yet been fully accepted into mainstream language. Neologisms are often driven by changes in culture and technology. In the process of language formation, neologisms are more mature than protologisms. A word whose development stage is between that of the protologism (freshly coined) and the neologism (new word) is a prelogism.

Popular examples of neologisms can be found in science, fiction (notably science fiction), films and television, branding, literature, jargon, cant, linguistic and popular culture.

Examples include laser (1960) from Light Amplification by Stimulated Emission of Radiation; robotics (1941) from Czech writer Karel Čapek's play R.U.R. (Rossum's Universal Robots); and agitprop (1930) (a portmanteau of "agitation" and "propaganda").

Background

Neologisms are often formed by combining existing words (see compound noun and adjective) or by giving words new and unique suffixes or prefixes. Neologisms can also be formed by blending words, for example, "brunch" is a blend of the words "breakfast" and "lunch", or through abbreviation or acronym, by intentionally rhyming with existing words or simply through playing with sounds.

Neologisms can become popular through memetics, through mass media, the Internet, and word of mouth, including academic discourse in many fields renowned for their use of distinctive jargon, and often become accepted parts of the language. Other times, they disappear from common use just as readily as they appeared. Whether a neologism continues as part of the language depends on many factors, probably the most important of which is acceptance by the public. It is unusual for a word to gain popularity if it does not clearly resemble other words.

History and meaning

The term neologism is first attested in English in 1772, borrowed from French néologisme (1734). In an academic sense, there is no professional neologist, because the study of such things (cultural or ethnic vernacular, for example) is interdisciplinary. Anyone such as a lexicographer or an etymologist might study neologisms, how their uses span the scope of human expression, and how science and technology now spread them more rapidly than ever before.

The term neologism has a broader meaning which also includes "a word which has gained a new meaning". Sometimes, the latter process is called semantic shifting, or semantic extension. Neologisms are distinct from a person's idiolect, one's unique patterns of vocabulary, grammar, and pronunciation.

Neologisms are usually introduced when it is found that a specific notion is lacking a term, or when the existing vocabulary lacks detail, or when a speaker is unaware of the existing vocabulary. The law, governmental bodies, and technology have a relatively high frequency of acquiring neologisms. Another trigger that motivates the coining of a neologism is to disambiguate a term which may be unclear due to having many meanings.

Literature

Neologisms may come from a word used in the narrative of fiction such as novels and short stories. Examples include "grok" (to intuitively understand) from the science fiction novel about a Martian, entitled Stranger in a Strange Land by Robert A. Heinlein; "McJob" (precarious, poorly-paid employment) from Generation X: Tales for an Accelerated Culture by Douglas Coupland; "cyberspace" (widespread, interconnected digital technology) from Neuromancer by William Gibson; and "quark" (Slavic slang for "rubbish"; German for a type of dairy product) from James Joyce's Finnegans Wake.

The title of a book may become a neologism, for instance, Catch-22 (from the title of Joseph Heller's novel). Alternatively, the author's name may give rise to the neologism, although the term is sometimes based on only one work of that author. This includes such words as "Orwellian" (from George Orwell, referring to his dystopian novel Nineteen Eighty-Four) and "Kafkaesque" (from Franz Kafka), which refers to arbitrary, complex bureaucratic systems.

Names of famous characters are another source of literary neologisms, e.g. quixotic (referring to the romantic and misguided title character in Don Quixote by Miguel de Cervantes), scrooge (from the avaricious main character in Charles Dickens' A Christmas Carol) and pollyanna (from the unfailingly optimistic character in Eleanor H. Porter's book of the same name).

Cant

Polari is a cant used by some actors, circus performers, and the gay subculture to communicate without outsiders understanding. Some Polari terms have crossed over into mainstream slang, in part through their usage in pop song lyrics and other works. Examples include: acdc, barney, blag, butch, camp, khazi, cottaging, hoofer, mince, ogle, scarper, slap, strides, tod, and [rough] trade.

Verlan (French pronunciation: [vɛʁlɑ̃]; the name is itself the reverse of the expression "l'envers", "the reverse") is a type of argot in the French language, featuring inversion of syllables in a word, and is common in slang and youth language. It rests on a long French tradition of transposing syllables of individual words to create slang words. Some verlan words, such as meuf ("femme", meaning "woman", roughly backwards), have become so commonplace that they have been included in the Petit Larousse. Like any slang, the purpose of verlan is to create a somewhat secret language that only its speakers can understand; a word becoming mainstream defeats that purpose. As a result, such newly common words are re-verlanised: reversed a second time. The common meuf became feumeu.

Popular culture

Neologism development may be spurred, or at least spread, by popular culture. Examples of pop-culture neologisms include the American "Alt-right" (2010s), the Canadian portmanteau "Snowmageddon" (2009), the Russian parody "Monstration" (c. 2004), and "Santorum" (c. 2003).

Neologisms spread mainly through their exposure in mass media. The genericizing of brand names, such as "coke" for Coca-Cola, "kleenex" for Kleenex facial tissue, and "xerox" for Xerox photocopying, all spread through their popular use being enhanced by mass media.

However, in some limited cases, words break out of their original communities and spread through social media. "Doggo-Lingo", a term still below the threshold of a neologism according to Merriam-Webster, is an example of the latter, having spread primarily through Facebook groups and Twitter accounts. The suspected origin of this way of referring to dogs stems from a Facebook group founded in 2008 that gained popularity in 2014 in Australia. In Australian English it is common to use diminutives, often ending in -o, which could be where doggo-lingo was first used. The term has grown so that Merriam-Webster has acknowledged its use, but notes that the term needs to be found in published, edited work for a longer period of time before it can be deemed a new word, making it a good example of a neologism in the making.

Translations

Because neologisms originate in one language, translations between languages can be difficult.

In the scientific community, where English is the predominant language for published research and studies, like-sounding translations (referred to as 'naturalization') are sometimes used. Alternatively, the English word is used along with a brief explanation of meaning. Four translation methods are emphasized for translating neologisms: transliteration, transcription, the use of analogues, and calque or loan translation.

When translating from English to other languages, the naturalization method is most often used. The most common way that professional translators translate neologisms is through the think-aloud protocol (TAP), wherein translators find the most appropriate and natural-sounding word through speech. As such, translators can use potential translations in sentences and test them with different structures and syntax. Correct translation of English for specific purposes into other languages is crucial in various industries and legal systems. Inaccurate translations can lead to 'translation asymmetry' or misunderstandings and miscommunication. Many technical glossaries of English translations exist to combat this issue in the medical, judicial, and technological fields.

Other uses

In psychiatry and neuroscience, the term neologism is used to describe words that have meaning only to the person who uses them, independent of their common meaning. This can be seen in schizophrenia, where a person may replace a word with a nonsensical one of their own invention, e.g. “I got so angry I picked up a dish and threw it at the geshinker.”  The use of neologisms may also be due to aphasia acquired after brain damage resulting from a stroke or head injury.

Replication crisis

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Replication_crisis

The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which it has been found that many scientific studies are difficult or impossible to replicate or reproduce. The replication crisis most severely affects the social sciences and medicine. The phrase was coined in the early 2010s as part of a growing awareness of the problem. The replication crisis represents an important body of research in the field of metascience.

Because the reproducibility of experimental results is an essential part of the scientific method, an inability to replicate the studies of others has potentially grave consequences for many fields of science in which significant theories are grounded on unreproducible experimental work. The replication crisis has been particularly widely discussed in the field of medicine, where a number of efforts have been made to re-investigate classic results, to determine both the reliability of the results and, if found to be unreliable, the reasons for the failure of replication.

Scope

Overall

A 2016 poll of 1,500 scientists reported that 70% of them had failed to reproduce at least one other scientist's experiment (and 50% had failed to reproduce one of their own experiments). In 2009, 2% of scientists admitted to falsifying studies at least once and 14% admitted to personally knowing someone who did. Misconduct was reported more frequently by medical researchers than by others.

In psychology

Several factors have combined to put psychology at the center of the controversy. According to a 2018 survey of 200 meta-analyses, "psychological research is, on average, afflicted with low statistical power". Much of the focus has been on the area of social psychology, although other areas of psychology such as clinical psychology, developmental psychology, and educational research have also been implicated.

Firstly, questionable research practices (QRPs) have been identified as common in the field. Such practices, while not intentionally fraudulent, involve capitalizing on the gray area of acceptable scientific practices or exploiting flexibility in data collection, analysis, and reporting, often in an effort to obtain a desired outcome. Examples of QRPs include selective reporting or partial publication of data (reporting only some of the study conditions or collected dependent measures in a publication), optional stopping (choosing when to stop data collection, often based on statistical significance of tests), post-hoc storytelling (framing exploratory analyses as confirmatory analyses), and manipulation of outliers (either removing outliers or leaving outliers in a dataset to cause a statistical test to be significant); a simulation of optional stopping is sketched below. A survey of over 2,000 psychologists indicated that a majority of respondents admitted to using at least one QRP. Publication bias (see the "Publication bias" article above) leads to an elevated number of false positive results. It is augmented by the pressure to publish as well as the author's own confirmation bias, and is an inherent hazard in the field, requiring a certain degree of skepticism on the part of readers.
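Optional stopping is easy to demonstrate; the sketch below (Python, with illustrative sample sizes) simulates an experimenter who tests after every extra batch of subjects and stops at the first p < .05, showing that the false-positive rate climbs well above the nominal 5% even though no effect exists.

  import numpy as np
  from scipy import stats

  def optional_stopping_rate(n_initial=20, n_max=100, step=10, sims=2_000, seed=7):
      """Fraction of null experiments declared 'significant' under optional stopping."""
      rng = np.random.default_rng(seed)
      hits = 0
      for _ in range(sims):
          a = list(rng.normal(0, 1, n_initial))
          b = list(rng.normal(0, 1, n_initial))   # no true effect anywhere
          while True:
              if stats.ttest_ind(a, b).pvalue < 0.05:
                  hits += 1                       # stop and report a "finding"
                  break
              if len(a) >= n_max:
                  break                           # give up at the sample-size cap
              a.extend(rng.normal(0, 1, step))    # collect a few more subjects
              b.extend(rng.normal(0, 1, step))    # and peek at the data again
      return hits / sims

  print(optional_stopping_rate())  # well above 0.05 despite the null being true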

Secondly, psychology, and social psychology in particular, has found itself at the center of several scandals involving outright fraudulent research, most notably the admitted data fabrication by Diederik Stapel as well as allegations against others. However, most scholars acknowledge that fraud is, perhaps, the lesser contribution to replication crises.

Thirdly, several effects in psychological science had been found to be difficult to replicate even before the current replication crisis. For example, the scientific journal Judgment and Decision Making has published several studies over the years that fail to provide support for the unconscious thought theory. Replications appear particularly difficult when research trials are pre-registered and conducted by research groups not highly invested in the theory in question.

These three elements together have resulted in renewed attention for replication, supported by psychologist Daniel Kahneman. Scrutiny of many effects has shown that several core beliefs are hard to replicate. A 2014 special edition of the journal Social Psychology focused on replication studies, and a number of previously held beliefs were found to be difficult to replicate. A 2012 special edition of the journal Perspectives on Psychological Science also focused on issues ranging from publication bias to null-aversion that contribute to the replication crises in psychology. In 2015, the first open empirical study of reproducibility in psychology was published, called the Reproducibility Project. Researchers from around the world collaborated to replicate 100 empirical studies from three top psychology journals. Fewer than half of the attempted replications were successful at producing statistically significant results in the expected directions, though most of the attempted replications did produce trends in the expected directions.

Many research trials and meta-analyses are compromised by poor quality and conflicts of interest that involve both authors and professional advocacy organizations, resulting in many false positives regarding the effectiveness of certain types of psychotherapy.

Although the British newspaper The Independent wrote that the results of the reproducibility project show that much of the published research is just "psycho-babble", the replication crisis does not necessarily mean that psychology is unscientific. Rather, this is part of the scientific process, in which old ideas or those that cannot withstand careful scrutiny are pruned, although the pruning is not always effective. The consequence is that some areas of psychology once considered solid, such as social priming, have come under increased scrutiny due to failed replications.

Nobel laureate and professor emeritus in psychology Daniel Kahneman argued that the original authors should be involved in the replication effort because the published methods are often too vague. Others such as Dr. Andrew Wilson disagree and argue that the methods should be written down in detail. An investigation of replication rates in psychology in 2012 indicated higher success rates of replication in replication studies when there was author overlap with the original authors of a study (91.7% successful replication rates in studies with author overlap compared to 64.6% success replication rates without author overlap).

Focus on the replication crisis has led to other renewed efforts in the discipline to re-test important findings. In response to concerns about publication bias and p-hacking, more than 140 psychology journals have adopted result-blind peer review, in which studies are accepted not on the basis of their findings after completion, but before they are conducted, on the basis of the methodological rigor of their experimental designs and the theoretical justifications for their planned statistical analyses. Early analysis of this procedure has estimated that 61 percent of result-blind studies have led to null results, in contrast to an estimated 5 to 20 percent in earlier research. In addition, large-scale collaborations between researchers working in multiple labs in different countries, which regularly make their data openly available for different researchers to assess, have become much more common in the field.

Psychology replication rates

A report by the Open Science Collaboration in August 2015 that was coordinated by Brian Nosek estimated the reproducibility of 100 studies in psychological science from three high-ranking psychology journals. Overall, 36% of the replications yielded significant findings (p value below 0.05) compared to 97% of the original studies that had significant effects. The mean effect size in the replications was approximately half the magnitude of the effects reported in the original studies.

The same paper examined the reproducibility rates and effect sizes by journal (Journal of Personality and Social Psychology [JPSP], Journal of Experimental Psychology: Learning, Memory, and Cognition [JEP:LMC], Psychological Science [PSCI]) and discipline (social psychology, developmental psychology). Study replication rates were 23% for JPSP, 48% for JEP:LMC, and 38% for PSCI. Studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%).

An analysis of the publication history in the top 100 psychology journals between 1900 and 2012 indicated that approximately 1.6% of all psychology publications were replication attempts. Articles were considered a replication attempt if the term "replication" appeared in the text. A subset of those studies (500 studies) was randomly selected for further examination and yielded a lower replication rate of 1.07% (342 of the 500 studies [68.4%] were actually replications). In the subset of 500 studies, analysis indicated that 78.9% of published replication attempts were successful.

A study published in 2018 in Nature Human Behaviour sought to replicate 21 social and behavioral science papers from Nature and Science, finding that only 13 could be successfully replicated. Similarly, in a study conducted under the auspices of the Center for Open Science, a team of 186 researchers from 60 different laboratories (representing 36 different nationalities from 6 different continents) conducted replications of 28 classic and contemporary findings in psychology. The focus of the study was not only on whether or not the findings from the original papers replicated, but also on the extent to which findings varied as a function of variations in samples and contexts. Overall, 14 of the 28 findings failed to replicate despite massive sample sizes. However, if a finding replicated, it replicated in most samples, while if a finding was not replicated, it failed to replicate with little variation across samples and contexts. This evidence is inconsistent with a popular explanation that failures to replicate in psychology are likely due to changes in the sample between the original and replication study.

A disciplinary social dilemma

Highlighting the social structure that discourages replication in psychology, Brian D. Earp and Jim A. C. Everett enumerated five points as to why replication attempts are uncommon:

  1. "Independent, direct replications of others' findings can be time-consuming for the replicating researcher"
  2. "[Replications] are likely to take energy and resources directly away from other projects that reflect one's own original thinking"
  3. "[Replications] are generally harder to publish (in large part because they are viewed as being unoriginal)"
  4. "Even if [replications] are published, they are likely to be seen as 'bricklaying' exercises, rather than as major contributions to the field"
  5. "[Replications] bring less recognition and reward, and even basic career security, to their authors"

For these reasons the authors argued that psychology is facing a disciplinary social dilemma, where the interests of the discipline are at odds with the interests of the individual researcher.

"Methodological terrorism" controversy

With the replication crisis of psychology earning attention, Princeton University psychologist Susan Fiske drew controversy for calling out critics of psychology. She labeled these unidentified "adversaries" with names such as "methodological terrorist" and "self-appointed data police", and said that criticism of psychology should only be expressed in private or through contacting the journals. Columbia University statistician and political scientist Andrew Gelman, responded to Fiske, saying that she had found herself willing to tolerate the "dead paradigm" of faulty statistics and had refused to retract publications even when errors were pointed out. He added that her tenure as editor has been abysmal and that a number of published papers edited by her were found to be based on extremely weak statistics; one of Fiske's own published papers had a major statistical error and "impossible" conclusions.

In medicine

Out of 49 medical studies from 1990–2003 with more than 1000 citations, 45 claimed that the studied therapy was effective. Out of these studies, 16% were contradicted by subsequent studies, 16% had found stronger effects than did subsequent studies, 44% were replicated, and 24% remained largely unchallenged. The US Food and Drug Administration in 1977–1990 found flaws in 10–20% of medical studies. In a paper published in 2012, Glenn Begley, a biotech consultant working at Amgen, and Lee Ellis, at the University of Texas, found that only 11% of 53 pre-clinical cancer studies could be replicated. The irreproducible studies had a number of features in common, including that studies were not performed by investigators blinded to the experimental versus the control arms, there was a failure to repeat experiments, a lack of positive and negative controls, failure to show all the data, inappropriate use of statistical tests and use of reagents that were not appropriately validated.

A survey on cancer researchers found that half of them had been unable to reproduce a published result. A similar survey by Nature on 1,576 researchers who took a brief online questionnaire on reproducibility showed that more than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments. "Although 52% of those surveyed agree there is a significant 'crisis' of reproducibility, less than 31% think failure to reproduce published results means the result is probably wrong, and most say they still trust the published literature."

A 2016 article by John Ioannidis, Professor of Medicine and of Health Research and Policy at Stanford University School of Medicine and a Professor of Statistics at Stanford University School of Humanities and Sciences, elaborated on "Why Most Clinical Research Is Not Useful". In the article Ioannidis laid out some of the problems and called for reform, outlining criteria that medical research must meet to be useful; one example was the need for medicine to be "patient centered" (e.g. in the form of the Patient-Centered Outcomes Research Institute) instead of the current practice of mainly catering to "the needs of physicians, investigators, or sponsors".

In marketing

Marketing is another discipline with a "desperate need" for replication. Many prominent marketing findings do not hold up when replication is attempted; a notable example is the "too-many-choices" effect, in which a large number of product choices makes a consumer less likely to purchase. In addition to the previously mentioned arguments, replication studies in marketing are needed to examine the applicability of theories and models across countries and cultures, which is especially important given the possible influences of globalization.

In economics

A 2016 study in the journal Science found that one-third of 18 experimental studies from two top-tier economics journals (American Economic Review and the Quarterly Journal of Economics) failed to successfully replicate. A 2017 study in the Economic Journal suggested that "the majority of the average effects in the empirical economics literature are exaggerated by a factor of at least 2 and at least one-third are exaggerated by a factor of 4 or more".

In sports science

A 2018 study took the field of exercise and sports science to task for insufficient replication studies, limited reporting of both null and trivial results, and insufficient research transparency. Statisticians have criticized sports science for common use of a controversial statistical method called "magnitude-based inference" which has allowed sports scientists to extract apparently significant results from noisy data where ordinary hypothesis testing would have found none.

In water resource management

A 2019 study in Scientific Data suggested that only a small number of articles in water resources and management journals could be reproduced, while the majority of articles were not replicable due to data unavailability. The study estimated with 95% confidence that "results might be reproduced for only 0.6% to 6.8% of all 1,989 articles".

Political repercussions

In the US, science's reproducibility crisis has become a topic of political contention, linked to attempts to weaken regulations – for example, of pollutant emissions – on the argument that these regulations are based on non-reproducible science. Previous attempts with the same aim accused studies used by regulators of being non-transparent.

Public awareness and perceptions

Concerns have been expressed within the scientific community that the general public may consider science less credible due to failed replications. Research supporting this concern is sparse, but a nationally representative survey in Germany showed that more than 75% of Germans have not heard of replication failures in science. The study also found that most Germans have positive perceptions of replication efforts: Only 18% think that non-replicability shows that science cannot be trusted, while 65% think that replication research shows that science applies quality control, and 80% agree that errors and corrections are part of science.

Causes 

A major cause of low reproducibility is publication bias and selection bias, driven by the fact that statistically insignificant results are rarely published: studies that screen many potential effects tend to report only the significant ones. For effects that are nonexistent (or negligible), a standard statistical test will nevertheless reach significance (at the usual 5% level) with 5% probability. If a large number of such effects are screened in a chase for significant results, these spurious findings can swamp the genuine ones, and each of them will again replicate "successfully" with only about 5% probability. As the proportion of such studies grows, the overall replication rate falls below what would be expected for studies of genuinely relevant effects. Erroneously significant results may also come from questionable practices in data analysis such as data dredging (p-hacking), HARKing, and the exploitation of researcher degrees of freedom.
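
The screening arithmetic above can be made concrete with a short simulation. The sketch below is purely illustrative (the mix of null and real effects, the effect size and the sample size are assumptions chosen for demonstration): it screens a pool of candidate effects at p < 0.05, keeps the "discoveries", and then attempts to replicate each of them once.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, alpha = 20, 0.05                      # per-group sample size and threshold (assumed)

    # Assumed mix: 1,000 nonexistent effects and 100 real ones (Cohen's d = 0.5)
    effects = np.array([0.0] * 1000 + [0.5] * 100)

    def significant(delta):
        """Run one two-group experiment and test it at the alpha level."""
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(delta, 1.0, n)
        return stats.ttest_ind(a, b).pvalue < alpha

    screened = np.array([significant(d) for d in effects])    # initial screen
    discoveries = effects[screened]                            # the published "hits"
    replicated = np.array([significant(d) for d in discoveries])

    print(f"discoveries: {discoveries.size} (of which truly null: {(discoveries == 0).sum()})")
    print(f"overall replication rate: {replicated.mean():.0%}")
    print(f"replication rate of the null 'discoveries': {replicated[discoveries == 0].mean():.0%}")

With these assumed numbers, a substantial share of the "discoveries" are false positives, and the overall replication rate falls well below the power of the original studies, which is the mechanism described in the paragraph above.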

Glenn Begley and John Ioannidis proposed the following causes for the intensifying chase for significance:

  • Generation of new data/publications at an unprecedented rate.
  • Majority of these discoveries will not stand the test of time.
  • Failure to adhere to good scientific practice and the desperation to publish or perish.
  • Multiple varied stakeholders.

They conclude that no party is solely responsible, and no single solution will suffice.

These issues may lead to the canonization of false facts.

In fact, some predictions of an impending crisis in the quality control mechanism of science can be traced back several decades, especially among scholars in science and technology studies (STS). Derek de Solla Price – considered the father of scientometrics – predicted that science could reach 'senility' as a result of its own exponential growth. Some present-day literature seems to vindicate this 'overflow' prophecy, lamenting the decay of both attention and quality.

Philosopher and historian of science Jerome R. Ravetz predicted in his 1971 book Scientific Knowledge and Its Social Problems that science – in its progression from "little" science composed of isolated communities of researchers, to "big" science or "techno-science" – would suffer major problems in its internal system of quality control. Ravetz recognized that the incentive structure for modern scientists could become dysfunctional – what is now known as the 'publish or perish' challenge – creating perverse incentives to publish any finding, however dubious. According to Ravetz, quality in science is maintained only when there is a community of scholars linked by a set of shared norms and standards, all of whom are willing and able to hold one another accountable.

Historian Philip Mirowski offered a similar diagnosis in his 2011 book Science Mart. The word 'Mart' in the title alludes to the retail giant Walmart, which Mirowski uses as a metaphor for the commodification of science. In Mirowski's analysis, the quality of science collapses when it becomes a commodity traded in a market. He argues his case by tracing the decay of science to the decision of major corporations to close their in-house laboratories and outsource their work to universities in an effort to reduce costs and increase profits. The corporations subsequently moved their research away from universities to an even cheaper option – contract research organizations (CROs).

The crisis of science's quality control system is affecting the use of science for policy. This is the thesis of recent work by a group of STS scholars, who identify 'evidence-based (or informed) policy' as a present point of tension. Economist Noah Smith suggests that a factor in the crisis has been the overvaluing of research and the undervaluing of teaching ability in academia, especially in fields with few major recent discoveries.

Social system theory, due to the German sociologist Niklas Luhmann, offers another reading of the crisis. According to this theory, each of society's systems – 'economy', 'science', 'religion', 'media' and so on – communicates using its own code: true/false for science, profit/loss for the economy, news/no-news for the media. According to some sociologists, science's mediatization, commodification and politicization – results of the structural coupling among systems – have led to a confusion of the original system codes. If science's code of true/false is supplanted by those of the other systems, such as profit/loss or news/no-news, science's operation enters an internal crisis.

Response 

Replication has been referred to as "the cornerstone of science". Replication studies attempt to evaluate whether published results reflect true findings or false positives. The integrity of scientific findings and reproducibility of research are important as they form the knowledge foundation on which future studies are built.

Metascience

Metascience is the use of scientific methodology to study science itself. Metascience seeks to increase the quality of scientific research while reducing waste. It is also known as "research on research" and "the science of science", as it uses research methods to study how research is done and where improvements can be made. Metascience concerns itself with all fields of research and has been described as "a bird's eye view of science." In the words of John Ioannidis, "Science is the best thing that has happened to human beings ... but we can do it better."

Meta-research continues to be conducted to identify the roots of the crisis and to address them. Methods of addressing the crisis include pre-registration of scientific studies and clinical trials as well as the founding of organizations such as CONSORT and the EQUATOR Network that issue guidelines for methodology and reporting. There are continuing efforts to reform the system of academic incentives, to improve the peer review process, to reduce the misuse of statistics, to combat bias in scientific literature, and to increase the overall quality and efficiency of the scientific process.

Tackling publication bias with pre-registration of studies

A recent innovation in scientific publishing to address the replication crisis is the use of registered reports. The registered report format requires authors to submit a description of the study methods and analyses prior to data collection. Once the method and analysis plan have been vetted through peer review, publication of the findings is provisionally guaranteed, provided the authors follow the proposed protocol. One goal of registered reports is to circumvent the publication bias toward significant findings that can lead to implementation of questionable research practices, and to encourage publication of studies with rigorous methods.

The journal Psychological Science has encouraged the preregistration of studies and the reporting of effect sizes and confidence intervals. The editor-in-chief also noted that the editorial staff will ask for replication of studies with surprising findings based on small sample sizes before allowing the manuscripts to be published.

Moreover, only a very small proportion of academic journals in psychology and the neurosciences explicitly state in their aims and scope or instructions to authors that they welcome submissions of replication studies. This does little to encourage the reporting of, or even attempts at, replication studies.

Shift to a complex systems paradigm

It has been argued that research endeavours working within the conventional linear paradigm necessarily run into replication difficulties. Problems arise when the causal processes in the system under study are "interaction-dominant" rather than "component-dominant", multiplicative rather than additive, with many small non-linear interactions producing macro-level phenomena that are not reducible to their micro-level components. In the context of such complex systems, conventional linear models produce answers that are not reasonable, because it is not in principle possible to decompose the variance as the General Linear Model (GLM) framework suggests – aiming to reproduce such a result is hence inherently problematic. The same questions are currently being asked in many fields of science, where researchers are starting to question the assumptions underlying classical statistical methods.
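
One way to see the point – sketched here as a toy example rather than an analysis from the cited literature – is to simulate an interaction-dominant (multiplicative) mechanism and fit the usual additive linear model to it. Two hypothetical "labs" run the identical mechanism in slightly different contexts, yet obtain very different "main effects", so a direct replication of the linear estimate fails even though nothing about the underlying process has changed. All names and numbers below are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    def lab_sample(mean_x2, n=500):
        """Identical multiplicative mechanism y = x1 * x2 + noise; only the
        context (the typical level of x2) differs between labs."""
        x1 = rng.normal(0.0, 1.0, n)
        x2 = rng.normal(mean_x2, 1.0, n)
        y = x1 * x2 + rng.normal(0.0, 0.5, n)
        return x1, x2, y

    def main_effect_of_x1(x1, x2, y):
        """Fit the additive (main-effects-only) linear model and return the x1 coefficient."""
        X = np.column_stack([np.ones_like(x1), x1, x2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef[1]

    for lab, mean_x2 in [("Lab A", 0.5), ("Lab B", 2.0)]:
        print(f"{lab}: estimated linear 'effect' of x1 = {main_effect_of_x1(*lab_sample(mean_x2)):.2f}")

In this toy setup the estimated linear effect of x1 is roughly the average level of x2 in each lab, so it differs several-fold between the two samples despite an unchanged mechanism – the kind of non-replication the complex-systems critique predicts for additive models of interaction-dominant processes.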

Emphasizing replication attempts in teaching

Based on coursework in experimental methods at MIT, Stanford, and the University of Washington, it has been suggested that methods courses in psychology and other fields emphasize replication attempts rather than original studies. Such an approach would help students learn scientific methodology and would provide numerous independent replications of meaningful scientific findings, thereby testing their replicability. Some have recommended that graduate students be required to publish a high-quality replication attempt on a topic related to their doctoral research prior to graduation.

Reducing the p-value required for claiming significance of new results

Many publications require a p-value of p < 0.05 to claim statistical significance. The paper "Redefine statistical significance", signed by a large number of scientists and mathematicians, proposes that in "fields where the threshold for defining statistical significance for new discoveries is p < 0.05, we propose a change to p < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields."

Their rationale is that "a leading cause of non-reproducibility (is that the) statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating 'statistically significant' findings with p < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems."
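
The rationale can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that one in ten tested hypotheses is real and that power is 0.8 at the 0.05 threshold and 0.5 at the stricter 0.005 threshold; it then computes the expected share of "significant" findings that are false positives under each threshold. None of these numbers is taken from the cited paper.

    # Illustrative arithmetic only: the prior probability of a real effect and the
    # power values below are assumptions, not figures from "Redefine statistical significance".

    def false_positive_share(alpha, power, prior_real):
        """Expected fraction of 'significant' findings that are false positives."""
        false_pos = (1 - prior_real) * alpha
        true_pos = prior_real * power
        return false_pos / (false_pos + true_pos)

    prior_real = 0.1                                      # assumed: 1 in 10 tested effects is real
    for alpha, power in [(0.05, 0.8), (0.005, 0.5)]:      # assumed power at each threshold
        share = false_positive_share(alpha, power, prior_real)
        print(f"alpha = {alpha}: ~{share:.0%} of significant findings would be false positives")

Under these assumptions the false positive share drops from roughly a third to under a tenth, which is the kind of improvement the proposal has in mind.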

This call was subsequently criticised by another large group, who argued that "redefining" the threshold would not fix current problems, would lead to some new ones, and that in the end, all thresholds needed to be justified case-by-case instead of following general conventions.

Addressing the misinterpretation of p-values

Although statisticians are unanimous that p < 0.05 provides weaker evidence than is generally appreciated, there is no unanimity about what should be done about it. Some have advocated that Bayesian methods should replace p-values. This has not happened on a wide scale, partly because it is complicated and partly because many users distrust the specification of prior distributions in the absence of hard data. A simplified version of the Bayesian argument, based on testing a point null hypothesis, was suggested by Colquhoun (2014, 2017). The logical problems of inductive inference were discussed in "The problem with p-values" (2016).

The hazards of reliance on p-values were emphasized by pointing out that even an observation of p = 0.001 is not necessarily strong evidence against the null hypothesis. Although the likelihood ratio in favour of the alternative hypothesis over the null is then close to 100, if the hypothesis were implausible – with a prior probability of a real effect of only 0.1 – even an observation of p = 0.001 would carry a false positive risk of 8 percent; it would not even reach the 5 percent level.
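
The 8 percent figure follows from Bayes' rule. A minimal sketch of that arithmetic, assuming (as the argument does) a likelihood ratio of about 100 for p = 0.001 and a prior probability of 0.1 that the effect is real:

    def false_positive_risk(prior_real, likelihood_ratio):
        """Probability that a 'significant' result is a false positive, given the prior
        probability of a real effect and the likelihood ratio favouring the alternative."""
        prior_odds_null = (1 - prior_real) / prior_real
        posterior_odds_null = prior_odds_null / likelihood_ratio
        return posterior_odds_null / (1 + posterior_odds_null)

    # p = 0.001 corresponds to a likelihood ratio of roughly 100 in this argument.
    print(f"{false_positive_risk(prior_real=0.1, likelihood_ratio=100):.0%}")   # about 8%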

It was recommended that the terms "significant" and "non-significant" not be used. p-values and confidence intervals should still be reported, but they should be accompanied by an indication of the false positive risk. It was suggested that the best way to do this is to calculate the prior probability one would need to believe in order to achieve a false positive risk of, say, 5%. The calculations can be done with the R scripts that are provided, or, more simply, with a web calculator. This so-called reverse Bayesian approach, suggested by Matthews (2001), is one way to avoid the problem that the prior probability is rarely known.
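
A minimal sketch of that reverse Bayesian calculation – inverting the same formula to ask what prior probability would be needed for a 5% false positive risk – is shown below. It is an illustration of the idea only, not the published R scripts or web calculator.

    def prior_needed(target_fpr, likelihood_ratio):
        """Prior probability of a real effect required for the false positive risk to
        equal target_fpr, given the observed likelihood ratio (reverse Bayes)."""
        posterior_odds_null = target_fpr / (1 - target_fpr)
        prior_odds_null = posterior_odds_null * likelihood_ratio
        return 1 / (1 + prior_odds_null)

    # For p = 0.001 (likelihood ratio of roughly 100), a prior of about 0.16 would be
    # needed to keep the false positive risk at 5%.
    print(f"{prior_needed(target_fpr=0.05, likelihood_ratio=100):.2f}")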

Encouraging larger sample sizes

To improve the quality of replications, sample sizes larger than the one used in the original study are often needed, because estimates of effect sizes in published work are often exaggerated due to publication bias and the large sampling variability associated with small samples. Further, the use of significance thresholds usually leads to inflated effects, because, particularly with small sample sizes, only the largest effects will reach significance.
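
The inflation produced by the significance filter can be seen in a short simulation. The sketch below compares the average estimated effect across all small studies with the average among only those that reached p < 0.05; the true effect size, sample size and number of simulated studies are assumptions chosen for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    true_d, n, trials = 0.3, 20, 5000        # assumed true effect, per-group n, simulated studies

    estimates, significant = [], []
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_d, 1.0, n)
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        estimates.append((b.mean() - a.mean()) / pooled_sd)     # observed Cohen's d
        significant.append(stats.ttest_ind(a, b).pvalue < 0.05)

    estimates, significant = np.array(estimates), np.array(significant)
    print(f"true effect:                     {true_d}")
    print(f"mean estimate, all studies:      {estimates.mean():.2f}")
    print(f"mean estimate, significant only: {estimates[significant].mean():.2f}")

The significant subset overstates the true effect by a wide margin, which is why a replication powered to detect the published (inflated) estimate will often itself be too small.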

Sharing raw data in online repositories

Online repositories where data, protocols, and findings can be stored and evaluated by the public seek to improve the integrity and reproducibility of research. Examples of such repositories include the Open Science Framework, Registry of Research Data Repositories, and Psychfiledrawer.org. Sites like the Open Science Framework offer badges for using open science practices in an effort to incentivize scientists. However, there has been concern that those most likely to provide their data and code for analysis are the researchers who are already the most sophisticated. John Ioannidis at Stanford University suggested that "the paradox may arise that the most meticulous and sophisticated and method-savvy and careful researchers may become more susceptible to criticism and reputation attacks by reanalyzers who hunt for errors, no matter how negligible these errors are".

Funding for replication studies

In July 2016 the Netherlands Organisation for Scientific Research made €3 million available for replication studies. The funding is for replication based on reanalysis of existing data and replication by collecting and analysing new data. Funding is available in the areas of social sciences, health research and healthcare innovation.

In 2013 the Laura and John Arnold Foundation funded the launch of the Center for Open Science with a $5.25 million grant, and by 2017 it had provided an additional $10 million in funding. It also funded the launch of the Meta-Research Innovation Center at Stanford University, run by John Ioannidis and Steven Goodman to study ways to improve scientific research, and provided funding for the AllTrials initiative led in part by Ben Goldacre.

Emphasize triangulation, not just replication

Marcus R. Munafò and George Davey Smith argue, in a piece published by Nature, that research should emphasize triangulation, not just replication. They claim that,

replication alone will get us only so far (and) might actually make matters worse ... We believe that an essential protection against flawed ideas is triangulation. This is the strategic use of multiple approaches to address one question. Each approach has its own unrelated assumptions, strengths and weaknesses. Results that agree across different methodologies are less likely to be artefacts. ... Maybe one reason replication has captured so much interest is the often-repeated idea that falsification is at the heart of the scientific enterprise. This idea was popularized by Karl Popper's 1950s maxim that theories can never be proved, only falsified. Yet an overemphasis on repeating experiments could provide an unfounded sense of certainty about findings that rely on a single approach. ... philosophers of science have moved on since Popper. Better descriptions of how scientists actually work include what epistemologist Peter Lipton called in 1991 "inference to the best explanation".

Raise the overall standards of methods presentation

Some authors have argued that the insufficient communication of experimental methods is a major contributor to the reproducibility crisis and that improving the quality of how experimental design and statistical analyses are reported would help improve the situation. These authors tend to plead both for a broad cultural change in how the scientific community regards statistics and for a more forceful push from scientific journals and funding bodies.

Implications for the pharmaceutical industry

Pharmaceutical companies and venture capitalists maintain research laboratories or contract with private research service providers (e.g. Envigo and Smart Assays Biotechnologies) whose job is to replicate academic studies, in order to test whether they are accurate before investing in, or trying to develop a new drug based on, that research. The financial stakes are high for the company and its investors, so it is cost-effective for them to invest in exact replications. Executing replication studies consumes resources. Further, doing an expert replication requires not only generic expertise in research methodology but also specific expertise in the often narrow topic of interest; sometimes research requires technical skills and knowledge that only researchers dedicated to a narrow area possess. At present, funding agencies are rarely interested in bankrolling replication studies, and most scientific journals are not interested in publishing such results. Amgen Oncology's cancer researchers were able to replicate only 11 percent of the innovative studies they selected to pursue over a 10-year period; a 2011 analysis by researchers with the pharmaceutical company Bayer found that the company's in-house findings agreed with the original results only a quarter of the time, at most. The analysis also revealed that, when Bayer scientists were able to reproduce a result in a direct replication experiment, it tended to translate well into clinical applications, meaning that reproducibility is a useful marker of clinical potential.
