
Sunday, April 11, 2021

Metascience

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Metascience

Metascience (also known as meta-research) is the use of scientific methodology to study science itself. Metascience seeks to increase the quality of scientific research while reducing waste. It is also known as "research on research" and "the science of science", as it uses research methods to study how research is done and where improvements can be made. Metascience concerns itself with all fields of research and has been described as "a bird's eye view of science." In the words of John Ioannidis, "Science is the best thing that has happened to human beings ... but we can do it better."

In 1966, an early meta-research paper examined the statistical methods of 295 papers published in ten high-profile medical journals. It found that, "in almost 73% of the reports read ... conclusions were drawn when the justification for these conclusions was invalid." Meta-research in the following decades found many methodological flaws, inefficiencies, and poor practices in research across numerous scientific fields. Many scientific studies could not be reproduced, particularly in medicine and the soft sciences. The term "replication crisis" was coined in the early 2010s as part of a growing awareness of the problem.

Measures have been implemented to address the issues revealed by metascience. These measures include the pre-registration of scientific studies and clinical trials as well as the founding of organizations such as CONSORT and the EQUATOR Network that issue guidelines for methodology and reporting. There are continuing efforts to reduce the misuse of statistics, to eliminate perverse incentives from academia, to improve the peer review process, to combat bias in scientific literature, and to increase the overall quality and efficiency of the scientific process.

History

In 1966, an early meta-research paper examined the statistical methods of 295 papers published in ten high-profile medical journals. It found that, "in almost 73% of the reports read ... conclusions were drawn when the justification for these conclusions was invalid." In 2005, John Ioannidis published a paper titled "Why Most Published Research Findings Are False", which argued that a majority of papers in the medical field produce conclusions that are wrong. The paper went on to become the most downloaded paper in the Public Library of Science and is considered foundational to the field of metascience. In a related study with Jeremy Howick and Despina Koletsi, Ioannidis showed that only a minority of medical interventions are supported by 'high quality' evidence according to The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. Later meta-research identified widespread difficulty in replicating results in many scientific fields, including psychology and medicine. This problem was termed "the replication crisis". Metascience has grown as a reaction to the replication crisis and to concerns about waste in research.

Many prominent publishers are interested in meta-research and in improving the quality of their publications. Top journals such as Science, The Lancet, and Nature provide ongoing coverage of meta-research and problems with reproducibility. In 2012 PLOS ONE launched a Reproducibility Initiative. In 2015 BioMed Central introduced a minimum-standards-of-reporting checklist to four titles.

The first international conference in the broad area of meta-research was the Research Waste/EQUATOR conference held in Edinburgh in 2015; the first international conference on peer review was the Peer Review Congress held in 1989. In 2016, Research Integrity and Peer Review was launched. The journal's opening editorial called for "research that will increase our understanding and suggest potential solutions to issues related to peer review, study reporting, and research and publication ethics".

Areas of meta-research

Metascience can be categorized into five major areas of interest: Methods, Reporting, Reproducibility, Evaluation, and Incentives. These correspond, respectively, with how to perform, communicate, verify, evaluate, and reward research.

Methods

Metascience seeks to identify poor research practices, such as bias, poor study design, and the misuse of statistics, and to find methods that reduce these practices. Meta-research has identified numerous biases in scientific literature. Of particular note is the widespread misuse of p-values and abuse of statistical significance.

Reporting

Meta-research has identified poor practices in reporting, explaining, disseminating and popularizing research, particularly within the social and health sciences. Poor reporting makes it difficult to accurately interpret the results of scientific studies, to replicate studies, and to identify biases and conflicts of interest in the authors. Solutions include the implementation of reporting standards, and greater transparency in scientific studies (including better requirements for disclosure of conflicts of interest). There is an attempt to standardize reporting of data and methodology through the creation of guidelines by reporting agencies such as CONSORT and the larger EQUATOR Network.

Reproducibility

The replication crisis is an ongoing methodological crisis in which it has been found that many scientific studies are difficult or impossible to replicate. While the crisis has its roots in the meta-research of the mid- to late-1900s, the phrase "replication crisis" was not coined until the early 2010s as part of a growing awareness of the problem. The replication crisis particularly affects psychology (especially social psychology) and medicine. Replication is an essential part of the scientific process, and the widespread failure of replication puts into question the reliability of affected fields.

Moreover, replication of research (or failure to replicate) is considered less influential than original research, and is less likely to be published in many fields. This discourages the reporting of, and even attempts to replicate, studies.

Evaluation

Metascience seeks to create a scientific foundation for peer review. Meta-research evaluates peer review systems including pre-publication peer review, post-publication peer review, and open peer review. It also seeks to develop better research funding criteria.

Incentives

Metascience seeks to promote better research through better incentive systems. This includes studying the accuracy, effectiveness, costs, and benefits of different approaches to ranking and evaluating research and those who perform it. Critics argue that perverse incentives have created a publish-or-perish environment in academia which promotes the production of junk science, low quality research, and false positives. According to Brian Nosek, “The problem that we face is that the incentive system is focused almost entirely on getting research published, rather than on getting research right.” Proponents of reform seek to structure the incentive system to favor higher-quality results.

Reforms

Meta-research identifying flaws in scientific practice has inspired reforms in science. These reforms seek to address and fix problems in scientific practice which lead to low-quality or inefficient research.

Pre-registration

The practice of registering a scientific study before it is conducted is called pre-registration. It arose as a means to address the replication crisis. Pre-registration requires the submission of a registered report, which is then accepted for publication or rejected by a journal based on theoretical justification, experimental design, and the proposed statistical analysis. Pre-registration of studies serves to prevent publication bias, reduce data dredging, and increase replicability.

Reporting standards

Studies showing poor consistency and quality of reporting have demonstrated the need for reporting standards and guidelines in science, which has led to the rise of organisations that produce such standards, such as CONSORT (Consolidated Standards of Reporting Trials) and the EQUATOR Network.

The EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network is an international initiative aimed at promoting transparent and accurate reporting of health research studies to enhance the value and reliability of medical research literature. The EQUATOR Network was established with the goals of raising awareness of the importance of good reporting of research, assisting in the development, dissemination and implementation of reporting guidelines for different types of study designs, monitoring the status of the quality of reporting of research studies in the health sciences literature, and conducting research relating to issues that impact the quality of reporting of health research studies. The Network acts as an "umbrella" organisation, bringing together developers of reporting guidelines, medical journal editors and peer reviewers, research funding bodies, and other key stakeholders with a mutual interest in improving the quality of research publications and research itself.

Applications

Medicine

Clinical research in medicine is often of low quality, and many studies cannot be replicated. An estimated 85% of research funding is wasted. Additionally, the presence of bias affects research quality. The pharmaceutical industry exerts substantial influence on the design and execution of medical research. Conflicts of interest are common among authors of medical literature and among editors of medical journals. While almost all medical journals require their authors to disclose conflicts of interest, editors are not required to do so. Financial conflicts of interest have been linked to higher rates of positive study results. In antidepressant trials, pharmaceutical sponsorship is the best predictor of trial outcome.

Blinding is another focus of meta-research, as error caused by poor blinding is a source of experimental bias. Blinding is not well reported in medical literature, and widespread misunderstanding of the subject has resulted in poor implementation of blinding in clinical trials. Furthermore, failure of blinding is rarely measured or reported. Research showing the failure of blinding in antidepressant trials has led some scientists to argue that antidepressants are no better than placebo. In light of meta-research showing failures of blinding, CONSORT standards recommend that all clinical trials assess and report the quality of blinding.

Studies have shown that systematic reviews of existing research evidence are sub-optimally used in planning new research or summarizing the results. Cumulative meta-analyses of studies evaluating the effectiveness of medical interventions have shown that many clinical trials could have been avoided if a systematic review of existing evidence had been done prior to conducting a new trial. For example, Lau et al. analyzed 33 clinical trials (involving 36,974 patients) evaluating the effectiveness of intravenous streptokinase for acute myocardial infarction. Their cumulative meta-analysis demonstrated that 25 of the 33 trials could have been avoided if a systematic review had been conducted before each new trial. In other words, randomizing 34,542 patients was potentially unnecessary. One study analyzed 1,523 clinical trials included in 227 meta-analyses and concluded that "less than one quarter of relevant prior studies" were cited. They also confirmed earlier findings that most clinical trial reports do not present systematic reviews to justify the research or summarize the results.

Many treatments used in modern medicine have been proven to be ineffective, or even harmful. A 2007 study by John Ioannidis found that it took an average of ten years for the medical community to stop referencing popular practices after their efficacy was unequivocally disproven.

Psychology

Metascience has revealed significant problems in psychological research. The field suffers from high bias, low reproducibility, and widespread misuse of statistics. The replication crisis affects psychology more strongly than any other field; as many as two-thirds of highly publicized findings may be impossible to replicate. Meta-research finds that 80-95% of psychological studies support their initial hypotheses, which strongly implies the existence of publication bias.

The replication crisis has led to renewed efforts to re-test important findings. In response to concerns about publication bias and p-hacking, more than 140 psychology journals have adopted result-blind peer review, in which studies are pre-registered and published without regard for their outcome. An analysis of these reforms estimated that 61 percent of result-blind studies produce null results, in contrast with 5 to 20 percent in earlier research. This analysis shows that result-blind peer review substantially reduces publication bias.

Psychologists routinely confuse statistical significance with practical importance, enthusiastically reporting great certainty in unimportant facts. Some psychologists have responded with an increased use of effect size statistics, rather than sole reliance on p values.
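As a hedged illustration of this distinction (synthetic data, not drawn from any study cited here; assumes numpy and scipy are available): with a large enough sample, a trivially small difference between groups reaches statistical significance even though the effect size shows it is practically unimportant.

    # Illustrative only: a tiny true difference becomes "significant"
    # with n = 100,000 per group, yet Cohen's d shows it is negligible.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 100_000
    group_a = rng.normal(loc=100.0, scale=15.0, size=n)
    group_b = rng.normal(loc=100.3, scale=15.0, size=n)   # tiny true difference

    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
    cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd

    print(f"p = {p_value:.2g}")            # far below 0.05: "significant"
    print(f"Cohen's d = {cohens_d:.3f}")   # about 0.02: practically negligible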

Physics

Richard Feynman noted that estimates of physical constants were closer to published values than would be expected by chance. This was believed to be the result of confirmation bias: results that agreed with existing literature were more likely to be believed, and therefore published. Physicists now implement blinding to prevent this kind of bias.

Associated fields

Journalology

Journalology, also known as publication science, is the scholarly study of all aspects of the academic publishing process. The field seeks to improve the quality of scholarly research by implementing evidence-based practices in academic publishing. The term "journalology" was coined by Stephen Lock, the former editor-in-chief of the BMJ. The first Peer Review Congress, held in 1989 in Chicago, Illinois, is considered a pivotal moment in the founding of journalology as a distinct field. The field of journalology has been influential in pushing for study pre-registration in science, particularly in clinical trials. Clinical-trial registration is now expected in most countries.

Scientometrics

Scientometrics concerns itself with measuring bibliographic data in scientific publications. Major research issues include the measurement of the impact of research papers and academic journals, the understanding of scientific citations, and the use of such measurements in policy and management contexts.
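The article does not name particular indicators, but one widely used scientometric measure, the h-index (a researcher or journal has index h if h of its papers have at least h citations each), gives a flavor of the bibliographic measurement involved. A minimal sketch:

    def h_index(citation_counts):
        """Largest h such that h papers have at least h citations each."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for rank, citations in enumerate(counts, start=1):
            if citations >= rank:
                h = rank
            else:
                break
        return h

    print(h_index([10, 8, 5, 4, 3]))   # 4
    print(h_index([25, 8, 5, 3, 3]))   # 3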

Scientific data science

Scientific data science is the use of data science to analyse research papers. It encompasses both qualitative and quantitative methods. Research in scientific data science includes fraud detection and citation network analysis.

Invalid science

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Invalid_science

Invalid science consists of scientific claims based on experiments that cannot be reproduced or that are contradicted by experiments that can be reproduced. Recent analyses indicate that the proportion of retracted claims in the scientific literature is steadily increasing. The number of retractions has grown tenfold over the past decade, but they still make up approximately 0.2% of the 1.4m papers published annually in scholarly journals.

The U.S. Office of Research Integrity (ORI) investigates scientific misconduct.

Incidence

Science magazine ranked first for the number of articles retracted at 70, just edging out PNAS, which retracted 69. Thirty-two of Science's retractions were due to fraud or suspected fraud, and 37 to error. A subsequent "retraction index" indicated that journals with relatively high impact factors, such as Science, Nature and Cell, had a higher rate of retractions. Fewer than 0.1% of the more than 25 million papers in PubMed, going back to the 1940s, have been retracted.

The fraction of retracted papers due to scientific misconduct was estimated at two-thirds, according to studies of 2,047 papers published since 1977. Misconduct included fraud and plagiarism. Another one-fifth were retracted because of mistakes, and the rest were pulled for unknown or other reasons.

A separate study analyzed 432 claims of genetic links for various health risks that vary between men and women. Only one of these claims proved to be consistently reproducible. Another meta-review found that, of the 49 most-cited clinical research studies published between 1990 and 2003, more than 40 percent were later shown to be either totally wrong or significantly incorrect.

Biological sciences

In 2012 biotech firm Amgen was able to reproduce just six of 53 important studies in cancer research. Earlier, a group at Bayer, a drug company, successfully repeated only one fourth of 67 important papers. In 2000-10 roughly 80,000 patients took part in clinical trials based on research that was later retracted because of mistakes or improprieties.

Paleontology

Nathan Myhrvold failed repeatedly to replicate the findings of several papers on dinosaur growth. Dinosaurs added a layer to their bones each year. Tyrannosaurus rex was thought to have increased in size by more than 700 kg a year, until Myhrvold showed that this figure was a factor of two too large. In 4 of the 12 papers he examined, the original data had been lost. In three, the statistics were correct, while three had serious errors that invalidated their conclusions. Two papers mistakenly relied on data from these three. He discovered that some of the papers' graphs did not reflect the data. In one case, he found that only four of nine points on a graph came from data cited in the paper.

Major retractions

Torcetrapib was originally hyped as a drug that could block a protein that converts HDL cholesterol into LDL with the potential to "redefine cardiovascular treatment". One clinical trial showed that the drug could increase HDL and decrease LDL. Two days after Pfizer announced its plans for the drug, it ended the Phase III clinical trial due to higher rates of chest pain and heart failure and a 60 percent increase in overall mortality. Pfizer had invested more than $1 billion in developing the drug.

An in-depth review of the most highly cited biomarkers (whose presence is used to infer illness and measure treatment effects) claimed that 83 percent of supposed correlations became significantly weaker in subsequent studies. Homocysteine is an amino acid whose levels correlated with heart disease. However, a 2010 study showed that lowering homocysteine by nearly 30 percent had no effect on heart attack or stroke.

Priming

Priming studies claim that decisions can be influenced by apparently irrelevant events that a subject witnesses just before making a choice. Nobel Prize-winner Daniel Kahneman alleges that much of this research is poorly founded. Researchers have been unable to replicate some of the more widely cited examples. A paper in PLoS ONE reported that nine separate experiments could not reproduce a study purporting to show that thinking about a professor before taking an intelligence test leads to a higher score than imagining a football hooligan. A further systematic replication involving 40 different labs around the world did not replicate the main finding. However, this latter systematic replication showed that participants who did not think there was a relation between thinking about a hooligan or a professor and test performance were significantly more susceptible to the priming manipulation.

Potential causes

Competition

In the 1950s, when academic research accelerated during the cold war, the total number of scientists was a few hundred thousand. In the new century, 6m-7m researchers are active. The number of research jobs has not matched this increase. Every year six new PhDs compete for every academic post. Replicating other researchers' results is not perceived to be valuable. The struggle to compete encourages exaggeration of findings and biased data selection. A recent survey found that one in three researchers knows of a colleague who has at least somewhat distorted their results.

Publication bias

Major journals reject in excess of 90% of submitted manuscripts and tend to favor the most dramatic claims. The statistical measures that researchers use to test their claims allow a fraction of false claims to appear valid. Dramatic claims are disproportionately likely to be invalid, precisely because they are not constrained by what is actually true. Without replication, such errors are less likely to be caught.

Conversely, failures to prove a hypothesis are rarely even offered for publication. "Negative results" now account for only 14% of published papers, down from 30% in 1990. Knowledge of what is not true is as important as knowledge of what is true.

Peer review

Peer review is the primary validation technique employed by scientific publications. However, a prominent medical journal tested the system and found major failings. It supplied research with induced errors and found that most reviewers failed to spot the mistakes, even after being told of the tests.

A pseudonymous fabricated paper on the effects of a chemical derived from lichen on cancer cells was submitted to 304 journals for peer review. The paper was filled with errors of study design, analysis and interpretation. 157 lower-rated journals accepted it. Another study sent an article containing eight deliberate mistakes in study design, analysis and interpretation to more than 200 of the British Medical Journal’s regular reviewers. On average, they reported fewer than two of the problems.

Peer reviewers typically do not re-analyse data from scratch, checking only that the authors’ analysis is properly conceived.

Statistics

Type I and type II errors

Scientists divide errors into type I, incorrectly asserting the truth of a hypothesis (a false positive), and type II, rejecting a correct hypothesis (a false negative). Statistical checks assess the probability that data which seem to support a hypothesis come about simply by chance. If the probability is less than 5%, the evidence is rated "statistically significant". One definitional consequence is a type I error rate of one in 20.
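A minimal simulation (illustrative only; assumes numpy and scipy) makes that last point concrete: when the null hypothesis is true in every experiment, a p < 0.05 threshold still flags roughly one experiment in 20 as "significant".

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_experiments, n_subjects = 10_000, 30

    false_positives = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: there is no real effect.
        a = rng.normal(size=n_subjects)
        b = rng.normal(size=n_subjects)
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1

    print(false_positives / n_experiments)   # roughly 0.05, i.e. about one in 20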

Statistical power

In 2005 Stanford epidemiologist John Ioannidis showed that the idea that only one paper in 20 gives a false-positive result was incorrect. He claimed, “most published research findings are probably false.” He found three categories of problems: insufficient “statistical power” (avoiding type II errors); the unlikeliness of the hypothesis; and publication bias favoring novel claims.

A statistically powerful study can identify factors that have only small effects on the data. In general, studies that run the experiment more times, on more subjects, have greater power. A power of 0.8 means that of ten true hypotheses tested, the effects of two are missed. Ioannidis found that in neuroscience the typical statistical power is 0.21; another study found that psychology studies average 0.35.
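As a rough sketch of what these power figures imply for study size (using statsmodels; the effect sizes chosen here are illustrative, not taken from the text): detecting a small effect at the conventional 0.8 power target requires far more subjects per group than the underpowered designs described above.

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for effect_size in (0.2, 0.5):       # "small" and "medium" effects
        for power in (0.21, 0.8):        # typical neuroscience figure vs. usual target
            n = analysis.solve_power(effect_size=effect_size,
                                     power=power, alpha=0.05)
            print(f"d = {effect_size}, power = {power}: "
                  f"about {n:.0f} subjects per group")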

Unlikeliness is a measure of the degree of surprise in a result. Scientists prefer surprising results, leading them to test hypotheses that range from unlikely to very unlikely. Ioannidis claimed that in epidemiology only about one in ten tested hypotheses should turn out to be true. In exploratory disciplines like genomics, which rely on examining voluminous data about genes and proteins, only one in a thousand should prove correct.

In a discipline in which 100 out of 1,000 hypotheses are true, studies with a power of 0.8 will find 80 and miss 20. Of the 900 incorrect hypotheses, 5% or 45 will be accepted because of type I errors. Adding the 45 false positives to the 80 true positives gives 125 positive results, or 36% specious. Dropping statistical power to 0.4, optimistic for many fields, would still produce 45 false positives but only 40 true positives, less than half.

Negative results are more reliable. Statistical power of 0.8 produces 875 negative results of which only 20 are false, giving an accuracy of over 97%. Negative results however account for a minority of published results, varying by discipline. A study of 4,600 papers found that the proportion of published negative results dropped from 30% to 14% between 1990 and 2007.
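The arithmetic in the last three paragraphs can be re-derived directly (a sketch of the bookkeeping, not a simulation; the function name is illustrative):

    def outcome_shares(n_hypotheses=1000, n_true=100, power=0.8, alpha=0.05):
        true_positives = power * n_true                     # real effects detected
        false_negatives = (1 - power) * n_true              # real effects missed
        false_positives = alpha * (n_hypotheses - n_true)   # type I errors
        true_negatives = (n_hypotheses - n_true) - false_positives
        positives = true_positives + false_positives
        negatives = true_negatives + false_negatives
        return {"positives": positives,
                "false share of positives": false_positives / positives,
                "negatives": negatives,
                "false share of negatives": false_negatives / negatives}

    print(outcome_shares(power=0.8))  # 125 positives (36% false); 875 negatives (~2% false)
    print(outcome_shares(power=0.4))  # 85 positives, of which more than half are false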

Subatomic physics sets an acceptable false-positive rate of one in 3.5m (known as the five-sigma standard). However, even this does not provide perfect protection. The problem invalidates some three-quarters of machine-learning studies, according to one review.

Statistical significance

Statistical significance is a measure for testing statistical correlation. It was introduced by English statistician Ronald Fisher in the 1920s. It defines a "significant" result as any data point that would be produced by chance less than 5 (or, more stringently, 1) percent of the time. A significant result is widely seen as an important indicator that the correlation is not random.

While correlations track the relationship between truly independent measurements, such as smoking and cancer, they are much less effective when variables cannot be isolated, a common circumstance in biological systems. For example, statistics found a high correlation between lower back pain and abnormalities in spinal discs, although it was later discovered that serious abnormalities were present in two-thirds of pain-free patients.

Minimum threshold publishers

Journals such as PLoS One use a “minimal-threshold” standard, seeking to publish as much science as possible, rather than to pick out the best work. Their peer reviewers assess only whether a paper is methodologically sound. Almost half of their submissions are still rejected on that basis.

Unpublished research

Only 22% of the clinical trials financed by the National Institutes of Health (NIH) released summary results within one year of completion, even though the NIH requires it. Fewer than half published within 30 months; a third remained unpublished after 51 months. When other scientists rely on invalid research, they may waste time on lines of research that are themselves invalid. The failure to report failures means that researchers waste money and effort exploring blind alleys already investigated by other scientists.

Fraud

In 21 surveys of academics (mostly in the biomedical sciences but also in civil engineering, chemistry and economics) carried out between 1987 and 2008, 2% admitted fabricating data, but 28% claimed to know of colleagues who engaged in questionable research practices.

Lack of access to data and software

Clinical trials are generally too costly to rerun. Access to trial data is the only practical approach to reassessment. A campaign to persuade pharmaceutical firms to make all trial data available won its first convert in February 2013 when GlaxoSmithKline became the first to agree.

Software used in a trial is generally considered to be proprietary intellectual property and is not available to replicators, further complicating matters. Journals that insist on data-sharing tend not to do the same for software.

Even well-written papers may not include sufficient detail and/or tacit knowledge (subtle skills and extemporisations not considered notable) for the replication to succeed. One cause of replication failure is insufficient control of the protocol, which can cause disputes between the original and replicating researchers.

Reform

Statistics training

Geneticists have begun more careful reviews, particularly of the use of statistical techniques. The effect was to stop a flood of specious results from genome sequencing.

Protocol registration

Registering research protocols in advance and monitoring them over the course of a study can prevent researchers from modifying the protocol midstream to highlight preferred results. Providing raw data for other researchers to inspect and test can also better hold researchers to account.

Post-publication review

Replacing peer review with post-publication evaluations can encourage researchers to think more about the long-term consequences of excessive or unsubstantiated claims. That system was adopted in physics and mathematics with good results.

Replication

Few researchers, especially junior workers, seek opportunities to replicate others' work, partly to protect relationships with senior researchers.

Reproduction benefits from access to the original study's methods and data. More than half of 238 biomedical papers published in 84 journals failed to identify all the resources (such as chemical reagents) necessary to reproduce the results. In 2008 some 60% of researchers said they would share raw data; by 2013 just 45% did. Journals have begun to demand that at least some raw data be made available, although only 143 of 351 randomly selected papers covered by some data-sharing policy actually complied.

The Reproducibility Initiative is a service allowing life scientists to pay to have their work validated by an independent lab. In October 2013 the initiative received funding to review 50 of the highest-impact cancer findings published between 2010 and 2012. Blog Syn is a website run by graduate students that is dedicated to reproducing chemical reactions reported in papers.

In 2013 replication efforts received greater attention. In May, Nature and related publications introduced an 18-point checklist for life science authors in an effort to ensure that published research can be reproduced. Expanded "methods" sections and all data were to be available online. The Center for Open Science opened as an independent laboratory focused on replication. The journal Perspectives on Psychological Science announced a section devoted to replications. Another project announced plans to replicate 100 studies published in the first three months of 2008 in three leading psychology journals.

Major funders, including the European Research Council, the US National Science Foundation and Research Councils UK have not changed their preference for new work over replications.

 

Publication bias

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Publication_bias

Publication bias is a type of bias that occurs in published academic research. It occurs when the outcome of an experiment or research study influences the decision whether to publish or otherwise distribute it. Publishing only results that show a significant finding disturbs the balance of findings, and inserts bias in favor of positive results. The study of publication bias is an important topic in metascience.

Studies with significant results can be of the same standard as studies with a null result with respect to quality of execution and design. However, statistically significant results are three times more likely to be published than papers with null results. A consequence of this is that researchers are unduly motivated to manipulate their practices to ensure that a statistically significant result is reported.

Multiple factors contribute to publication bias. For instance, once a scientific finding is well established, it may become newsworthy to publish reliable papers that fail to reject the null hypothesis. It has been found that the most common reason for non-publication is simply that investigators decline to submit results, leading to non-response bias. Factors cited as underlying this effect include investigators assuming they must have made a mistake, failure to support a known finding, loss of interest in the topic, or anticipation that others will be uninterested in the null results. The nature of these issues and the problems that have been triggered, have been referred to as the 5 diseases that threaten science, which include: "significosis, an inordinate focus on statistically significant results; neophilia, an excessive appreciation for novelty; theorrhea, a mania for new theory; arigorium, a deficiency of rigor in theoretical and empirical work; and finally, disjunctivitis, a proclivity to produce large quantities of redundant, trivial, and incoherent works."

Attempts to identify unpublished studies often prove difficult or are unsatisfactory. In an effort to combat this problem, some journals require that studies submitted for publication are pre-registered (registering a study prior to collection of data and analysis) with organizations like the Center for Open Science.

Other proposed strategies to detect and control for publication bias include p-curve analysis and disfavoring small and non-randomised studies because of their demonstrated high susceptibility to error and bias.

Definition

Publication bias occurs when the publication of research results depends not just on the quality of the research but also on the hypothesis tested, and the significance and direction of effects detected. The subject was first discussed in 1959 by statistician Theodore Sterling to refer to fields in which "successful" research is more likely to be published. As a result, "the literature of such a field consists in substantial part of false conclusions resulting from errors of the first kind in statistical tests of significance". In the worst case, false conclusions could become canonized as true if the publication rate of negative results is too low.

Publication bias is sometimes called the file-drawer effect, or file-drawer problem. This term suggests that results not supporting the hypotheses of researchers often go no further than the researchers' file drawers, leading to a bias in published research. The term "file drawer problem" was coined by psychologist Robert Rosenthal in 1979.

Positive-results bias, a type of publication bias, occurs when authors are more likely to submit, or editors are more likely to accept, positive results than negative or inconclusive results. Outcome reporting bias occurs when multiple outcomes are measured and analyzed, but the reporting of these outcomes depends on the strength and direction of the results. A generic term coined to describe these post-hoc choices is HARKing ("Hypothesizing After the Results are Known").

Evidence

[Figure: funnel plot from a meta-analysis of the effect of stereotype threat on girls' math scores, showing the asymmetry typical of publication bias. From Flore, P. C., & Wicherts, J. M. (2015).]

There is extensive meta-research on publication bias in the biomedical field. Investigators following clinical trials from the submission of their protocols to ethics committees (or regulatory authorities) until the publication of their results observed that those with positive results are more likely to be published. In addition, studies often fail to report negative results when published, as demonstrated by research comparing study protocols with published articles.

The presence of publication bias was investigated in meta-analyses. The largest such analysis investigated the presence of publication bias in systematic reviews of medical treatments from the Cochrane Library. The study showed that statistically significant positive findings are 27% more likely to be included in meta-analyses of efficacy than other findings. Results showing no evidence of adverse effects have a 78% greater probability of inclusion in safety studies than statistically significant results showing adverse effects. Evidence of publication bias was found in meta-analyses published in prominent medical journals.

Impact on meta-analysis

Where publication bias is present, published studies are no longer a representative sample of the available evidence. This bias distorts the results of meta-analyses and systematic reviews, a particular concern because evidence-based medicine is increasingly reliant on meta-analysis to assess evidence.

Meta-analyses and systematic reviews can account for publication bias by including evidence from unpublished studies and the grey literature. The presence of publication bias can also be explored by constructing a funnel plot in which the estimate of the reported effect size is plotted against a measure of precision or sample size. The premise is that the scatter of points should reflect a funnel shape, indicating that the reporting of effect sizes is not related to their statistical significance. However, when small studies are predominately in one direction (usually the direction of larger effect sizes), asymmetry will ensue and this may be indicative of publication bias.
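A minimal sketch of constructing such a funnel plot from synthetic, unbiased study data (assumes numpy and matplotlib; none of the numbers come from the text):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    true_effect = 0.3
    n_per_study = rng.integers(20, 400, size=60)      # study sizes vary widely
    se = 1.0 / np.sqrt(n_per_study)                    # rough standard errors
    effects = rng.normal(true_effect, se)              # observed effect estimates

    plt.scatter(effects, 1.0 / se, s=12)
    plt.axvline(true_effect, linestyle="--")
    plt.xlabel("Estimated effect size")
    plt.ylabel("Precision (1 / standard error)")
    plt.title("Funnel plot (synthetic, unbiased data)")
    plt.show()

With no publication bias the points scatter symmetrically around the true effect and narrow toward the top; a missing lower corner (small, imprecise studies with null or unfavourable results) is the asymmetry described above.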

Because an inevitable degree of subjectivity exists in the interpretation of funnel plots, several tests have been proposed for detecting funnel plot asymmetry. These are often based on linear regression, and may adopt a multiplicative or additive dispersion parameter to adjust for the presence of between-study heterogeneity. Some approaches may even attempt to compensate for the (potential) presence of publication bias, which is particularly useful to explore the potential impact on meta-analysis results.
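One widely used regression-based check, not named in the text above, is Egger's test: regress each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error) and examine the intercept. A self-contained sketch on synthetic data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    se = 1.0 / np.sqrt(rng.integers(20, 400, size=60))   # synthetic standard errors
    effects = rng.normal(0.3, se)                         # synthetic, unbiased effects

    fit = stats.linregress(1.0 / se, effects / se)
    t_intercept = fit.intercept / fit.intercept_stderr
    # With no publication bias the intercept should sit near zero; a large
    # |t_intercept| suggests small-study (funnel-plot) asymmetry.
    print(f"Egger intercept = {fit.intercept:.2f}, t = {t_intercept:.2f}")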

Compensation examples

Two meta-analyses of the efficacy of reboxetine as an antidepressant demonstrated attempts to detect publication bias in clinical trials. Based on positive trial data, reboxetine was originally approved as a treatment for depression in many countries in Europe and the UK in 2001 (though in practice it is rarely used for this indication). A 2010 meta-analysis concluded that reboxetine was ineffective and that the preponderance of positive-outcome trials reflected publication bias, mostly due to trials published by the drug manufacturer Pfizer. A subsequent meta-analysis published in 2011, based on the original data, found flaws in the 2010 analyses and suggested that the data indicated reboxetine was effective in severe depression. Examples of publication bias are given by Ben Goldacre and Peter Wilmshurst.

In the social sciences, a study of published papers exploring the relationship between corporate social and financial performance found that "in economics, finance, and accounting journals, the average correlations were only about half the magnitude of the findings published in Social Issues Management, Business Ethics, or Business and Society journals".

One example cited as an instance of publication bias is the refusal to publish attempted replications of Bem's work that claimed evidence for precognition by The Journal of Personality and Social Psychology (the original publisher of Bem's article).

An analysis comparing studies of gene-disease associations originating in China to those originating outside China found that those conducted within the country reported a stronger association and a more statistically significant result.

Risks

John Ioannidis argues that "claimed research findings may often be simply accurate measures of the prevailing bias." He lists the following factors as those that make a paper with a positive result more likely to enter the literature and suppress negative-result papers:

  • The studies conducted in a field have small sample sizes.
  • The effect sizes in a field tend to be smaller.
  • There is both a greater number and lesser preselection of tested relationships.
  • There is greater flexibility in designs, definitions, outcomes, and analytical modes.
  • There are prejudices (financial interest, political, or otherwise).
  • The scientific field is hot and there are more scientific teams pursuing publication.

Other factors include experimenter bias and white hat bias.

Remedies

Publication bias can be contained through better-powered studies, enhanced research standards, and careful consideration of true and non-true relationships. Better-powered studies refer to large studies that deliver definitive results or test major concepts and lead to low-bias meta-analysis. Enhanced research standards such as the pre-registration of protocols, the registration of data collections and adherence to established protocols are other techniques. To avoid false-positive results, the experimenter must consider the chances that they are testing a true or non-true relationship. This can be undertaken by properly assessing the false positive report probability based on the statistical power of the test and reconfirming (whenever ethically acceptable) established findings of prior studies known to have minimal bias.
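The "false positive report probability" mentioned above has a standard closed form combining the significance level, the study's power, and the prior probability that the tested relationship is real. A short sketch (function and argument names are illustrative):

    def false_positive_report_probability(prior, power, alpha=0.05):
        """Probability that a 'positive' finding is false, given the prior
        chance the relationship is real, the test's power, and alpha."""
        false_pos = alpha * (1 - prior)
        true_pos = power * prior
        return false_pos / (false_pos + true_pos)

    # Well-powered test of a plausible hypothesis:
    print(false_positive_report_probability(prior=0.5, power=0.8))    # ~0.06
    # Underpowered test of a long-shot hypothesis:
    print(false_positive_report_probability(prior=0.01, power=0.2))   # ~0.96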

Study registration

In September 2004, editors of prominent medical journals (including the New England Journal of Medicine, The Lancet, Annals of Internal Medicine, and JAMA) announced that they would no longer publish results of drug research sponsored by pharmaceutical companies, unless that research was registered in a public clinical trials registry database from the start. Furthermore, some journals (e.g. Trials), encourage publication of study protocols in their journals.

The World Health Organization (WHO) agreed that basic information about all clinical trials should be registered at the study's inception, and that this information should be publicly accessible through the WHO International Clinical Trials Registry Platform. Additionally, public availability of complete study protocols, alongside reports of trials, is becoming more common for studies.


Grok

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Grok

Grok /ˈɡrɒk/ is a neologism coined by American writer Robert A. Heinlein for his 1961 science fiction novel Stranger in a Strange Land. While the Oxford English Dictionary summarizes the meaning of grok as "to understand intuitively or by empathy, to establish rapport with" and "to empathize or communicate sympathetically (with); also, to experience enjoyment", Heinlein's concept is far more nuanced, with critic Istvan Csicsery-Ronay Jr. observing that "the book's major theme can be seen as an extended definition of the term." The concept of grok garnered significant critical scrutiny in the years after the book's initial publication. The term and aspects of the underlying concept have become part of communities such as computer science.

Descriptions of grok in Stranger in a Strange Land

Critic David E. Wright Sr. points out that in the 1991 "uncut" edition of Stranger, the word grok "was used first without any explicit definition on page 22" and continued to be used without being explicitly defined until page 253 (emphasis in original). He notes that this first intensional definition is simply "to drink", but that this is only a metaphor "much as English 'I see' often means the same as 'I understand'". Critics have bridged this absence of explicit definition by citing passages from Stranger that illustrate the term. A selection of these passages follows:

Grok means "to understand", of course, but Dr. Mahmoud, who might be termed the leading Terran expert on Martians, explains that it also means, "to drink" and "a hundred other English words, words which we think of as antithetical concepts. 'Grok' means all of these. It means 'fear', it means 'love', it means 'hate' – proper hate, for by the Martian 'map' you cannot hate anything unless you grok it, understand it so thoroughly that you merge with it and it merges with you – then you can hate it. By hating yourself. But this implies that you love it, too, and cherish it and would not have it otherwise. Then you can hate – and (I think) Martian hate is an emotion so black that the nearest human equivalent could only be called mild distaste.

Grok means "identically equal". The human cliché "This hurts me worse than it does you" has a distinctly Martian flavor. The Martian seems to know instinctively what we learned painfully from modern physics, that observer acts with observed through the process of observation. Grok means to understand so thoroughly that the observer becomes a part of the observed – to merge, blend, intermarry, lose identity in group experience. It means almost everything that we mean by religion, philosophy, and science and it means as little to us as color does to a blind man.

The Martian Race had encountered the people of the fifth planet, grokked them completely, and had taken action; asteroid ruins were all that remained, save that the Martians continued to praise and cherish the people they had destroyed.

All that groks is God.

Etymology

Robert A. Heinlein originally coined the term grok in his 1961 novel Stranger in a Strange Land as a Martian word that could not be defined in Earthling terms, but can be associated with various literal meanings such as "water", "to drink", "life", or "to live", and had a much more profound figurative meaning that is hard for terrestrial culture to understand because of its assumption of a singular reality.

According to the book, drinking water is a central focus on Mars, where it is scarce. Martians use the merging of their bodies with water as a simple example or symbol of how two entities can combine to create a new reality greater than the sum of its parts. The water becomes part of the drinker, and the drinker part of the water. Both grok each other. Things that once had separate realities become entangled in the same experiences, goals, history, and purpose. Within the book, the statement of divine immanence verbalized among the main characters, "thou art God", is logically derived from the concept inherent in the term grok.

Heinlein describes Martian words as "guttural" and "jarring". Martian speech is described as sounding "like a bullfrog fighting a cat". Accordingly, grok is generally pronounced as a guttural gr terminated by a sharp k with very little or no vowel sound (a narrow IPA transcription might be [ɡɹ̩kʰ]). William Tenn suggests Heinlein in creating the word might have been influenced by Tenn's very similar concept of griggo, earlier introduced in Tenn's story "Venus and the Seven Sexes" (published in 1949). In his later afterword to the story, Tenn says Heinlein considered such influence "very possible".

Adoption and modern usage

In computer programmer culture

Uses of the word in the decades after the 1960s are more concentrated in computer culture, such as a 1984 appearance in InfoWorld: "There isn't any software! Only different internal states of hardware. It's all hardware! It's a shame programmers don't grok that better."

The Jargon File, which describes itself as a "Hacker's Dictionary" and has been published under that name three times, puts grok in a programming context:

When you claim to "grok" some knowledge or technique, you are asserting that you have not merely learned it in a detached instrumental way but that it has become part of you, part of your identity. For example, to say that you "know" Lisp is simply to assert that you can code in it if necessary – but to say you "grok" Lisp is to claim that you have deeply entered the world-view and spirit of the language, with the implication that it has transformed your view of programming. Contrast zen, which is a similar supernatural understanding experienced as a single brief flash.

The entry existed in the very earliest forms of the Jargon File, dating from the early 1980s. A typical tech usage, from the Linux Bible (2005), characterizes the Unix software development philosophy as "one that can make your life a lot simpler once you grok the idea".

The book Perl Best Practices defines grok as understanding a portion of computer code in a profound way. It goes on to suggest that to re-grok code is to reload the intricacies of that portion of code into one's memory after some time has passed and all the details of it are no longer remembered. In that sense, to grok means to load everything into memory for immediate use. It is analogous to the way a processor caches memory for short term use, but the only implication by this reference was that it was something a human (or perhaps a Martian) would do.

The main web page for cURL, an open source tool and programming library, describes the function of cURL as "cURL groks URLs".

The book Cyberia covers its use in this subculture extensively:

This is all latter day usage, the original derivation was from an early text processing utility from so long ago that no one remembers but, grok was the output when it understood the file. K&R would remember.

The keystroke logging software used by the NSA for its remote intelligence gathering operations is named GROK.

One of the most powerful parsing filters in Logstash, a component of the Elasticsearch software stack, is named grok.

A reference book by Carey Bunks on the use of the GNU Image Manipulation Program is titled Grokking the GIMP.

In counterculture

Tom Wolfe, in his book The Electric Kool-Aid Acid Test (1968), describes a character's thoughts during an acid trip: "He looks down, two bare legs, a torso rising up at him and like he is just noticing them for the first time ... he has never seen any of this flesh before, this stranger. He groks over that ..."

In his counterculture Volkswagen repair manual, How to Keep Your Volkswagen Alive: A Manual of Step-by-Step Procedures for the Compleat Idiot (1969), dropout aerospace engineer John Muir instructs prospective used VW buyers to "grok the car" before buying.

The word was used numerous times by Robert Anton Wilson in his works The Illuminatus! Trilogy and Schrödinger's Cat Trilogy.

The term inspired actress Mayim Bialik's women's lifestyle site, Grok Nation.

 

Neologism

From Wikipedia, the free encyclopedia

A neologism (/niːˈɒlədʒɪzəm/; from Greek νέο- néo-, "new" and λόγος lógos, "speech, utterance") is a relatively recent or isolated term, word, or phrase that may be in the process of entering common use, but that has not yet been fully accepted into mainstream language. Neologisms are often driven by changes in culture and technology. In the process of language formation, neologisms are more mature than protologisms. A word whose development stage is between that of the protologism (freshly coined) and the neologism (new word) is a prelogism.

Popular examples of neologisms can be found in science, fiction (notably science fiction), films and television, branding, literature, jargon, cant, linguistic and popular culture.

Examples include laser (1960), from Light Amplification by Stimulated Emission of Radiation; robot, from Czech writer Karel Čapek's play R.U.R. (Rossum's Universal Robots), and its derivative robotics (1941); and agitprop (1930), a portmanteau of "agitation" and "propaganda".

Background

Neologisms are often formed by combining existing words (see compound noun and adjective) or by giving words new and unique suffixes or prefixes. Neologisms can also be formed by blending words, for example, "brunch" is a blend of the words "breakfast" and "lunch", or through abbreviation or acronym, by intentionally rhyming with existing words or simply through playing with sounds.

Neologisms can become popular through memetics, through mass media, the Internet, and word of mouth, including academic discourse in many fields renowned for their use of distinctive jargon, and often become accepted parts of the language. Other times, they disappear from common use just as readily as they appeared. Whether a neologism continues as part of the language depends on many factors, probably the most important of which is acceptance by the public. It is unusual for a word to gain popularity if it does not clearly resemble other words.

History and meaning

The term neologism is first attested in English in 1772, borrowed from French néologisme (1734). In an academic sense, there is no professional neologist, because the study of such things (cultural or ethnic vernacular, for example) is interdisciplinary. Anyone, such as a lexicographer or an etymologist, might study neologisms, how their uses span the scope of human expression, and how, thanks to science and technology, they now spread more rapidly than ever before.

The term neologism has a broader meaning which also includes "a word which has gained a new meaning". Sometimes, the latter process is called semantic shifting, or semantic extension. Neologisms are distinct from a person's idiolect, one's unique patterns of vocabulary, grammar, and pronunciation.

Neologisms are usually introduced when it is found that a specific notion is lacking a term, or when the existing vocabulary lacks detail, or when a speaker is unaware of the existing vocabulary. The law, governmental bodies, and technology have a relatively high frequency of acquiring neologisms. Another trigger that motivates the coining of a neologism is to disambiguate a term which may be unclear due to having many meanings.

Literature

Neologisms may come from a word used in the narrative of fiction such as novels and short stories. Examples include "grok" (to intuitively understand) from the science fiction novel about a Martian, entitled Stranger in a Strange Land by Robert A. Heinlein; "McJob" (precarious, poorly paid employment) from Generation X: Tales for an Accelerated Culture by Douglas Coupland; "cyberspace" (widespread, interconnected digital technology) from Neuromancer by William Gibson; and "quark" (Slavic slang for "rubbish"; German for a type of dairy product) from James Joyce's Finnegans Wake.

The title of a book may become a neologism, for instance, Catch-22 (from the title of Joseph Heller's novel). Alternatively, the author's name may give rise to the neologism, although the term is sometimes based on only one work of that author. This includes such words as "Orwellian" (from George Orwell, referring to his dystopian novel Nineteen Eighty-Four) and "Kafkaesque" (from Franz Kafka), which refers to arbitrary, complex bureaucratic systems.

Names of famous characters are another source of literary neologisms, e.g. quixotic (referring to the romantic and misguided title character in Don Quixote by Miguel de Cervantes), scrooge (from the avaricious main character in Charles Dickens' A Christmas Carol) and pollyanna (from the unfailingly optimistic character in Eleanor H. Porter's book of the same name).

Cant

Polari is a cant used by some actors, circus performers, and the gay subculture to communicate without outsiders understanding. Some Polari terms have crossed over into mainstream slang, in part through their usage in pop song lyrics and other works. Examples include: acdc, barney, blag, butch, camp, khazi, cottaging, hoofer, mince, ogle, scarper, slap, strides, tod, and (rough) trade.

Verlan (French pronunciation: [vɛʁlɑ̃]), whose name comes from reversing the syllables of l'envers ("the reverse"), is a type of argot in the French language, featuring inversion of syllables in a word, and is common in slang and youth language. It rests on a long French tradition of transposing syllables of individual words to create slang words. Some verlan words, such as meuf ("femme", which means "woman", roughly backwards), have become so commonplace that they have been included in the Petit Larousse. Like any slang, the purpose of verlan is to create a somewhat secret language that only its speakers can understand. Words becoming mainstream is counterproductive. As a result, such newly common words are re-verlanised: reversed a second time. The common meuf became feumeu.

Popular culture

Neologism development may be spurred, or at least spread, by popular culture. Examples of pop-culture neologisms include the American Alt-right (2010s), the Canadian portmanteau "Snowmageddon" (2009), the Russian parody "Monstration" (ca. 2004), Santorum (c. 2003).

Neologisms spread mainly through their exposure in mass media. The genericizing of brand names, such as "coke" for Coca-Cola, "kleenex" for Kleenex facial tissue, and "xerox" for Xerox photocopying, all spread through their popular use being enhanced by mass media.

However, in some limited cases, words break out of their original communities and spread through social media. "Doggo-Lingo", a term still below the threshold of a neologism according to Merriam-Webster, is an example of the latter which has specifically spread primarily through Facebook group and Twitter account use. The suspected origin of this way of referring to dogs stems from a Facebook group founded in 2008 and gaining popularity in 2014 in Australia. In Australian English it is common to use diminutives, often ending in –o, which could be where doggo-lingo was first used. The term has grown so that Merriam-Webster has acknowledged its use but notes the term needs to be found in published, edited work for a longer period of time before it can be deemed a new word, making it the perfect example of a neologism.

Translations

Because neologisms originate in one language, translations between languages can be difficult.

In the scientific community, where English is the predominant language for published research and studies, like-sounding translations (referred to as "naturalization") are sometimes used. Alternatively, the English word is used along with a brief explanation of meaning. Four translation methods are emphasized for translating neologisms: transliteration, transcription, the use of analogues, and calque (loan translation).

When translating from English to other languages, the naturalization method is most often used. The most common way that professional translators translate neologisms is through the Think Aloud Protocol (TAP), wherein translators find the most appropriate and natural-sounding word through speech. As such, translators can use potential translations in sentences and test them with different structures and syntax. Correct translation of English for specific purposes into other languages is crucial in various industries and legal systems. Inaccurate translations can lead to "translation asymmetry" or misunderstandings and miscommunication. Many technical glossaries of English translations exist to combat this issue in the medical, judicial, and technological fields.

Other uses

In psychiatry and neuroscience, the term neologism is used to describe words that have meaning only to the person who uses them, independent of their common meaning. This can be seen in schizophrenia, where a person may replace a word with a nonsensical one of their own invention, e.g. “I got so angry I picked up a dish and threw it at the geshinker.”  The use of neologisms may also be due to aphasia acquired after brain damage resulting from a stroke or head injury.

Operator (computer programming)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Operator_(computer_programmin...