A Medley of Potpourri

Friday, December 7, 2018

The Library of Babel

From Wikipedia, the free encyclopedia

"The Library of Babel"
Author	Jorge Luis Borges
Original title	"La biblioteca de Babel"
Translator	numerous
Country	Argentina
Language	Spanish
Genre(s)	Fantasy
Published in	El Jardín de senderos que se bifurcan
Publisher	Editorial Sur
Publication date	1941
Published in English	1962

"The Library of Babel" (Spanish: La biblioteca de Babel) is a short story by Argentine author and librarian Jorge Luis Borges (1899–1986), conceiving of a universe in the form of a vast library containing all possible 410-page books of a certain format and character set.

The story was originally published in Spanish in Borges' 1941 collection of stories El Jardín de senderos que se bifurcan (The Garden of Forking Paths). That entire book was, in turn, included within his much-reprinted Ficciones (1944). Two English-language translations appeared approximately simultaneously in 1962, one by James E. Irby in a diverse collection of Borges's works titled Labyrinths and the other by Anthony Kerrigan as part of a collaborative translation of the entirety of Ficciones.

Plot summary

Borges' narrator describes how his universe consists of an enormous expanse of adjacent hexagonal rooms, each of which contains the bare necessities for human survival—and four walls of bookshelves. Though the order and content of the books are random and apparently completely meaningless, the inhabitants believe that the books contain every possible ordering of just 25 basic characters (22 letters, the period, the comma, and space). Though the vast majority of the books in this universe are pure gibberish, the library also must contain, somewhere, every coherent book ever written, or that might ever be written, and every possible permutation or slightly erroneous version of every one of those books. The narrator notes that the library must contain all useful information, including predictions of the future, biographies of any person, and translations of every book in all languages. Conversely, for many of the texts, some language could be devised that would make it readable with any of a vast number of different contents.

Despite—indeed, because of—this glut of information, all books are totally useless to the reader, leaving the librarians in a state of suicidal despair. This leads some librarians to superstitious and cult-like behaviors, such as the "Purifiers", who arbitrarily destroy books they deem nonsense as they scour through the library seeking the "Crimson Hexagon" and its illustrated, magical books. Others believe that since all books exist in the library, somewhere one of the books must be a perfect index of the library's contents; some even believe that a messianic figure known as the "Man of the Book" has read it, and they travel through the library seeking him.

Themes


The story repeats the theme of Borges' 1939 essay "The Total Library" ("La Biblioteca total"), which in turn acknowledges the earlier development of this theme by Kurd Lasswitz in his 1901 story "The Universal Library" ("Die Universalbibliothek"):

Certain examples that Aristotle attributes to Democritus and Leucippus clearly prefigure it, but its belated inventor is Gustav Theodor Fechner, and its first exponent, Kurd Lasswitz. [...] In his book The Race with the Tortoise (Berlin, 1919), Dr Theodor Wolff suggests that it is a derivation from, or a parody of, Ramón Llull's thinking machine [...T]he elements of his game are the universal orthographic symbols, not the words of a language [...] Lasswitz arrives at twenty-five symbols (twenty-two letters, the space, the period, the comma), whose recombinations and repetitions encompass everything possible to express in all languages. The totality of such variations would form a Total Library of astronomical size. Lasswitz urges mankind to construct that inhuman library, which chance would organize and which would eliminate intelligence. (Wolff's The Race with the Tortoise expounds the execution and the dimensions of that impossible enterprise.)

Many of Borges' signature motifs are featured in the story, including infinity, reality, cabalistic reasoning, and labyrinths. The concept of the library is often compared to Borel's dactylographic monkey theorem. There is no reference to monkeys or typewriters in "The Library of Babel", although Borges had mentioned that analogy in "The Total Library": "[A] half-dozen monkeys provided with typewriters would, in a few eternities, produce all the books in the British Museum." In this story, the closest equivalent is the line, "A blasphemous sect suggested [...] that all men should juggle letters and symbols until they constructed, by an improbable gift of chance, these canonical books."

Borges would examine a similar idea in his 1975 story "The Book of Sand", in which there is an infinite book (or book with an indefinite number of pages) rather than an infinite library. Moreover, the story's Book of Sand is said to be written in an unknown alphabet and its content is not obviously random. In "The Library of Babel", Borges interpolates Italian mathematician Bonaventura Cavalieri's suggestion that any solid body could be conceptualized as the superimposition of an infinite number of planes.

The concept of the library is also overtly analogous to the view of the universe as a sphere having its center everywhere and its circumference nowhere. The mathematician and philosopher Blaise Pascal employed this metaphor, and in an earlier essay Borges noted that Pascal's manuscript called the sphere effroyable, or "frightful".

In any case, a library containing all possible books, arranged at random, might as well be a library containing zero books, as any true information would be buried in, and rendered indistinguishable from, all possible forms of false information; the experience of opening to any page of any of the library's books has been simulated by websites which create screenfuls of random letters.

The quote at the beginning of the story, "By this art you may contemplate the variation of the twenty-three letters," is from Robert Burton's 1621 The Anatomy of Melancholy.

Philosophical implications

There are numerous philosophical implications within the idea of the finite library which exhausts all possibilities. Every book in the library is "intelligible" if one decodes it correctly, simply because it can be decoded from any other book in the library using a third book as a one-time pad. This lends itself to the philosophical idea proposed by Immanuel Kant, that our mind helps to structure our experience of reality; thus the rules of reality (as we know it) are intrinsic to the mind. So if we identify these rules, we can better decode 'reality'. One might speculate that these rules are contained in the Crimson Hexagon, which is the key to decoding the others. The library becomes a temptation, even an obsession, because it contains these gems of enlightenment while also burying them in deception. On a psychological level, the infinite storehouse of information is a hindrance and a distraction, because it lures one away from writing one's own book (i.e. living one's life). Anything one might write would of course already exist. One can see any text as being pulled from the library by the act of the author defining the search letter by letter until they reach a text close enough to the one they intended to write. The text already existed theoretically, but had to be found by the act of the author's imagination. Another implication is an argument against certain proofs of the existence of God, made by David Hume using the thought experiment of a similar library of books generated not by a human mind, but by nature.

Infinite extent

In mainstream theories of natural language syntax, every syntactically-valid utterance can be extended to produce a new, longer one, because of recursion. If this process can be continued indefinitely, then there is no upper bound on the length of a well-formed utterance and the number of unique well-formed strings of any language is countably infinite. However, the books in the Library of Babel are of bounded length ("each book is of four hundred and ten pages; each page, of forty lines, each line, of some eighty letters"), so the Library can only contain a finite number of distinct strings, and thus cannot contain all possible well-formed utterances. Borges' narrator notes this fact, but believes that the Library is nevertheless infinite; he speculates that it repeats itself periodically, giving an eventual "order" to the "disorder" of the seemingly-random arrangement of books.
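The size of this finite collection follows directly from the quoted format. The following is a rough sketch only: it assumes exactly 80 characters per line (the story says "some eighty") and the 25-symbol alphabet given in the plot summary, and the constant names are illustrative.

```python
import math

# Book format as quoted from the story; ALPHABET_SIZE is the 25 symbols
# listed in the plot summary. Treating "some eighty" as exactly 80 is an assumption.
PAGES_PER_BOOK = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80
ALPHABET_SIZE = 25

chars_per_book = PAGES_PER_BOOK * LINES_PER_PAGE * CHARS_PER_LINE

# The number of distinct books is ALPHABET_SIZE ** chars_per_book.
# Work with its base-10 logarithm instead of printing the integer itself.
log10_books = chars_per_book * math.log10(ALPHABET_SIZE)
decimal_digits = math.floor(log10_books) + 1

print(f"characters per book: {chars_per_book:,}")
print(f"distinct books: a number with {decimal_digits:,} decimal digits")
```

Under these assumptions each book holds 1,312,000 characters, and the count of distinct books, although finite, is a number with more than 1.8 million decimal digits.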

Quine's reduction

In a short essay, W. V. O. Quine noted the interesting fact that the Library of Babel is finite (that is, we will theoretically come to a point in history where everything has been written), and that the Library of Babel can be constructed in its entirety simply by writing a dot on one piece of paper and a dash on another. These two sheets of paper could then be alternated at random to produce every possible text, in Morse code or equivalently binary. Writes Quine, "The ultimate absurdity is now staring us in the face: a universal library of two volumes, one containing a single dot and the other a dash. Persistent repetition and alternation of the two are sufficient, we well know, for spelling out any and every truth. The miracle of the finite but universal library is a mere inflation of the miracle of binary notation: everything worth saying, and everything else as well, can be said with two characters."
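Quine's point about binary notation can be illustrated with a short sketch. It is not Morse code: it assumes a fixed-width encoding (the UTF-8 bits of the text, with 0 written as a dot and 1 as a dash), and the function names are made up for this illustration.

```python
def to_dots_and_dashes(text: str) -> str:
    """Encode text as its UTF-8 bits, writing 0 as '.' and 1 as '-'."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return bits.replace("0", ".").replace("1", "-")


def from_dots_and_dashes(marks: str) -> str:
    """Invert to_dots_and_dashes: read each group of eight marks as one byte."""
    bits = marks.replace(".", "0").replace("-", "1")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


encoded = to_dots_and_dashes("Babel")
print(encoded)                        # a string made only of '.' and '-'
print(from_dots_and_dashes(encoded))  # the original text again
```

Any text can be written with the two "volumes" in this way, which is the sense in which the universal library reduces to two characters.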

Comparison with biology

The full possible set of protein sequences (protein sequence space) has been compared to the Library of Babel. In the Library of Babel, finding any book that made sense was impossible due to the sheer number and lack of order. The same would be true of protein sequences if it were not for natural selection, which has picked out only protein sequences that make sense. Additionally, each protein sequence is surrounded by a set of neighbors (point mutants) that are likely to have at least some function. Daniel Dennett's 1995 book Darwin's Dangerous Idea includes an elaboration of the Library of Babel concept to imagine the set of all possible genetic sequences, which he calls the Library of Mendel, in order to illustrate the mathematics of genetic variation. Dennett uses this concept again later in the book to imagine all possible algorithms that can be included in his Toshiba computer, which he calls the Library of Toshiba. He describes the Library of Mendel and the Library of Toshiba as subsets within the Library of Babel.

Influence on later writers

Umberto Eco's postmodern novel The Name of the Rose (1980) features a labyrinthine library, presided over by a blind monk named Jorge of Burgos.
In "The Net of Babel", published in Interzone in 1995, David Langford imagines the Library becoming computerized for easy access. This aids the librarians in searching for specific text while also highlighting the futility of such searches as they can find anything, but nothing of meaning as such. The sequel continues many of Borges's themes, while also highlighting the difference between data and information, and satirizing the Internet.
Russell Standish's Theory of Nothing uses the concept of the Library of Babel to illustrate how an ultimate ensemble containing all possible descriptions would in sum contain zero information and would thus be the simplest possible explanation for the existence of the universe. This theory, therefore, implies the reality of all universes.
Michael Ende reused the idea of a universe of hexagonal rooms in the Temple of a Thousand Doors from The Neverending Story, which contained all the possible characteristics of doors in the fantastic realm. A later chapter features the infinite monkey theorem.
Terry Pratchett uses the concept of the infinite library in his Discworld novels. The knowledgeable librarian is a human wizard transformed into an orangutan.
The Unimaginable Mathematics of Borges' Library of Babel (2008) by William Goldbloom Bloch explores the short story from a mathematical perspective. Bloch analyzes the hypothetical library presented by Borges using the ideas of topology, information theory, and geometry.
In Greg Bear's novel City at the End of Time (2008), the sum-runners carried by the protagonists are intended by their creator to be combined to form a 'Babel', an infinite library containing every possible permutation of every possible character in every possible language. Bear has stated that this was inspired by Borges, who is also namechecked in the novel. Borges is described as an unknown Argentinian who commissioned an encyclopedia of impossible things, a reference to either "Tlön, Uqbar, Orbis Tertius" or the Book of Imaginary Beings.
Fone, a short comic novel drawn by Milo Manara, features a human astronaut and his alien partner stranded on a planet named Borges Profeta. The planet is overflowing with books containing all the possible permutations of letters.
Steven L. Peck wrote a novella entitled A Short Stay in Hell (2012) in which the protagonist must find the book containing his life story in an afterlife replica of Borges' Library of Babel.
The third season of Carmilla, a Canadian single-frame web series based on the novella by J. Sheridan Le Fanu, is set in a mystical library described as "non-Euclidean" and omnipotent. It contains a door that, depending on the knocking pattern on its panels, can be opened into any universe. It also creates a temporary parallel universe and is able to shift a character between the parallel and the original. As the parallel universe collapses, darkness falls, and a character perishes in the void after uttering the words, "O time thy pyramids," which are contained on the second-to-last page of a book in the Library of Babel.
In Christopher Nolan's film Interstellar, the protagonist, Cooper, played by Matthew McConaughey, becomes trapped in a world which mirrors that of Borges' story; that is, Cooper's universe consists of an enormous expanse of adjacent hexagonal rooms, or libraries, each of which contains the bare necessities for survival. Though the order and content of the books and rooms are random and apparently completely meaningless, Cooper can, by manipulating the books, effect change in the "real" world and is, as such, analogous to the "Man of the Book", the messianic figure in "The Library of Babel". Unlike the Man of the Book, however, Cooper is something more than just a metaphor and has a transformative role in his universe, becoming a catalyst and an agent of change.
Jonathan Basile set out to recreate the Library in Borges' story on his website http://libraryofbabel.info, adapted to the English language. An algorithm he created generates a 'book' by iterating every permutation of 29 characters: the 26 English letters, space, comma, and period. Each book is marked by a coordinate, corresponding to its place in the hexagonal library (hexagon name, wall number, shelf number, and book name) so that every book can be found at the same place every time. The website is said to contain "all possible pages of 3200 characters, about 10⁴⁶⁷⁷ books".
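The idea that every page can be "found at the same place every time" can be illustrated by reading a page as a number written in base 29. This is only a sketch of one possible addressing scheme, not the website's actual algorithm; the alphabet ordering and the function names are assumptions made for the example.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."  # 29 symbols; this ordering is an assumption
BASE = len(ALPHABET)
PAGE_LENGTH = 3200


def page_to_index(page: str) -> int:
    """Read a page as a base-29 numeral, giving a unique non-negative integer."""
    index = 0
    for ch in page:
        index = index * BASE + ALPHABET.index(ch)
    return index


def index_to_page(index: int, length: int = PAGE_LENGTH) -> str:
    """Inverse of page_to_index for pages of a fixed length."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


page = "the library of babel".ljust(PAGE_LENGTH)  # pad with spaces to a full page
index = page_to_index(page)
print(f"page index needs {index.bit_length()} bits")
print(index_to_page(index) == page)
```

Because the mapping is a bijection between fixed-length pages and integers, nothing needs to be stored: a page's "location" and its text determine each other.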

Design of experiments

From Wikipedia, the free encyclopedia

Design of experiments with full factorial design (left), response surface with second-degree polynomial (right)

The design of experiments (DOE, DOX, or experimental design) is the design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.

In its simplest form, an experiment aims at predicting the outcome by introducing a change of the preconditions, which is represented by one or more independent variables, also referred to as "input variables" or "predictor variables." The change in one or more independent variables is generally hypothesized to result in a change in one or more dependent variables, also referred to as "output variables" or "response variables." The experimental design may also identify control variables that must be held constant to prevent external factors from affecting the results. Experimental design involves not only the selection of suitable independent, dependent, and control variables, but also planning the delivery of the experiment under statistically optimal conditions given the constraints of available resources. There are multiple approaches for determining the set of design points (unique combinations of the settings of the independent variables) to be used in the experiment.

Main concerns in experimental design include the establishment of validity, reliability, and replicability. For example, these concerns can be partially addressed by carefully choosing the independent variable, reducing the risk of measurement error, and ensuring that the documentation of the method is sufficiently detailed. Related concerns include achieving appropriate levels of statistical power and sensitivity.

Correctly designed experiments advance knowledge in the natural and social sciences and engineering. Other applications include marketing and policy making.

History

Systematic clinical trials

In 1747, while serving as surgeon on HMS Salisbury, James Lind carried out a systematic clinical trial to compare remedies for scurvy. This systematic clinical trial constitutes a type of DOE.

Lind selected 12 men from the ship, all suffering from scurvy. Lind limited his subjects to men who "were as similar as I could have them," that is, he provided strict entry requirements to reduce extraneous variation. He divided them into six pairs, giving each pair different supplements to their basic diet for two weeks. The treatments were all remedies that had been proposed:

A quart of cider every day.
Twenty five gutts (drops) of vitriol (sulphuric acid) three times a day upon an empty stomach.
One half-pint of seawater every day.
A mixture of garlic, mustard, and horseradish in a lump the size of a nutmeg.
Two spoonfuls of vinegar three times a day.
Two oranges and one lemon every day.

The citrus treatment stopped after six days when they ran out of fruit, but by that time one sailor was fit for duty while the other had almost recovered. Apart from that, only group one (cider) showed some effect of its treatment. The remainder of the crew presumably served as a control, but Lind did not report results from any control (untreated) group.

Statistical experiments, following Charles S. Peirce

A theory of statistical inference was developed by Charles S. Peirce in "Illustrations of the Logic of Science" (1877–1878) and "A Theory of Probable Inference" (1883), two publications that emphasized the importance of randomization-based inference in statistics.

Randomized experiments

Charles S. Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights. Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s.

Optimal designs for regression models

Charles S. Peirce also contributed the first English-language publication on an optimal design for regression models in 1876. A pioneering optimal design for polynomial regression was suggested by Gergonne in 1815. In 1918, Kirstine Smith published optimal designs for polynomials of degree six (and less).

Sequences of experiments

The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, is within the scope of Sequential analysis, a field that was pioneered by Abraham Wald in the context of sequential tests of statistical hypotheses. Herman Chernoff wrote an overview of optimal sequential designs, while adaptive designs have been surveyed by S. Zacks. One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.

Fisher's principles

A methodology for designing experiments was proposed by Ronald Fisher, in his innovative books: The Arrangement of Field Experiments (1926) and The Design of Experiments (1935). Much of his pioneering work dealt with agricultural applications of statistical methods. As a mundane example, he described how to test the lady tasting tea hypothesis, that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. These methods have been broadly adapted in the physical and social sciences, are still used in agricultural engineering and differ from the design and analysis of computer experiments.

Comparison: In some fields of study it is not possible to have independent measurements to a traceable metrology standard. Comparisons between treatments are much more valuable and are usually preferable, and often compared against a scientific control or traditional treatment that acts as baseline.

Randomization: Random assignment is the process of assigning individuals at random to different groups in an experiment, so that each individual of the population has the same chance of becoming a participant in the study. The random assignment of individuals to groups (or conditions within a group) distinguishes a rigorous, "true" experiment from an observational study or "quasi-experiment". There is an extensive body of mathematical theory that explores the consequences of making the allocation of units to treatments by means of some random mechanism (such as tables of random numbers, or the use of randomization devices such as playing cards or dice). Assigning units to treatments at random tends to mitigate confounding, which makes effects due to factors other than the treatment appear to result from the treatment.

The risks associated with random allocation (such as having a serious imbalance in a key characteristic between a treatment group and a control group) are calculable and hence can be managed down to an acceptable level by using enough experimental units. However, if the population is divided into several subpopulations that somehow differ, and the research requires each subpopulation to be equal in size, stratified sampling can be used. In that way, the units in each subpopulation are randomized, but not the whole sample. The results of an experiment can be generalized reliably from the experimental units to a larger statistical population of units only if the experimental units are a random sample from the larger population; the probable error of such an extrapolation depends on the sample size, among other things.
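Random assignment, with and without subpopulations, can be sketched as follows. The unit labels, the two strata and the function names are hypothetical, and the fixed seed is there only to make the sketch reproducible.

```python
import random
from collections import defaultdict


def randomize(units, treatments, rng):
    """Completely randomized design: shuffle the units, then deal out treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def randomize_within_strata(units, stratum_of, treatments, rng):
    """Randomize separately inside each subpopulation so that each one is balanced."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    assignment = {}
    for members in strata.values():
        assignment.update(randomize(members, treatments, rng))
    return assignment


def stratum_of(unit):
    """Hypothetical rule: the first six units form stratum A, the rest stratum B."""
    return "A" if int(unit[4:]) < 6 else "B"


rng = random.Random(2018)                       # fixed seed for reproducibility only
units = [f"unit{i:02d}" for i in range(12)]     # hypothetical unit labels
arms = ["treatment", "control"]

simple = randomize(units, arms, rng)
stratified = randomize_within_strata(units, stratum_of, arms, rng)
print(simple)
print(stratified)
```

In the simple design the two arms are balanced overall but a stratum may be lopsided by chance; in the stratified version each stratum receives three units per arm by construction.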

Statistical replication: Measurements are usually subject to variation and measurement uncertainty; thus they are repeated and full experiments are replicated to help identify the sources of variation, to better estimate the true effects of treatments, to further strengthen the experiment's reliability and validity, and to add to the existing knowledge of the topic. However, certain conditions must be met before the replication of the experiment is commenced: the original research question has been published in a peer-reviewed journal or widely cited, the researcher is independent of the original experiment, the researcher must first try to replicate the original findings using the original data, and the write-up should state that the study conducted is a replication study that tried to follow the original study as strictly as possible.

Blocking: Blocking is the non-random arrangement of experimental units into groups (blocks/lots) consisting of units that are similar to one another. Blocking reduces known but irrelevant sources of variation between units and thus allows greater precision in the estimation of the source of variation under study.

Orthogonality

Example of orthogonal factorial design

Orthogonality concerns the forms of comparison (contrasts) that can be legitimately and efficiently carried out. Contrasts can be represented by vectors and sets of orthogonal contrasts are uncorrelated and independently distributed if the data are normal. Because of this independence, each orthogonal treatment provides different information to the others. If there are T treatments and T – 1 orthogonal contrasts, all the information that can be captured from the experiment is obtainable from the set of contrasts.

Factorial experiments: Use of factorial experiments instead of the one-factor-at-a-time method. These are efficient at evaluating the effects and possible interactions of several factors (independent variables). Analysis of experiment design is built on the foundation of the analysis of variance, a collection of models that partition the observed variance into components, according to what factors the experiment must estimate or test.
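A two-level full factorial design can be generated and checked for orthogonality in a few lines. This is a minimal sketch: the three factor names are hypothetical, and the levels are coded as -1 and +1.

```python
from itertools import product

# Hypothetical factors, each at two coded levels.
factors = {"temperature": (-1, +1), "pressure": (-1, +1), "catalyst": (-1, +1)}

# Full factorial design: every combination of factor levels is one run.
runs = list(product(*factors.values()))


def dot(u, v):
    """Inner product of two columns of the design."""
    return sum(a * b for a, b in zip(u, v))


# One column per factor (main effects).
columns = {name: [run[i] for run in runs] for i, name in enumerate(factors)}
# An interaction column is the elementwise product of the factor columns involved.
columns["temperature*pressure"] = [run[0] * run[1] for run in runs]

names = list(columns)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} . {b} = {dot(columns[a], columns[b])}")
```

The design has 2³ = 8 runs, and every pair of columns has inner product zero, so each main effect and the interaction can be estimated from the same eight runs without interfering with one another. A one-factor-at-a-time plan offers no comparable interaction column.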

Example

This example is attributed to Harold Hotelling. It conveys some of the flavor of those aspects of the subject that involve combinatorial designs.

Weights of eight objects are measured using a pan balance and set of standard weights. Each weighing measures the weight difference between objects in the left pan vs. any objects in the right pan by adding calibrated weights to the lighter pan until the balance is in equilibrium. Each measurement has a random error. The average error is zero; the standard deviation of the probability distribution of the errors is the same number σ on different weighings; errors on different weighings are independent. Denote the true weights by

\theta_1, \dots, \theta_8.

We consider two different experiments:

Weigh each object in one pan, with the other pan empty. Let X_i be the measured weight of the object, for i = 1, ..., 8.
Do the eight weighings according to the following schedule and let Y_i be the measured difference for i = 1, ..., 8:

\begin{array}{lcc}
 & \text{left pan} & \text{right pan} \\
\hline
\text{1st weighing:} & 1\ 2\ 3\ 4\ 5\ 6\ 7\ 8 & \text{(empty)} \\
\text{2nd:} & 1\ 2\ 3\ 8 & 4\ 5\ 6\ 7 \\
\text{3rd:} & 1\ 4\ 5\ 8 & 2\ 3\ 6\ 7 \\
\text{4th:} & 1\ 6\ 7\ 8 & 2\ 3\ 4\ 5 \\
\text{5th:} & 2\ 4\ 6\ 8 & 1\ 3\ 5\ 7 \\
\text{6th:} & 2\ 5\ 7\ 8 & 1\ 3\ 4\ 6 \\
\text{7th:} & 3\ 4\ 7\ 8 & 1\ 2\ 5\ 6 \\
\text{8th:} & 3\ 5\ 6\ 8 & 1\ 2\ 4\ 7
\end{array}

Then the estimated value of the weight θ₁ is

\widehat{\theta}_1 = \frac{Y_1 + Y_2 + Y_3 + Y_4 - Y_5 - Y_6 - Y_7 - Y_8}{8}.

Similar estimates can be found for the weights of the other items. For example

\begin{aligned}
\widehat{\theta}_2 &= \frac{Y_1 + Y_2 - Y_3 - Y_4 + Y_5 + Y_6 - Y_7 - Y_8}{8}. \\[5pt]
\widehat{\theta}_3 &= \frac{Y_1 + Y_2 - Y_3 - Y_4 - Y_5 - Y_6 + Y_7 + Y_8}{8}. \\[5pt]
\widehat{\theta}_4 &= \frac{Y_1 - Y_2 + Y_3 - Y_4 + Y_5 - Y_6 + Y_7 - Y_8}{8}. \\[5pt]
\widehat{\theta}_5 &= \frac{Y_1 - Y_2 + Y_3 - Y_4 - Y_5 + Y_6 - Y_7 + Y_8}{8}. \\[5pt]
\widehat{\theta}_6 &= \frac{Y_1 - Y_2 - Y_3 + Y_4 + Y_5 - Y_6 - Y_7 + Y_8}{8}. \\[5pt]
\widehat{\theta}_7 &= \frac{Y_1 - Y_2 - Y_3 + Y_4 - Y_5 + Y_6 + Y_7 - Y_8}{8}. \\[5pt]
\widehat{\theta}_8 &= \frac{Y_1 + Y_2 + Y_3 + Y_4 + Y_5 + Y_6 + Y_7 + Y_8}{8}.
\end{aligned}

The question of design of experiments is: which experiment is better?

The variance of the estimate X₁ of θ₁ is σ² if we use the first experiment. But if we use the second experiment, the variance of the estimate given above is σ²/8. Thus the second experiment gives us 8 times as much precision for the estimate of a single item, and estimates all items simultaneously, with the same precision. What the second experiment achieves with eight weighings would require 64 weighings if the items were weighed separately. Note that every estimate in the second experiment is built from the same eight measurements and so shares the same measurement errors; because the weighing schedule is orthogonal, however, the errors of different estimates are uncorrelated.
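The σ²/8 figure can be checked directly from the weighing schedule. This sketch encodes only the left-pan column of the schedule (every object not in the left pan is in the right pan, and in the first weighing all eight are on the left); the true weights are made-up numbers used to confirm that the estimator recovers them when there is no noise.

```python
N = 8

# Left-pan contents for weighings 1..8, copied from the schedule.
LEFT_PAN = [
    {1, 2, 3, 4, 5, 6, 7, 8},
    {1, 2, 3, 8},
    {1, 4, 5, 8},
    {1, 6, 7, 8},
    {2, 4, 6, 8},
    {2, 5, 7, 8},
    {3, 4, 7, 8},
    {3, 5, 6, 8},
]

# design[i][j] is +1 if object j+1 is in the left pan on weighing i+1, else -1.
design = [[1 if obj in left else -1 for obj in range(1, N + 1)] for left in LEFT_PAN]


def measure(theta):
    """Noise-free readings Y_1..Y_8: left-pan total minus right-pan total."""
    return [sum(s * t for s, t in zip(row, theta)) for row in design]


def estimate(y):
    """theta_hat_j = (sum_i design[i][j] * Y_i) / 8, as in the formulas for the estimates."""
    return [sum(design[i][j] * y[i] for i in range(N)) / N for j in range(N)]


theta = [3.0, 1.0, 4.0, 1.5, 9.0, 2.5, 6.0, 5.0]  # made-up true weights
print(estimate(measure(theta)))                    # recovers theta without noise

# With independent errors of variance sigma^2 on each Y_i,
# Var(theta_hat_j) = sigma^2 * sum_i (design[i][j] / 8) ** 2.
variance_factor = [sum((design[i][j] / N) ** 2 for i in range(N)) for j in range(N)]
print(variance_factor)                             # multiplier of sigma^2 per object
```

Each estimate is a sum of eight terms with coefficient ±1/8, so its variance is 8 × σ²/64 = σ²/8, compared with σ² when each object is weighed once on its own.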

Many problems of the design of experiments involve combinatorial designs, as in this example and others.

Avoiding false positives

False positive conclusions, often resulting from the pressure to publish or the author's own confirmation bias, are an inherent hazard in many fields. A good way to prevent biases potentially leading to false positives in the data collection phase is to use a double-blind design. When a double-blind design is used, participants are randomly assigned to experimental groups but the researcher is unaware of which participants belong to which group. Therefore, the researcher cannot affect the participants' response to the intervention. Experimental designs with undisclosed degrees of freedom are a problem, because they can lead to conscious or unconscious "p-hacking": trying multiple things until the desired result is obtained. It typically involves the manipulation, perhaps unconscious, of the process of statistical analysis and the degrees of freedom until they return a figure below the p < 0.05 level of statistical significance. The design of the experiment should therefore include a clear statement proposing the analyses to be undertaken. P-hacking can be prevented by preregistering research, in which researchers send their data analysis plan to the journal they wish to publish in before they start data collection, so that no data manipulation is possible (https://osf.io). Another way to prevent it is to extend the double-blind design to the data-analysis phase: the data are sent to a data analyst unrelated to the research, who scrambles the data so that there is no way to know which group participants belong to before they are potentially removed as outliers.
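Why "trying multiple things" inflates false positives can be seen from a short calculation. The sketch below makes a simplifying assumption: the analyses are independent tests of hypotheses that are all actually null, each run at the 0.05 level. The function name is illustrative.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one p < alpha among n_tests independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:2d} analyses tried: {rate:.3f}")
```

Under these assumptions a single planned analysis has a 5% chance of a spurious "significant" result, while twenty unplanned ones have roughly a 64% chance of producing at least one. Real analyses of the same data are usually correlated, so the exact figures differ, but the direction of the effect is the same.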

Clear and complete documentation of the experimental methodology is also important in order to support replication of results.

Discussion topics when setting up an experimental design

An experimental design or randomized clinical trial requires careful consideration of several factors before actually doing the experiment. An experimental design is the laying out of a detailed experimental plan in advance of doing the experiment. Some of the following topics have already been discussed in the principles of experimental design section:

How many factors does the design have, and are the levels of these factors fixed or random?
Are control conditions needed, and what should they be?
Manipulation checks; did the manipulation really work?
What are the background variables?
What is the sample size? How many units must be collected for the experiment to be generalisable and have enough power?
What is the relevance of interactions between factors?
What is the influence of delayed effects of substantive factors on outcomes?
How do response shifts affect self-report measures?
How feasible is repeated administration of the same measurement instruments to the same units at different occasions, with a post-test and follow-up tests?
What about using a proxy pretest?
Are there lurking variables?
Should the client/patient, researcher or even the analyst of the data be blind to conditions?
What is the feasibility of subsequent application of different conditions to the same units?
How many of each control and noise factors should be taken into account?

The independent variable of a study often has many levels or different groups. In a true experiment, researchers can have an experimental group, which is where their intervention testing the hypothesis is implemented, and a control group, which has all the same elements as the experimental group, without the interventional element. Thus, when everything else except for one intervention is held constant, researchers can conclude with some certainty that this one element is what caused the observed change. In some instances, having a control group is not ethical. This is sometimes solved using two different experimental groups. In some cases, independent variables cannot be manipulated, for example when testing the difference between two groups who have a different disease, or testing the difference between genders (variables to which it would be hard or unethical to assign participants). In these cases, a quasi-experimental design may be used.

Causal attributions

In the pure experimental design, the independent (predictor) variable is manipulated by the researcher; that is, every participant is chosen randomly from the population, and each participant chosen is assigned randomly to a condition of the independent variable. Only when this is done is it possible to conclude with high probability that differences in the outcome variables are caused by the different conditions. Therefore, researchers should choose the experimental design over other design types whenever possible. However, the nature of the independent variable does not always allow for manipulation. In those cases, researchers must take care not to make causal attributions that their design does not support. For example, in observational designs, participants are not assigned randomly to conditions, so if differences in outcome variables are found between conditions, something other than the conditions themselves, that is, a third variable, may be causing the differences in outcomes. The same goes for studies with a correlational design (Adér & Mellenbergh, 2008).
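
Random assignment to conditions can be sketched in a few lines. This is a minimal illustration, assuming a simple completely randomized design with group sizes as equal as possible. The function name assign_randomly, the participant identifiers and the condition labels are hypothetical.

```python
import random


def assign_randomly(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn.

    Every participant has the same chance of landing in each condition,
    and group sizes differ by at most one. Passing a seed makes the
    allocation reproducible for auditing.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups


# Twenty hypothetical participant IDs split between two conditions.
groups = assign_randomly(range(1, 21), ["treatment", "control"], seed=42)
for condition, members in groups.items():
    print(condition, sorted(members))
```

Because the allocation depends only on the random shuffle, no characteristic of a participant can systematically influence which condition that participant receives.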

Statistical control

It is best that a process be in reasonable statistical control prior to conducting designed experiments. When this is not possible, proper blocking, replication, and randomization allow for the careful conduct of designed experiments. To control for nuisance variables, researchers institute control checks as additional measures. Investigators should ensure that uncontrolled influences (e.g., source credibility perception) do not skew the findings of the study. A manipulation check is one example of a control check. Manipulation checks allow investigators to isolate the chief variables and so strengthen the evidence that these variables are operating as planned.
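
Blocking combined with randomization can be illustrated as follows. This is a minimal sketch of a randomized complete block design, assuming each block contains exactly as many units as there are treatments. The function name randomized_block_design, the block names and the unit labels are hypothetical.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """Randomize the order of treatments separately within each block.

    blocks maps a block name (a level of the nuisance variable) to its
    units. Each treatment appears exactly once per block, so differences
    between blocks cannot be confused with differences between treatments.
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # independent randomization for every block
        plan[block] = dict(zip(units, order))
    return plan


# Time of day is treated here as the nuisance variable.
blocks = {"morning": ["u1", "u2", "u3"], "afternoon": ["u4", "u5", "u6"]}
plan = randomized_block_design(blocks, ["A", "B", "C"], seed=7)
for block, allocation in plan.items():
    print(block, allocation)
```

Replication would correspond to adding further blocks, or further units per treatment within a block, which this sketch does not cover.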

One of the most important requirements of experimental research designs is eliminating the effects of spurious, intervening, and antecedent variables. In the most basic model, cause (X) leads to effect (Y). But there could be a third variable (Z) that influences Y, and X might not be the true cause at all. Z is then said to be a spurious variable and must be controlled for. The same is true for intervening variables (variables that lie between the supposed cause X and the effect Y) and antecedent variables (variables that precede the supposed cause X and are the true cause). When a third variable is involved and has not been controlled for, the relation is said to be a zero-order relationship. In most practical applications of experimental research designs there are several causes (X1, X2, X3). In most designs, only one of these causes is manipulated at a time.
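
The effect of an uncontrolled third variable can be shown with a small simulation. This is a minimal sketch, assuming a made-up data-generating process in which Z drives both X and Y while X has no effect on Y at all. The helper names pearson and partial_corr, the noise levels and the sample size are illustrative choices.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]  # X depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]  # Y depends on Z only, not on X

zero_order = pearson(x, y)           # Z not controlled for
controlled = partial_corr(x, y, z)   # Z controlled for
print(f"zero-order r = {zero_order:.3f}, controlling for Z = {controlled:.3f}")
```

In this setup the zero-order correlation between X and Y is clearly positive, while the correlation after controlling for Z is close to zero, which is the pattern expected when Z is the true common cause.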

Experimental designs after Fisher

Some efficient designs for estimating several main effects were found independently and in near succession by Raj Chandra Bose and K. Kishen in 1940 at the Indian Statistical Institute, but remained little known until the Plackett–Burman designs were published in Biometrika in 1946. About the same time, C. R. Rao introduced the concept of orthogonal arrays as experimental designs. This concept played a central role in the development of Taguchi methods by Genichi Taguchi, which took place during his visit to the Indian Statistical Institute in the early 1950s. His methods were successfully applied and adopted by Japanese and Indian industries and were subsequently also embraced by US industry, albeit with some reservations.
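
To give a flavour of what an orthogonal two-level design looks like, the following minimal sketch builds an 8-run array for up to 7 factors from a Sylvester-type Hadamard matrix. This construction is chosen here only for illustration and is not claimed to be the one used by the authors named above. The function names are hypothetical.

```python
from itertools import combinations


def sylvester_hadamard(k):
    """Return the 2**k x 2**k Hadamard matrix from Sylvester's doubling rule."""
    h = [[1]]
    for _ in range(k):
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def two_level_design(k):
    """Drop the all-ones first column: 2**k runs, 2**k - 1 two-level factors."""
    return [row[1:] for row in sylvester_hadamard(k)]


design = two_level_design(3)  # 8 runs, 7 factors coded as +1 / -1
for run in design:
    print(" ".join(f"{level:+d}" for level in run))

# Balance: every factor is at each level in half of the runs.
columns = list(zip(*design))
balanced = all(sum(col) == 0 for col in columns)
# Orthogonality: every pair of factor columns has zero inner product.
orthogonal = all(
    sum(a * b for a, b in zip(c1, c2)) == 0 for c1, c2 in combinations(columns, 2)
)
print(balanced, orthogonal)
```

Balance and pairwise orthogonality are what allow several main effects to be estimated independently of one another from only 8 runs, rather than the 128 runs of a full two-level factorial in 7 factors.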

In 1950, Gertrude Mary Cox and William Gemmell Cochran published the book Experimental Designs, which became the major reference work on the design of experiments for statisticians for years afterwards.

Developments of the theory of linear models have encompassed and surpassed the cases that concerned early writers. Today, the theory rests on advanced topics in linear algebra, algebra and combinatorics.

As with other branches of statistics, experimental design is pursued using both frequentist and Bayesian approaches: In evaluating statistical procedures like experimental designs, frequentist statistics studies the sampling distribution while Bayesian statistics updates a probability distribution on the parameter space.
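
The contrast between the two approaches can be illustrated on the simplest possible case, a single proportion estimated from binary outcomes. This is a minimal sketch, assuming 7 successes in 10 trials (made-up numbers) and a uniform Beta(1, 1) prior for the Bayesian side. The function names are hypothetical.

```python
import math


def frequentist_summary(successes, trials):
    """Point estimate and standard error from the sampling distribution."""
    p_hat = successes / trials
    std_error = math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, std_error


def bayesian_update(successes, trials, prior_a=1.0, prior_b=1.0):
    """Update a Beta(prior_a, prior_b) prior with binomial data.

    The Beta prior is conjugate to the binomial likelihood, so the
    posterior is again a Beta distribution with updated parameters.
    """
    return prior_a + successes, prior_b + (trials - successes)


p_hat, std_error = frequentist_summary(7, 10)
post_a, post_b = bayesian_update(7, 10)
posterior_mean = post_a / (post_a + post_b)

print(f"frequentist: estimate {p_hat:.3f}, standard error {std_error:.3f}")
print(f"Bayesian: posterior Beta({post_a:g}, {post_b:g}), mean {posterior_mean:.3f}")
```

The frequentist summary describes how the estimate would vary over repeated samples, whereas the Bayesian update yields a full probability distribution over the unknown proportion.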

Some important contributors to the field of experimental designs are C. S. Peirce, R. A. Fisher, F. Yates, C. R. Rao, R. C. Bose, J. N. Srivastava, S. S. Shrikhande, D. Raghavarao, W. G. Cochran, O. Kempthorne, W. T. Federer, V. V. Fedorov, A. S. Hedayat, J. A. Nelder, R. A. Bailey, J. Kiefer, W. J. Studden, A. Pázman, F. Pukelsheim, D. R. Cox, H. P. Wynn, A. C. Atkinson, G. E. P. Box and G. Taguchi. The textbooks of D. Montgomery, R. Myers, and G. Box/W. Hunter/J. S. Hunter have reached generations of students and practitioners.

Experimental design has also been discussed in the context of system identification (model building for static or dynamic models).

Human participant constraints

Laws and ethical considerations preclude some carefully designed experiments with human subjects. Legal constraints are dependent on jurisdiction. Constraints may involve institutional review boards, informed consent and confidentiality affecting both clinical (medical) trials and behavioral and social science experiments. In the field of toxicology, for example, experimentation is performed on laboratory animals with the goal of defining safe exposure limits for humans. Balancing the constraints are views from the medical field. Regarding the randomization of patients, "... if no one knows which therapy is better, there is no ethical imperative to use one therapy or another." (p 380) Regarding experimental design, "...it is clearly not ethical to place subjects at risk to collect data in a poorly designed study when this situation can be easily avoided...". (p 393)