Monday, February 19, 2024

World Brain

From Wikipedia, the free encyclopedia
First edition (publ. Methuen)

World Brain is a collection of essays and addresses by the English science fiction pioneer, social reformer, evolutionary biologist and historian H. G. Wells, dating from the period of 1936–1938. Throughout the book, Wells describes his vision of the World Brain: a new, free, synthetic, authoritative, permanent "World Encyclopaedia" that could help world citizens make the best use of universal information resources and make the best contribution to world peace.

Background

Plans for creating a global knowledge network long predate Wells. Andrew Michael Ramsay described, c. 1737, an objective of freemasonry as follows:

... to furnish the materials for a Universal Dictionary ... By this means the lights of all nations will be united in one single work, which will be a universal library of all that is beautiful, great, luminous, solid, and useful in all the sciences and in all noble arts. This work will augment in each century, according to the increase of knowledge.

The Encyclopedist movement in France in the mid-eighteenth century was a major attempt to actualize this philosophy. However, efforts to encompass all knowledge came to seem less possible as the available corpus expanded exponentially.

In 1926, extending the analogy between global telegraphy and the nervous system, Nikola Tesla speculated that:

When wireless is perfectly applied the whole earth will be converted into a huge brain … Not only this, but through television and telephony we shall see and hear one another as perfectly as though we were face to face, despite intervening distances of thousands of miles; and the instruments through which we shall be able to do this will be amazingly simple compared with our present telephone. A man will be able to carry one in his vest pocket.

Paul Otlet, a contemporary of Wells and information science pioneer, revived this movement in the twentieth century. Otlet wrote in 1935, "Man would no longer need documentation if he were assimilated into a being that has become omniscient, in the manner of God himself." Otlet, like Wells, supported the internationalist efforts of the League of Nations and its International Institute of Intellectual Cooperation.

For his part, Wells had advocated world government for at least a decade, arguing in such books as The Open Conspiracy for control of education by a scientific elite.

Synopsis

In the wake of the First World War, Wells believed that people needed to become better educated and more conversant with the events and knowledge surrounding them. To this end he proposed the World Brain: a knowledge system that all humans could access.

World Encyclopedia

This section, Wells's first expression of his dream of a World Brain, was delivered as a lecture at the Royal Institution of Great Britain, Weekly Evening Meeting, Friday, 20 November 1936.

Wells begins the lecture with a statement on his preference for cohesive worldviews rather than isolated facts. Correspondingly, he wishes the world to be such a whole "as coherent and consistent as possible". He mentions The Work, Wealth and Happiness of Mankind (1931), one of his own attempts at providing intellectual synthesis, and calls it disappointingly unmatched.

He expresses dismay at the ignorance of social science among the Treaty of Versailles and League of Nations framers. He mentions some recent works on the role of science in society and states his main problem as follows:

We want the intellectual worker to become a more definitely organised factor in the human scheme. How is that factor to be organised? Is there any way of implementing knowledge for ready and universal effect?

In answer he introduces the doctrine of New Encyclopaedism as a framework for integrating intellectuals into an organic whole. For the ordinary man, who will necessarily be an educated citizen in the modern state:

From his point of view the World Encyclopaedia would be a row of volumes in his own home or in some neighbouring house or in a convenient public library or in any school or college, and in this row of volumes he would, without any great toil or difficulty, find in clear understandable language, and kept up to date, the ruling concepts of our social order, the outlines and main particulars in all fields of knowledge, an exact and reasonably detailed picture of our universe, a general history of the world, and if by any chance he wanted to pursue a question into its ultimate detail, a trustworthy and complete system of reference to primary sources of knowledge. In fields where wide varieties of method and opinion existed, he would find, not casual summaries of opinions, but very carefully chosen and correlated statements and arguments. [...] This World Encyclopaedia would be the mental background of every intelligent man in the world. It would be alive and growing and changing continually under revision, extension and replacement from the original thinkers in the world everywhere. Every university and research institution should be feeding it. Every fresh mind should be brought into contact with its standing editorial organisation. And on the other hand its contents would be the standard source of material for the instructional side of school and college work, for the verification of facts and the testing of statements—everywhere in the world. Even journalists would deign to use it; even newspaper proprietors might be made to respect it.

Such an encyclopedia would be akin to a secular bible. Universal acceptance would be possible due to the underlying similarity of human brains. For specialists and intellectuals, the World Encyclopedia would provide valuable coordination with others working in similar areas.

Wells calls for the formation of an Encyclopaedia Society to promote the project and defend it from exploitation (e.g. by an "enterprising publisher" trying to profit from it). This society would also organize departments for production. Of course, the existence of a society has its own risks:

And there will be a constant danger that some of the early promoters may feel and attempt to realise a sort of proprietorship in the organisation, to make a group or a gang of it. But to recognise that danger is half-way to averting it.

The language of the World Encyclopedia would be English because of its greater range, precision, and subtlety.

Intellectual workers across the world would be increasingly bound together through their participation.

Wells wishes that wise world citizens would ensure world peace. He suggests that a world intellectual project would have a more positive impact to this end than any political movement such as communism, fascism, imperialism, or pacifism.

He ended his lecture as follows:

[W]hat I am saying ... is this, that without a World Encyclopaedia to hold men's minds together in something like a common interpretation of reality, there is no hope whatever of anything but an accidental and transitory alleviation of any of our world troubles.

The Brain Organization of the Modern World

This section was first delivered as a lecture in America, October and November 1937.

This lecture promotes the doctrine of New Encyclopaedism described previously. Wells begins with the observation that the world has become a single interconnected community due to the enormously increased speed of telecommunications. Secondly, he says that energy is available on a new scale, enabling, among other things, the capability for mass destruction. Consequently, the establishment of a new world order is imperative:

One needs an exceptional stupidity even to question the urgency we are under to establish some effective World Pax, before gathering disaster overwhelms us. The problem of reshaping human affairs on a world-scale, this World problem, is drawing together an ever-increasing multitude of minds.

Neither Christianity nor socialism can solve the World Problem. The solution is a modernized "World Knowledge Apparatus"—the World Encyclopedia—"a sort of mental clearing house for the mind, a depot where knowledge and ideas are received, sorted, summarized, digested, clarified and compared".[1]: 49  Wells thought that technological advances such as microfilm could be used towards this end so that "any student, in any part of the world, will be able to sit with his projector in his own study at his or her convenience to examine any book, any document, in an exact replica".

In this lecture Wells develops the analogy of the encyclopedia to a brain, saying, "it would be a clearing house for universities and research institutions; it would play the role of a cerebral cortex to these essential ganglia".

He mentions the International Committee on Intellectual Cooperation, an advisory branch of the League of Nations, and the 1937 World Congress of Universal Documentation as contemporary forerunners of the world brain.

A Permanent World Encyclopedia

This section was first published in Harper's Magazine, April 1937, and contributed to the new Encyclopédie française, August 1937.

In this essay, Wells explains how current encyclopaedias have failed to adapt to both the growing increase in recorded knowledge and the expansion of people requiring information that was accurate and readily accessible. He asserts that these 19th-century encyclopaedias continue to follow the 18th-century pattern, organisation and scale. "Our contemporary encyclopedias are still in the coach-and-horse phase of development," he argued, "rather than in the phase of the automobile and the aeroplane."

Wells saw the potential for world-altering impacts this technology could bring. He felt that the creation of the encyclopaedia could bring about the peaceful days of the past, "with a common understanding and the conception of a common purpose, and of a commonwealth such as now we hardly dream of".

Wells anticipated the effect and contribution that his World Brain would have on the university system as well. He wanted to see universities contributing to it, helping it grow, and feeding its search for holistic information. "Every university and research institution should be feeding it" (p. 14). Elsewhere Wells wrote: "It would become the logical nucleus of the world's research universities and post-graduate studies." He suggested that the organization he was proposing "would outgrow in scale and influence alike any single university that exists, and it would inevitably take the place of the loose-knit university system of the world in the concentration of research and thought and the direction of the general education of mankind". In fact the new encyclopedism he was advocating was "the only possible method I can imagine, of bringing the universities and research institutions around the world into effective cooperation and creating an intellectual authority sufficient to control and direct collective life". Ultimately the World Encyclopaedia would be "a permanent institution, a mighty super-university, holding together, utilizing and dominating all of the teaching and research organizations at present in existence".

Speech to the Congrès Mondial De La Documentation Universelle

This section provides a brief excerpt of Wells's speech at the World Congress of Universal Documentation, 20 August 1937. He tells the participants directly that they are participating in the creation of a world brain. He says:

I am speaking of a process of mental organisation throughout the world which I believe to be as inevitable as anything can be in human affairs. The world has to pull its mind together, and this is the beginning of its effort. The world is a Phoenix. It perishes in flames and even as it dies it is born again. This synthesis of knowledge is the necessary beginning to the new world.

The Informative Content of Education

This section was delivered as the Presidential Address to the Educational Science Section of the British Association for the Advancement of Science, 2 September 1937.

Wells expresses his dismay at the general state of public ignorance, even among the educated, and suggests that the Educational Science Section focus on the bigger picture:

For this year I suggest we give the questions of drill, skills, art, music, the teaching of languages, mathematics and other symbols, physical, aesthetic, moral and religious training and development, a rest, and that we concentrate on the inquiry: What are we telling young people directly about the world in which they are to live?

He asks how the "irreducible minimum of knowledge" can be imparted to all people within ten years of education—realistically, he says, amounting to 2400 hours of classroom instruction. He suggests minimizing the teaching of names and dates in British history and focusing instead on newly available information about prehistory, early civilisation (without the traditionally heavy emphasis on Palestine and the Israelites), and the broad contours of world history. He suggests better education in geography, with an inventory of the world's natural resources, and a better curriculum in money and economics. He calls for a "modernised type of teacher", better paid, with better equipment, and continually updated training.

Influence

1930s: World Congress of Universal Documentation

One of the stated goals of this Congress, held in Paris, France, in 1937, was to discuss ideas and methods for implementing Wells's ideas of the World Brain. Wells himself gave a lecture at the Congress.

Reginald Arthur Smith extended Wells's ideas in the book A Living Encyclopædia: A Contribution to Mr. Wells's New Encyclopædism (London: Andrew Dakers Ltd., 1941).

1960s: The World Brain as a supercomputer

From World Library to World Brain

In his 1962 book Profiles of the Future, Arthur C. Clarke predicted that the construction of what H. G. Wells called the World Brain would take place in two stages. He identified the first stage as the construction of the World Library, which is essentially Wells's concept of a universal encyclopaedia accessible to everyone from their home on computer terminals. He predicted this phase would be established (at least in the developed countries) by the year 2000. The second stage, the World Brain, would be a superintelligent, artificially intelligent supercomputer with which humans would be able to interact to solve various world problems. The "World Library" would be incorporated into the "World Brain" as a subsection of it. He suggested that this supercomputer should be installed in the former war rooms of the United States and the Soviet Union once the superpowers had matured enough to agree to co-operate rather than conflict with each other. Clarke predicted the construction of the "World Brain" would be completed by the year 2100.

In 1964, Eugene Garfield published an article in the journal Science introducing the Science Citation Index; the article's first sentence invoked Wells's "magnificent, if premature, plea for the establishment of a world information center", and Garfield predicted that the Science Citation Index "is a harbinger of things to come—a forerunner of the World Brain".

1990s: World Wide Web of documents

World Wide Web as a World Brain

Brian R. Gaines in his 1996 paper "Convergence to the Information Highway" saw the World Wide Web as an extension of Wells's "World Brain" that individuals can access using personal computers. In papers published in 1996 and 1997 (that did not cite Wells), Francis Heylighen and Ben Goertzel envisaged the further development of the World Wide Web into a global brain, i.e. an intelligent network of people and computers at the planetary level. The difference between "global brain" and "world brain" is that the latter, as envisaged by Wells, is centrally controlled, while the former is fully decentralised and self-organizing.

In 2001, Doug Schuler, a professor at The Evergreen State College, proposed a worldwide civic intelligence network as the fulfillment of Wells's world brain. As examples he cited Sustainable Seattle and the "Technology Healthy City" project in Seattle.

Wikipedia as a World Brain

A number of commentators have suggested that Wikipedia represents the World Brain as described by Wells. Joseph Reagle has compared Wells's warning about the need to defend the World Encyclopedia from propaganda with Wikipedia's "Neutral Point of View" norm:

In keeping with the universal vision, and anticipating a key Wikipedia norm, H. G. Wells was concerned that his World Brain be an "encyclopedia appealing to all mankind," and therefore it must remain open to corrective criticism, be skeptical of myths (no matter how "venerated") and guard against "narrowing propaganda." This strikes me as similar to the pluralism inherent in the Wikipedia "Neutral Point of View" goal of "representing significant views fairly, proportionately, and without bias."

Infinite monkey theorem

From Wikipedia, the free encyclopedia
A chimpanzee probably not writing Hamlet

The infinite monkey theorem states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, including the complete works of William Shakespeare. In fact, the monkey would almost surely type every possible finite text an infinite number of times. The theorem can be generalized to state that any sequence of events that has a non-zero probability of happening will almost certainly occur an infinite number of times, given an infinite amount of time or a Universe that is infinite in size.

In this context, "almost surely" is a mathematical term meaning the event happens with probability 1, and the "monkey" is not an actual monkey, but a metaphor for an abstract device that produces an endless random sequence of letters and symbols. Variants of the theorem include multiple and even infinitely many typists, and the target text varies between an entire library and a single sentence.

One of the earliest instances of the use of the "monkey metaphor" is that of French mathematician Émile Borel in 1913, but the first instance may have been even earlier. Jorge Luis Borges traced the history of this idea from Aristotle's On Generation and Corruption and Cicero's De Natura Deorum (On the Nature of the Gods), through Blaise Pascal and Jonathan Swift, up to modern statements with their iconic simians and typewriters. In the early 20th century, Borel and Arthur Eddington used the theorem to illustrate the timescales implicit in the foundations of statistical mechanics.

Solution

Direct proof

There is a straightforward proof of this theorem. As an introduction, recall that if two events are statistically independent, then the probability of both happening equals the product of the probabilities of each one happening independently. For example, if the chance of rain in Moscow on a particular day in the future is 0.4 and the chance of an earthquake in San Francisco on any particular day is 0.00003, then the chance of both happening on the same day is 0.4 × 0.00003 = 0.000012, assuming that they are indeed independent.

Consider the probability of typing the word banana on a typewriter with 50 keys. Suppose that the keys are pressed randomly and independently, meaning that each key has an equal chance of being pressed regardless of what keys had been pressed previously. The chance that the first letter typed is 'b' is 1/50, and the chance that the second letter typed is 'a' is also 1/50, and so on. Therefore, the probability of the first six letters spelling banana is

(1/50) × (1/50) × (1/50) × (1/50) × (1/50) × (1/50) = (1/50)^6 = 1/15,625,000,000.

The result is less than one in 15 billion, but not zero.

From the above, the chance of not typing banana in a given block of 6 letters is 1 − (1/50)^6. Because each block is typed independently, the chance Xn of not typing banana in any of the first n blocks of 6 letters is

Xn = (1 − (1/50)^6)^n.

As n grows, Xn gets smaller. For n = 1 million, Xn is roughly 0.9999, but for n = 10 billion Xn is roughly 0.53 and for n = 100 billion it is roughly 0.0017. As n approaches infinity, the probability Xn approaches zero; that is, by making n large enough, Xn can be made as small as is desired, and the chance of typing banana approaches 100%. Thus, the probability of the word banana appearing at some point in an infinite sequence of keystrokes is equal to one.

The same argument applies if we replace one monkey typing n consecutive blocks of text with n monkeys each typing one block (simultaneously and independently). In this case, Xn = (1 − (1/50)^6)^n is the probability that none of the first n monkeys types banana correctly on their first try. Therefore, at least one of infinitely many monkeys will (with probability equal to one) produce a text as quickly as it would be produced by a perfectly accurate human typist copying it from the original.
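
As a quick numerical check of the figures above, here is a short Python sketch (an illustration added here, not part of the original article) that computes the single-block probability and the chance of seeing banana among the first n blocks:

```python
# Probability that one block of 6 uniformly random keys (out of 50) spells "banana".
p_block = (1 / 50) ** 6          # = 1/15,625,000,000, about 6.4e-11

# Xn: chance that none of the first n independent 6-letter blocks is "banana",
# and the complementary chance that at least one of them is.
for n in (10**6, 10**10, 10**11):
    x_n = (1 - p_block) ** n
    print(f"n = {n:>12,}:  P(no banana) ≈ {x_n:.4f},  P(at least one) ≈ {1 - x_n:.4f}")
```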

Infinite strings

This can be stated more generally and compactly in terms of strings, which are sequences of characters chosen from some finite alphabet:

  • Given an infinite string where each character is chosen uniformly at random, any given finite string almost surely occurs as a substring at some position.
  • Given an infinite sequence of infinite strings, where each character of each string is chosen uniformly at random, any given finite string almost surely occurs as a prefix of one of these strings.

Both follow easily from the second Borel–Cantelli lemma. For the second theorem, let Ek be the event that the kth string begins with the given text. Because this has some fixed nonzero probability p of occurring, the Ek are independent, and the sum

P(E1) + P(E2) + P(E3) + ⋯ = p + p + p + ⋯ = ∞

diverges, so the probability that infinitely many of the Ek occur is 1. The first theorem is shown similarly; one can divide the random string into nonoverlapping blocks matching the size of the desired text, and make Ek the event where the kth block equals the desired string.

Probabilities

However, for physically meaningful numbers of monkeys typing for physically meaningful lengths of time the results are reversed. If there were as many monkeys as there are atoms in the observable universe typing extremely fast for trillions of times the life of the universe, the probability of the monkeys replicating even a single page of Shakespeare is unfathomably small.

Ignoring punctuation, spacing, and capitalization, a monkey typing letters uniformly at random has a chance of one in 26 of correctly typing the first letter of Hamlet. It has a chance of one in 676 (26 × 26) of typing the first two letters. Because the probability shrinks exponentially, at 20 letters it already has only a chance of one in 26^20 = 19,928,148,895,209,409,152,340,197,376 (almost 2 × 10^28). In the case of the entire text of Hamlet, the probabilities are so vanishingly small as to be inconceivable. The text of Hamlet contains approximately 130,000 letters. Thus there is a probability of one in 3.4 × 10^183,946 of getting the text right at the first trial. The average number of letters that needs to be typed until the text appears is also 3.4 × 10^183,946, or, including punctuation, 4.4 × 10^360,783.
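
The size of these numbers can be verified with a few lines of Python (a sketch added for illustration; the 130,000-letter count is the approximation used above):

```python
import math

letters = 130_000                      # approximate number of letters in Hamlet
log10_prob = letters * math.log10(26)  # log10 of 26**130000

# The chance of a correct first attempt is 1 in 26**130000, i.e. about 1 in 10**183946.
print(f"26**{letters} ≈ 10**{log10_prob:.0f}")
```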

Even if every proton in the observable universe (which is estimated at roughly 10^80) were a monkey with a typewriter, typing from the Big Bang until the end of the universe (when protons might no longer exist), they would still need a far greater amount of time – more than three hundred and sixty thousand orders of magnitude longer – to have even a 1 in 10^500 chance of success. To put it another way, for a one in a trillion chance of success, there would need to be 10^360,641 observable universes made of protonic monkeys. As Kittel and Kroemer put it in their textbook on thermodynamics, the field whose statistical foundations motivated the first known expositions of typing monkeys, "The probability of Hamlet is therefore zero in any operational sense of an event ...", and the statement that the monkeys must eventually succeed "gives a misleading conclusion about very, very large numbers."

In fact there is less than a one in a trillion chance of success that such a universe made of monkeys could type any particular document a mere 79 characters long.

Almost surely

The probability that an infinite randomly generated string of text will contain a particular finite substring is 1. However, this does not mean the substring's absence is "impossible", despite the absence having a prior probability of 0. For example, the immortal monkey could randomly type G as its first letter, G as its second, and G as every single letter thereafter, producing an infinite string of Gs; at no point must the monkey be "compelled" to type anything else. (To assume otherwise implies the gambler's fallacy.) However long a randomly generated finite string is, there is a small but nonzero chance that it will turn out to consist of the same character repeated throughout; this chance approaches zero as the string's length approaches infinity. There is nothing special about such a monotonous sequence except that it is easy to describe; the same fact applies to any nameable specific sequence, such as "RGRGRG" repeated forever, or "a-b-aa-bb-aaa-bbb-...", or "Three, Six, Nine, Twelve…".

If the hypothetical monkey has a typewriter with 90 equally likely keys that include numerals and punctuation, then the first typed keys might be "3.14" (the first three digits of pi) with a probability of (1/90)^4, which is 1/65,610,000. Equally probable is any other string of four characters allowed by the typewriter, such as "GGGG", "mATh", or "q%8e". The probability that 100 randomly typed keys will consist of the first 99 digits of pi (including the separator key), or any other particular sequence of that length, is much lower: (1/90)^100. If the monkey's allotted length of text is infinite, the chance of typing only the digits of pi is 0, which is just as possible (mathematically probable) as typing nothing but Gs (also probability 0).

The same applies to the event of typing a particular version of Hamlet followed by endless copies of itself; or Hamlet immediately followed by all the digits of pi; these specific strings are equally infinite in length, they are not prohibited by the terms of the thought problem, and they each have a prior probability of 0. In fact, any particular infinite sequence the immortal monkey types will have had a prior probability of 0, even though the monkey must type something.

This is an extension of the principle that a finite string of random text has a lower and lower probability of being a particular string the longer it is (though all specific strings are equally unlikely). This probability approaches 0 as the string approaches infinity. Thus, the probability of the monkey typing an endlessly long string, such as all of the digits of pi in order, on a 90-key keyboard is (1/90)^∞, which equals 1/∞, which is essentially 0. At the same time, the probability that the sequence contains a particular subsequence (such as the word MONKEY, or the 12th through 999th digits of pi, or a version of the King James Bible) increases as the total string increases. This probability approaches 1 as the total string approaches infinity, and thus the original theorem is correct.

Correspondence between strings and numbers

In a simplification of the thought experiment, the monkey could have a typewriter with just two keys: 1 and 0. The infinitely long string thus produced would correspond to the binary digits of a particular real number between 0 and 1. A countably infinite set of possible strings ends in infinite repetitions, which means the corresponding real number is rational. Examples include the strings corresponding to one-third (010101...), five-sixths (11010101...) and five-eighths (1010000...). Only a subset of such real number strings (albeit a countably infinite subset) contains the entirety of Hamlet (assuming that the text is subjected to a numerical encoding, such as ASCII).

Meanwhile, there is an uncountably infinite set of strings which do not end in such repetition; these correspond to the irrational numbers. These can be sorted into two uncountably infinite subsets: those which contain Hamlet and those which do not. However, the "largest" subset of all the real numbers are those which not only contain Hamlet, but which contain every other possible string of any length, and with equal distribution of such strings. These irrational numbers are called normal. Because almost all numbers are normal, almost all possible strings contain all possible finite substrings. Hence, the probability of the monkey typing a normal number is 1. The same principles apply regardless of the number of keys from which the monkey can choose; a 90-key keyboard can be seen as a generator of numbers written in base 90.

History

Statistical mechanics

The theorem, in one of the forms in which probabilists now know it, with its "dactylographic" [i.e., typewriting] monkeys (French: singes dactylographes; the French word singe covers both the monkeys and the apes), appeared in Émile Borel's 1913 article "Mécanique Statique et Irréversibilité" (Static mechanics and irreversibility), and in his book "Le Hasard" in 1914. His "monkeys" are not actual monkeys; rather, they are a metaphor for an imaginary way to produce a large, random sequence of letters. Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.

The physicist Arthur Eddington drew on Borel's image further in The Nature of the Physical World (1928), writing:

If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.

These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys' success is effectively impossible, and it may safely be said that such a process will never happen. It is clear from the context that Eddington is not suggesting that the probability of this happening is worthy of serious consideration. On the contrary, it was a rhetorical illustration of the fact that below certain levels of probability, the term improbable is functionally equivalent to impossible.

Origins and "The Total Library"

In a 1939 essay entitled "The Total Library", Argentine writer Jorge Luis Borges traced the infinite-monkey concept back to Aristotle's Metaphysics. Explaining the views of Leucippus, who held that the world arose through the random combination of atoms, Aristotle notes that the atoms themselves are homogeneous and their possible arrangements only differ in shape, position and ordering. In On Generation and Corruption, the Greek philosopher compares this to the way that a tragedy and a comedy consist of the same "atoms", i.e., alphabetic characters. Three centuries later, Cicero's De natura deorum (On the Nature of the Gods) argued against the Epicurean atomist worldview:

Is it possible for any man to behold these things, and yet imagine that certain solid and individual bodies move by their natural force and gravitation, and that a world so beautifully adorned was made by their fortuitous concourse? He who believes this may as well believe that if a great quantity of the one-and-twenty letters, composed either of gold or any other matter, were thrown upon the ground, they would fall into such order as legibly to form the Annals of Ennius. I doubt whether fortune could make a single verse of them.

Borges follows the history of this argument through Blaise Pascal and Jonathan Swift, then observes that in his own time, the vocabulary had changed. By 1939, the idiom was "that a half-dozen monkeys provided with typewriters would, in a few eternities, produce all the books in the British Museum." (To which Borges adds, "Strictly speaking, one immortal monkey would suffice.") Borges then imagines the contents of the Total Library which this enterprise would produce if carried to its fullest extreme:

Everything would be in its blind volumes. Everything: the detailed history of the future, Aeschylus' The Egyptians, the exact number of times that the waters of the Ganges have reflected the flight of a falcon, the secret and true name of Rome, the encyclopedia Novalis would have constructed, my dreams and half-dreams at dawn on August 14, 1934, the proof of Pierre Fermat's theorem, the unwritten chapters of Edwin Drood, those same chapters translated into the language spoken by the Garamantes, the paradoxes Berkeley invented concerning Time but didn't publish, Urizen's books of iron, the premature epiphanies of Stephen Dedalus, which would be meaningless before a cycle of a thousand years, the Gnostic Gospel of Basilides, the song the sirens sang, the complete catalog of the Library, the proof of the inaccuracy of that catalog. Everything: but for every sensible line or accurate fact there would be millions of meaningless cacophonies, verbal farragoes, and babblings. Everything: but all the generations of mankind could pass before the dizzying shelves – shelves that obliterate the day and on which chaos lies – ever reward them with a tolerable page.

Borges' total library concept was the main theme of his widely read 1941 short story "The Library of Babel", which describes an unimaginably vast library consisting of interlocking hexagonal chambers, together containing every possible volume that could be composed from the letters of the alphabet and some punctuation characters.

Actual monkeys

In 2002, lecturers and students from the University of Plymouth MediaLab Arts course used a £2,000 grant from the Arts Council to study the literary output of real monkeys. They left a computer keyboard in the enclosure of six Celebes crested macaques in Paignton Zoo in Devon, England from May 1 to June 22, with a radio link to broadcast the results on a website.

Not only did the monkeys produce nothing but five total pages largely consisting of the letter "S", but the lead male began striking the keyboard with a stone, and other monkeys followed by urinating and defecating on the machine. Mike Phillips, director of the university's Institute of Digital Arts and Technology (i-DAT), said that the artist-funded project was primarily performance art, and they had learned "an awful lot" from it. He concluded that monkeys "are not random generators. They're more complex than that. ... They were quite interested in the screen, and they saw that when they typed a letter, something happened. There was a level of intention there."

Applications and criticisms

Evolution

Thomas Huxley is sometimes wrongly credited with proposing a variant of the theorem in his debates with Samuel Wilberforce.

In his 1931 book The Mysterious Universe, Eddington's rival James Jeans attributed the monkey parable to a "Huxley", presumably meaning Thomas Henry Huxley. This attribution is incorrect. Today, it is sometimes further reported that Huxley applied the example in a now-legendary debate over Charles Darwin's On the Origin of Species with the Anglican Bishop of Oxford, Samuel Wilberforce, held at a meeting of the British Association for the Advancement of Science at Oxford on 30 June 1860. This story suffers not only from a lack of evidence, but the fact that in 1860 the typewriter was not yet commercially available.

Despite the original mix-up, monkey-and-typewriter arguments are now common in debates over evolution. As an example of Christian apologetics, Doug Powell argued that even if a monkey accidentally types the letters of Hamlet, it has failed to produce Hamlet because it lacked the intention to communicate. His parallel implication is that natural laws could not produce the information content in DNA. A more common argument is represented by Reverend John F. MacArthur, who claimed that the genetic mutations necessary to produce a tapeworm from an amoeba are as unlikely as a monkey typing Hamlet's soliloquy, and hence the odds against the evolution of all life are impossible to overcome.

Evolutionary biologist Richard Dawkins employs the typing monkey concept in his book The Blind Watchmaker to demonstrate the ability of natural selection to produce biological complexity out of random mutations. In a simulation experiment Dawkins has his weasel program produce the Hamlet phrase METHINKS IT IS LIKE A WEASEL, starting from a randomly typed parent, by "breeding" subsequent generations and always choosing the closest match from progeny that are copies of the parent, with random mutations. The chance of the target phrase appearing in a single step is extremely small, yet Dawkins showed that it could be produced rapidly (in about 40 generations) using cumulative selection of phrases. The random choices furnish raw material, while cumulative selection imparts information. As Dawkins acknowledges, however, the weasel program is an imperfect analogy for evolution, as "offspring" phrases were selected "according to the criterion of resemblance to a distant ideal target." In contrast, Dawkins affirms, evolution has no long-term plans and does not progress toward some distant goal (such as humans). The weasel program is instead meant to illustrate the difference between non-random cumulative selection, and random single-step selection. In terms of the typing monkey analogy, this means that Romeo and Juliet could be produced relatively quickly if placed under the constraints of a nonrandom, Darwinian-type selection because the fitness function will tend to preserve in place any letters that happen to match the target text, improving each successive generation of typing monkeys.
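
A minimal Python sketch of a weasel-style cumulative-selection run is shown below; the population size and mutation rate are illustrative choices for this sketch, not Dawkins's original parameters:

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def mutate(parent: str, rate: float = 0.05) -> str:
    """Copy the parent, changing each character with probability `rate`."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in parent)

def score(phrase: str) -> int:
    """Number of characters that already match the target."""
    return sum(a == b for a, b in zip(phrase, TARGET))

parent = "".join(random.choice(ALPHABET) for _ in TARGET)   # random starting phrase
generations = 0
while parent != TARGET:
    generations += 1
    offspring = [mutate(parent) for _ in range(100)]        # 100 mutated copies
    parent = max(offspring, key=score)                      # keep the closest match
print(f"Reached the target after {generations} generations.")
```

Single-step selection, by contrast, would mean drawing a fresh random 28-character phrase each time and waiting for an exact match, which is hopeless in practice.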

A different avenue for exploring the analogy between evolution and an unconstrained monkey lies in the problem that the monkey types only one letter at a time, independently of the other letters. Hugh Petrie argues that a more sophisticated setup is required, in his case not for biological evolution but the evolution of ideas:

In order to get the proper analogy, we would have to equip the monkey with a more complex typewriter. It would have to include whole Elizabethan sentences and thoughts. It would have to include Elizabethan beliefs about human action patterns and the causes, Elizabethan morality and science, and linguistic patterns for expressing these. It would probably even have to include an account of the sorts of experiences which shaped Shakespeare's belief structure as a particular example of an Elizabethan. Then, perhaps, we might allow the monkey to play with such a typewriter and produce variants, but the impossibility of obtaining a Shakespearean play is no longer obvious. What is varied really does encapsulate a great deal of already-achieved knowledge.

James W. Valentine, while admitting that the classic monkey's task is impossible, finds that there is a worthwhile analogy between written English and the metazoan genome in this other sense: both have "combinatorial, hierarchical structures" that greatly constrain the immense number of combinations at the alphabet level.

Zipf's law

Zipf's law states that the frequency of words is a power-law function of their frequency rank:

frequency = b / rank^a,

where a and b are real numbers. Assuming that a monkey is typing randomly, with fixed and nonzero probability of hitting each letter key or white space, the text produced by the monkey follows Zipf's law.
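
A small Python experiment illustrates this claim; the five-key alphabet (four letters plus a space) and the length of one million keystrokes are arbitrary choices for the sketch:

```python
import random
from collections import Counter

random.seed(0)
keys = "abcd "                                   # four letters plus a space
text = "".join(random.choice(keys) for _ in range(1_000_000))
ranked = Counter(text.split()).most_common()     # "words" sorted by frequency

# On a log-log plot, frequency against rank falls roughly on a straight line,
# i.e. the random typist's word frequencies follow a power law as in Zipf's law.
for rank in (1, 10, 100, 1000):
    if rank <= len(ranked):
        word, freq = ranked[rank - 1]
        print(f"rank {rank:>4}: {word!r} occurs {freq:,} times")
```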

Literary theory

R. G. Collingwood argued in 1938 that art cannot be produced by accident, and wrote as a sarcastic aside to his critics,

... some ... have denied this proposition, pointing out that if a monkey played with a typewriter ... he would produce ... the complete text of Shakespeare. Any reader who has nothing to do can amuse himself by calculating how long it would take for the probability to be worth betting on. But the interest of the suggestion lies in the revelation of the mental state of a person who can identify the 'works' of Shakespeare with the series of letters printed on the pages of a book ...

Nelson Goodman took the contrary position, illustrating his point along with Catherine Elgin by the example of Borges' "Pierre Menard, Author of the Quixote",

What Menard wrote is simply another inscription of the text. Any of us can do the same, as can printing presses and photocopiers. Indeed, we are told, if infinitely many monkeys ... one would eventually produce a replica of the text. That replica, we maintain, would be as much an instance of the work, Don Quixote, as Cervantes' manuscript, Menard's manuscript, and each copy of the book that ever has been or will be printed.

In another writing, Goodman elaborates, "That the monkey may be supposed to have produced his copy randomly makes no difference. It is the same text, and it is open to all the same interpretations. ..." Gérard Genette dismisses Goodman's argument as begging the question.

For Jorge J. E. Gracia, the question of the identity of texts leads to a different question, that of author. If a monkey is capable of typing Hamlet, despite having no intention of meaning and therefore disqualifying itself as an author, then it appears that texts do not require authors. Possible solutions include saying that whoever finds the text and identifies it as Hamlet is the author; or that Shakespeare is the author, the monkey his agent, and the finder merely a user of the text. These solutions have their own difficulties, in that the text appears to have a meaning separate from the other agents: What if the monkey operates before Shakespeare is born, or if Shakespeare is never born, or if no one ever finds the monkey's typescript?

Random document generation

The theorem concerns a thought experiment which cannot be fully carried out in practice, since it is predicted to require prohibitive amounts of time and resources. Nonetheless, it has inspired efforts in finite random text generation.

One computer program run by Dan Oliver of Scottsdale, Arizona, according to an article in The New Yorker, came up with a result on 4 August 2004: After the group had worked for 42,162,500,000 billion billion monkey-years, one of the "monkeys" typed, "VALENTINE. Cease toIdor:eFLP0FRjWK78aXzVOwm)-‘;8.t" The first 19 letters of this sequence can be found in "The Two Gentlemen of Verona". Other teams have reproduced 18 characters from "Timon of Athens", 17 from "Troilus and Cressida", and 16 from "Richard II".

A website entitled The Monkey Shakespeare Simulator, launched on 1 July 2003, contained a Java applet that simulated a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took "2,737,850 million billion billion billion monkey-years" to reach 24 matching characters:

RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d

Due to processing power limitations, the program used a probabilistic model (by using a random number generator or RNG) instead of actually generating random text and comparing it to Shakespeare. When the simulator "detected a match" (that is, the RNG generated a certain value or a value within a certain range), the simulator simulated the match by generating matched text.
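
The shortcut described above can be illustrated with a short Python sketch (a guess at the general idea only; the simulator's actual code, keyboard size, and matching rules are not given here). With a very short target, both approaches finish quickly and give statistically identical results:

```python
import random

KEYS = "abcdefghijklmnopqrstuvwxyz"    # assumed 26-key keyboard for the sketch
TARGET = "to"                          # short target so both methods run quickly
TRIALS = 1_000_000

# Method 1: literally generate each block of random text and compare it to the target.
direct = sum(
    "".join(random.choice(KEYS) for _ in TARGET) == TARGET
    for _ in range(TRIALS)
)

# Method 2: the probabilistic shortcut: draw one random number per block and
# declare a match with the same probability, without generating any text at all.
p_match = (1 / len(KEYS)) ** len(TARGET)
shortcut = sum(random.random() < p_match for _ in range(TRIALS))

print(direct, shortcut)   # both hover around TRIALS * p_match, roughly 1,479
```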

Testing of random-number generators

Questions about the statistics describing how often an ideal monkey is expected to type certain strings translate into practical tests for random-number generators; these range from the simple to the "quite sophisticated". Computer-science professors George Marsaglia and Arif Zaman report that they used to call one such category of tests "overlapping m-tuple tests" in lectures, since they concern overlapping m-tuples of successive elements in a random sequence. But they found that calling them "monkey tests" helped to motivate the idea with students. They published a report on the class of tests and their results for various RNGs in 1993.
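
As a toy version of such a test (a sketch only, far simpler than the Marsaglia–Zaman test batteries), one can count how often each overlapping pair of simulated "keystrokes" appears and compare the counts with what an ideal monkey would produce:

```python
import random
from collections import Counter

ALPHABET = 8          # keystrokes are the digits 0..7
N = 1_000_000         # length of the simulated keystroke sequence

keystrokes = [random.randrange(ALPHABET) for _ in range(N)]

# Count overlapping 2-tuples (pairs of successive keystrokes).
pairs = Counter(zip(keystrokes, keystrokes[1:]))

# Every one of the 64 possible pairs should occur about (N - 1) / 64 times.
# The crude statistic below summarizes the deviation from that expectation;
# the real monkey tests use a corrected statistic that accounts for the
# dependence between overlapping tuples.
expected = (N - 1) / ALPHABET ** 2
deviation = sum((count - expected) ** 2 / expected for count in pairs.values())
print(f"deviation statistic over {ALPHABET ** 2} pairs: {deviation:.1f}")
```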

In popular culture

The infinite monkey theorem and its associated imagery are considered a popular and proverbial illustration of the mathematics of probability, widely known to the general public because of their transmission through popular culture rather than through formal education. This popularity is helped by the innate humor stemming from the image of literal monkeys rattling away on a set of typewriters, which also makes it a popular visual gag.

A quotation attributed to a 1996 speech by Robert Wilensky stated, "We've heard that a million monkeys at a million keyboards could produce the complete works of Shakespeare; now, thanks to the Internet, we know that is not true."

The enduring, widespread popularity of the theorem was noted in the introduction to a 2001 paper, "Monkeys, Typewriters and Networks: The Internet in the Light of the Theory of Accidental Excellence". In 2002, an article in The Washington Post said, "Plenty of people have had fun with the famous notion that an infinite number of monkeys with an infinite number of typewriters and an infinite amount of time could eventually write the works of Shakespeare". In 2003, the previously mentioned Arts Council funded experiment involving real monkeys and a computer keyboard received widespread press coverage. In 2007, the theorem was listed by Wired magazine in a list of eight classic thought experiments.

American playwright David Ives' short one-act play Words, Words, Words, from the collection All in the Timing, pokes fun at the concept of the infinite monkey theorem.

In 2015 Balanced Software released Monkey Typewriter on the Microsoft Store. The software generates random text in the spirit of the infinite monkey theorem and searches it for phrases entered by the user. It is intended as a practical demonstration of the theorem rather than a true-to-life or scientific model of random text generation.

Academic discipline

From Wikipedia, the free encyclopedia

An academic discipline or academic field is a subdivision of knowledge that is taught and researched at the college or university level. Disciplines are defined (in part) and recognized by the academic journals in which research is published, and the learned societies and academic departments or faculties within colleges and universities to which their practitioners belong. Academic disciplines are conventionally divided into the humanities, including language, art and cultural studies, and the scientific disciplines, such as physics, chemistry, and biology; the social sciences are sometimes considered a third category.

Individuals associated with academic disciplines are commonly referred to as experts or specialists. Others, who may have studied liberal arts or systems theory rather than concentrating in a specific academic discipline, are classified as generalists.

While academic disciplines in and of themselves are more or less focused practices, scholarly approaches such as multidisciplinarity/interdisciplinarity, transdisciplinarity, and cross-disciplinarity integrate aspects from multiple academic disciplines, therefore addressing any problems that may arise from narrow concentration within specialized fields of study. For example, professionals may encounter trouble communicating across academic disciplines because of differences in language, specified concepts, or methodology.

Some researchers believe that academic disciplines may, in the future, be replaced by what is known as Mode 2 or "post-academic science", which involves the acquisition of cross-disciplinary knowledge through the collaboration of specialists from various academic disciplines.

It is also known as a field of study, field of inquiry, research field and branch of knowledge. The different terms are used in different countries and fields.

History of the concept

The University of Paris in 1231 consisted of four faculties: Theology, Medicine, Canon Law and Arts. Educational institutions originally used the term "discipline" to catalog and archive the new and expanding body of information produced by the scholarly community. Disciplinary designations originated in German universities during the beginning of the nineteenth century.

Most academic disciplines have their roots in the mid-to-late-nineteenth century secularization of universities, when the traditional curricula were supplemented with non-classical languages and literatures, social sciences such as political science, economics, sociology and public administration, and natural science and technology disciplines such as physics, chemistry, biology, and engineering.

In the early twentieth century, new academic disciplines such as education and psychology were added. In the 1970s and 1980s, there was an explosion of new academic disciplines focusing on specific themes, such as media studies, women's studies, and Africana studies. Many academic disciplines designed as preparation for careers and professions, such as nursing, hospitality management, and corrections, also emerged in the universities. Finally, interdisciplinary scientific fields of study such as biochemistry and geophysics gained prominence as their contribution to knowledge became widely recognized. Some new disciplines, such as public administration, can be found in more than one disciplinary setting; some public administration programs are associated with business schools (thus emphasizing the public management aspect), while others are linked to the political science field (emphasizing the policy analysis aspect).

As the twentieth century approached, these designations were gradually adopted by other countries and became the accepted conventional subjects. However, these designations differed between various countries. In the twentieth century, the natural science disciplines included: physics, chemistry, biology, geology, and astronomy. The social science disciplines included: economics, politics, sociology, and psychology.

Prior to the twentieth century, categories were broad and general, which was expected due to the lack of interest in science at the time. With rare exceptions, practitioners of science tended to be amateurs and were referred to as "natural historians" and "natural philosophers"—labels that date back to Aristotle—instead of "scientists". Natural history referred to what we now call life sciences and natural philosophy referred to the current physical sciences.

Prior to the twentieth century, few opportunities existed for science as an occupation outside the educational system. Higher education provided the institutional structure for scientific investigation, as well as economic support for research and teaching. Soon, the volume of scientific information rapidly increased and researchers realized the importance of concentrating on smaller, narrower fields of scientific activity. Because of this narrowing, scientific specializations emerged. As these specializations developed, modern scientific disciplines in universities also improved their sophistication. Eventually, academia's identified disciplines became the foundations for scholars of specific specialized interests and expertise.

Functions and criticism

An influential critique of the concept of academic disciplines came from Michel Foucault in his 1975 book, Discipline and Punish. Foucault asserts that academic disciplines originate from the same social movements and mechanisms of control that established the modern prison and penal system in eighteenth-century France, and that this fact reveals essential aspects they continue to have in common: "The disciplines characterize, classify, specialize; they distribute along a scale, around a norm, hierarchize individuals in relation to one another and, if necessary, disqualify and invalidate." (Foucault, 1975/1979, p. 223)

Communities of academic disciplines

Communities of academic disciplines can be found outside academia within corporations, government agencies, and independent organizations, where they take the form of associations of professionals with common interests and specific knowledge. Such communities include corporate think tanks, NASA, and IUPAC. Communities such as these exist to benefit the organizations affiliated with them by providing specialized new ideas, research, and findings.

Nations at various developmental stages will find the need for different academic disciplines during different times of growth. A newly developing nation will likely prioritize government, political matters and engineering over those of the humanities, arts and social sciences. On the other hand, a well-developed nation may be capable of investing more in the arts and social sciences. Communities of academic disciplines would contribute at varying levels of importance during different stages of development.

Interactions

These categories explain how the different academic disciplines interact with one another.

Multidisciplinary

Multidisciplinary knowledge is associated with more than one existing academic discipline or profession.

A multidisciplinary community or project is made up of people from different academic disciplines and professions. These people are engaged in working together as equal stakeholders in addressing a common challenge. A multidisciplinary person is one with degrees from two or more academic disciplines. This one person can take the place of two or more people in a multidisciplinary community. Over time, multidisciplinary work does not typically lead to an increase or a decrease in the number of academic disciplines. One key question is how well the challenge can be decomposed into subparts, and then addressed via the distributed knowledge in the community. The lack of shared vocabulary between people and communication overhead can sometimes be an issue in these communities and projects. If challenges of a particular type need to be repeatedly addressed so that each one can be properly decomposed, a multidisciplinary community can be exceptionally efficient and effective.

There are many examples of a particular idea appearing in different academic disciplines at around the same time. One example is the shift toward focusing on a sensory awareness of the whole: "an attention to the 'total field'", a "sense of the whole pattern, of form and function as a unity", an "integral idea of structure and configuration". This has happened in art (in the form of cubism), physics, poetry, communication and educational theory. According to Marshall McLuhan, this paradigm shift was due to the passage from the era of mechanization, which brought sequentiality, to the era of the instant speed of electricity, which brought simultaneity.

Multidisciplinary approaches also encourage people to help shape the innovation of the future. The political dimensions of forming new multidisciplinary partnerships to solve the so-called societal Grand Challenges were presented in the Innovation Union and in the European Framework Programme, the Horizon 2020 operational overlay. Innovation across academic disciplines is considered the pivotal foresight of the creation of new products, systems, and processes for the benefit of all societies' growth and wellbeing. Regional examples such as Biopeople and industry-academia initiatives in translational medicine such as SHARE.ku.dk in Denmark provide evidence of the successful endeavour of multidisciplinary innovation and facilitation of the paradigm shift.

Transdisciplinary

In practice, transdisciplinarity can be thought of as the union of all interdisciplinary efforts. While interdisciplinary teams may be creating new knowledge that lies between several existing disciplines, a transdisciplinary team is more holistic and seeks to relate all disciplines into a coherent whole.

Cross-disciplinary

Cross-disciplinary knowledge is that which explains aspects of one discipline in terms of another. Common examples of cross-disciplinary approaches are studies of the physics of music or the politics of literature.

Bibliometric studies of disciplines

Bibliometrics can be used to map several issues in relation to disciplines, for example, the flow of ideas within and among disciplines (Lindholm-Romantschuk, 1998) or the existence of specific national traditions within disciplines. Scholarly impact and influence of one discipline on another may be understood by analyzing the flow of citations.

The bibliometric approach is described as straightforward because it is based on simple counting. The method is also objective, but the quantitative measure may not be compatible with a qualitative assessment and can therefore be manipulated. The number of citations depends on the number of people working in the same domain rather than on the inherent quality or originality of a published result.
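As a toy illustration of the counting idea behind such citation-flow analyses, the following Python sketch tallies citations between disciplines; the discipline names and counts are made-up sample data, not figures from any study.

```python
from collections import Counter

# Illustrative data only: each pair is (citing discipline, cited discipline).
citations = [
    ("sociology", "economics"),
    ("economics", "sociology"),
    ("sociology", "sociology"),
    ("philosophy", "sociology"),
    ("economics", "economics"),
]

# Count how often each discipline cites each other discipline (or itself).
flow = Counter(citations)
for (citing, cited), count in flow.most_common():
    direction = "within" if citing == cited else "->"
    print(f"{citing} {direction} {cited}: {count}")
```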

Law of large numbers

From Wikipedia, the free encyclopedia
An illustration of the law of large numbers using a particular run of rolls of a single die. As the number of rolls in this run increases, the average of the values of all the results approaches 3.5. Although each run would show a distinctive shape over a small number of throws (at the left), over a large number of rolls (to the right) the shapes would be extremely similar.

In probability theory, the law of large numbers (LLN) is a mathematical theorem that states that the average of the results obtained from a large number of independent and identical random samples converges to the true value, if it exists. More formally, the LLN states that given a sample of independent and identically distributed values, the sample mean converges to the true mean.

The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. Importantly, the law applies (as the name indicates) only when a large number of observations are considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).

The LLN only applies to the average of the results obtained from repeated trials and claims that this average converges to the expected value; it does not claim that the sum of n results gets close to the expected value times n as n increases.

Throughout its history, many mathematicians have refined this law. Today, the LLN is used in many fields including statistics, probability theory, economics, and insurance.

Examples

For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of the average of the rolls is:

$$\frac{1+2+3+4+5+6}{6} = 3.5.$$

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) will approach 3.5, with the precision increasing as more dice are rolled.
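As a concrete illustration of this convergence, here is a minimal Python sketch (NumPy and the fixed seed are choices made for this example, not part of the article) that rolls a die repeatedly and tracks the running average.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Roll a fair six-sided die n times and track the running average.
n = 100_000
rolls = rng.integers(1, 7, size=n)               # values 1..6, equally likely
running_average = np.cumsum(rolls) / np.arange(1, n + 1)

# The running average drifts toward the expected value 3.5.
for k in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {k:>6} rolls: average = {running_average[k - 1]:.4f}")
```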

It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of n such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency.

A graphical example of the law of large numbers for rolls of two dice. The sum of the two dice fluctuates in the first few rolls, but as the number of rolls increases, the average of the sums approaches the expected value of 7.

For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion of heads after n flips will almost surely converge to 1/2 as n approaches infinity.

Although the proportion of heads (and tails) approaches 1/2, almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected difference grows, but at a slower rate than the number of flips.
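The same point can be checked numerically; in the following sketch (Python/NumPy, illustrative only), the proportion of heads approaches 1/2 even as the absolute difference between heads and tails grows.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Flip a fair coin n times; heads = 1, tails = 0.
n = 1_000_000
flips = rng.integers(0, 2, size=n)

heads = np.cumsum(flips)
tails = np.arange(1, n + 1) - heads

for k in (100, 10_000, 1_000_000):
    proportion = heads[k - 1] / k
    difference = abs(int(heads[k - 1]) - int(tails[k - 1]))
    print(f"n={k:>9}: proportion of heads = {proportion:.4f}, "
          f"|heads - tails| = {difference}")
# The proportion tends toward 0.5 while the absolute difference
# typically grows on the order of sqrt(n).
```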

Another good example of the LLN is the Monte Carlo method. These methods are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The larger the number of repetitions, the better the approximation tends to be. The reason that this method is important is mainly that, sometimes, it is difficult or impossible to use other approaches.

Limitation

The average of the results obtained from a large number of trials may fail to converge in some cases. For instance, the average of n results taken from the Cauchy distribution or some Pareto distributions (α<1) will not converge as n becomes larger; the reason is heavy tails. The Cauchy distribution and the Pareto distribution represent two cases: the Cauchy distribution does not have an expectation, whereas the expectation of the Pareto distribution (α<1) is infinite. One way to generate the Cauchy-distributed example is where the random numbers equal the tangent of an angle uniformly distributed between −90° and +90°. The median is zero, but the expected value does not exist, and indeed the average of n such variables has the same distribution as one such variable. It does not converge in probability toward zero (or any other value) as n goes to infinity.
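This non-convergence is easy to observe numerically. The sketch below (Python/NumPy; the sample size and seed are arbitrary choices) generates Cauchy samples exactly as described above, as the tangent of a uniform angle, and shows that the running mean does not settle down.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Cauchy samples via the construction in the text:
# tangent of an angle uniformly distributed between -90 and +90 degrees.
n = 1_000_000
angles = rng.uniform(-np.pi / 2, np.pi / 2, size=n)
cauchy = np.tan(angles)

running_mean = np.cumsum(cauchy) / np.arange(1, n + 1)
for k in (1_000, 10_000, 100_000, 1_000_000):
    print(f"n={k:>9}: running mean = {running_mean[k - 1]:+.3f}")
# Unlike the die or coin examples, these running means keep jumping
# around and do not converge to any fixed value.
```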

If the trials embed a selection bias, typical of human economic/rational behaviour, the law of large numbers does not help in resolving the bias: even if the number of trials is increased, the selection bias remains.

History

Diffusion is an example of the law of large numbers. Initially, there are solute molecules on the left side of a barrier (magenta line) and none on the right. The barrier is removed, and the solute diffuses to fill the whole container.
  • Top: With a single molecule, the motion appears to be quite random.
  • Middle: With more molecules, there is clearly a trend where the solute fills the container more and more uniformly, but there are also random fluctuations.
  • Bottom: With an enormous number of solute molecules (too many to see), the randomness is essentially gone: The solute appears to move smoothly and systematically from high-concentration areas to low-concentration areas. In realistic situations, chemists can describe diffusion as a deterministic macroscopic phenomenon (see Fick's laws), despite its underlying random nature.

The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials. This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli. It took him over 20 years to develop a sufficiently rigorous mathematical proof which was published in his Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem" but it became generally known as "Bernoulli's theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S. D. Poisson further described it under the name "la loi des grands nombres" ("the law of large numbers"). Thereafter, it was known under both names, but the "law of large numbers" is most frequently used.

After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true. These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.

Forms

There are two different versions of the law of large numbers that are described below. They are called the strong law of large numbers and the weak law of large numbers. Stated for the case where X1, X2, ... is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value E(X1) = E(X2) = ... = µ, both versions of the law state that the sample average

$$\bar{X}_n = \frac{1}{n}\left(X_1 + \cdots + X_n\right)$$

converges to the expected value:

$$\bar{X}_n \longrightarrow \mu \quad \text{as } n \to \infty. \qquad (1)$$

(Lebesgue integrability of Xj means that the expected value E(Xj) exists according to Lebesgue integration and is finite. It does not mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)

Introductory probability texts often additionally assume identical finite variance $\sigma^2 = \operatorname{Var}(X_i)$ (for all $i$) and no correlation between the random variables. In that case, the variance of the average of n random variables is

$$\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n},$$

which can be used to shorten and simplify the proofs. This assumption of finite variance is not necessary. Large or infinite variance will make the convergence slower, but the LLN holds anyway.

Mutual independence of the random variables can be replaced by pairwise independence or exchangeability in both versions of the law.

The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.

Weak law

Simulation illustrating the law of large numbers. Each frame, a coin that is red on one side and blue on the other is flipped, and a dot is added in the corresponding column. A pie chart shows the proportion of red and blue so far. Notice that while the proportion varies significantly at first, it approaches 50% as the number of trials increases.

The weak law of large numbers (also called Khinchin's law) states that given a collection of i.i.d. samples from a random variable with finite mean, the sample mean converges in probability to the expected value:

$$\bar{X}_n \overset{P}{\longrightarrow} \mu \quad \text{as } n \to \infty. \qquad (2)$$

That is, for any positive number ε,

$$\lim_{n\to\infty} \Pr\!\left(\left|\bar{X}_n - \mu\right| < \varepsilon\right) = 1.$$

Interpreting this result, the weak law states that for any nonzero margin specified (ε), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.

As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first n values goes to zero as n goes to infinity. As an example, assume that each random variable in the series follows a Gaussian distribution (normal distribution) with mean zero, but with variance equal to $2n/\log(n+1)$, which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to $n^2/\log n$. The variance of the average is therefore asymptotic to $1/\log n$ and goes to zero.
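A rough numerical check of this example is sketched below (Python/NumPy; the variance schedule $2n/\log(n+1)$ follows the example above, while the sample sizes and number of repetitions are arbitrary choices): the empirical spread of the average shrinks, slowly, as n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Independent Gaussians with mean 0 and growing variance 2k/log(k+1).
def sample_average(n):
    k = np.arange(1, n + 1)
    sigmas = np.sqrt(2 * k / np.log(k + 1))
    return rng.normal(0.0, sigmas).mean()

for n in (100, 10_000, 250_000):
    averages = np.array([sample_average(n) for _ in range(200)])
    print(f"n={n:>7}: empirical std of the average = {averages.std():.3f}")
# The spread of the average shrinks (slowly, like 1/sqrt(log n)),
# consistent with the weak law still holding despite unbounded variances.
```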

There are also examples of the weak law applying even though the expected value does not exist.

Strong law

The strong law of large numbers (also called Kolmogorov's law) states that the sample average converges almost surely to the expected value:

$$\bar{X}_n \overset{\text{a.s.}}{\longrightarrow} \mu \quad \text{as } n \to \infty. \qquad (3)$$

That is,

$$\Pr\!\left(\lim_{n\to\infty} \bar{X}_n = \mu\right) = 1.$$

What this means is that the probability that, as the number of trials n goes to infinity, the average of the observations converges to the expected value, is equal to one. The modern proof of the strong law is more complex than that of the weak law, and relies on passing to an appropriate subsequence.

The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem. This view justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".

Law 3 is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However the weak law is known to hold in certain conditions where the strong law does not hold and then the convergence is only weak (in probability). See differences between the weak law and the strong law.

The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on something (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).

If the summands are independent but not identically distributed, then

$$\bar{X}_n - \operatorname{E}\!\left[\bar{X}_n\right] \overset{\text{a.s.}}{\longrightarrow} 0, \qquad (2)$$

provided that each Xk has a finite second moment and

$$\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.$$

This statement is known as Kolmogorov's strong law, see e.g. Sen & Singer (1993, Theorem 2.3.10).

Differences between the weak law and the strong law

The weak law states that for a specified large n, the average $\bar{X}_n$ is likely to be near μ. Thus, it leaves open the possibility that $\left|\bar{X}_n - \mu\right| > \varepsilon$ happens an infinite number of times, although at infrequent intervals. (Not necessarily $\left|\bar{X}_n - \mu\right| \ne 0$ for all n.)

The strong law shows that this almost surely will not occur. It does not imply that with probability 1, we have that for any ε > 0 the inequality $\left|\bar{X}_n - \mu\right| < \varepsilon$ holds for all large enough n, since the convergence is not necessarily uniform on the set where it holds.

The strong law does not hold in the following cases, but the weak law does.

  1. Let X be an exponentially distributed random variable with parameter 1. The random variable $\sin(X)e^{X}/X$ has no expected value according to Lebesgue integration, but using conditional convergence and interpreting the integral as a Dirichlet integral, which is an improper Riemann integral, we can say:
     $$\operatorname{E}\!\left[\frac{\sin(X)e^{X}}{X}\right] = \int_0^\infty \frac{\sin(x)e^{x}}{x}\,e^{-x}\,dx = \int_0^\infty \frac{\sin(x)}{x}\,dx = \frac{\pi}{2}.$$
  2. Let X be a geometrically distributed random variable with probability 0.5. The random variable $2^X(-1)^X/X$ does not have an expected value in the conventional sense because the infinite series is not absolutely convergent, but using conditional convergence, we can say:
     $$\operatorname{E}\!\left[\frac{2^X(-1)^X}{X}\right] = \sum_{k=1}^{\infty} \frac{2^k(-1)^k}{k}\,2^{-k} = \sum_{k=1}^{\infty} \frac{(-1)^k}{k} = -\ln 2.$$
  3. If the cumulative distribution function of a random variable is
     $$1 - F(x) = \frac{e}{2x\ln x} \quad \text{for } x \ge e, \qquad F(x) = \frac{e}{-2x\ln(-x)} \quad \text{for } x \le -e,$$
     then it has no expected value, but the weak law is true.
  4. Let Xk be plus or minus $\sqrt{k/\log\log\log k}$ (starting at sufficiently large k so that the denominator is positive) with probability 1/2 for each. The variance of Xk is then $k/\log\log\log k$. Kolmogorov's strong law does not apply because the partial sum in his criterion up to k = n is asymptotic to $\log n/\log\log\log n$ and this is unbounded. If we replace the random variables with Gaussian variables having the same variances, namely $k/\log\log\log k$, then the average at any point will also be normally distributed. The width of the distribution of the average will tend toward zero (standard deviation asymptotic to $1/\sqrt{2\log\log\log n}$), but for a given ε, there is a probability which does not go to zero with n that the average sometime after the nth trial will come back up to ε. Since the width of the distribution of the average is not zero, it must have a positive lower bound p(ε), which means there is a probability of at least p(ε) that the average will attain ε after n trials. It will happen with probability p(ε)/2 before some m which depends on n. But even after m, there is still a probability of at least p(ε) that it will happen. (This seems to indicate that p(ε)=1 and the average will attain ε an infinite number of times.)

Uniform laws of large numbers

There are extensions of the law of large numbers to collections of estimators, where the convergence is uniform over the collection; thus the name uniform law of large numbers.

Suppose f(x,θ) is some function defined for θ ∈ Θ, and continuous in θ. Then for any fixed θ, the sequence {f(X1,θ), f(X2,θ), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[f(X,θ)]. This is the pointwise (in θ) convergence.

A particular example of a uniform law of large numbers states the conditions under which the convergence happens uniformly in θ. If

  1. Θ is compact,
  2. f(x,θ) is continuous at each θ ∈ Θ for almost all x, and a measurable function of x at each θ,
  3. there exists a dominating function d(x) such that E[d(X)] < ∞, and $\left\| f(x,\theta) \right\| \le d(x)$ for all θ ∈ Θ.

Then E[f(X,θ)] is continuous in θ, and

$$\sup_{\theta \in \Theta}\left\| \frac{1}{n}\sum_{i=1}^{n} f(X_i, \theta) - \operatorname{E}[f(X,\theta)] \right\| \overset{\text{a.s.}}{\longrightarrow} 0.$$

This result is useful to derive consistency of a large class of estimators (see Extremum estimator).
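As an illustrative sketch of uniform convergence (not an example from the article), take f(x, θ) = (x − θ)² with X uniform on [0, 1] and Θ = [0, 1]; the worst-case deviation over a grid of θ values shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Illustration: f(x, theta) = (x - theta)^2 with X ~ Uniform(0, 1),
# for which E[f(X, theta)] = 1/3 - theta + theta^2.
thetas = np.linspace(0.0, 1.0, 101)
true_mean = 1.0 / 3.0 - thetas + thetas**2

for n in (100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 1.0, size=n)
    # Sample mean of (X - theta)^2 for every theta on the grid.
    sample_mean = (x**2).mean() - 2 * thetas * x.mean() + thetas**2
    sup_error = np.max(np.abs(sample_mean - true_mean))
    print(f"n={n:>8}: sup over theta of |error| = {sup_error:.5f}")
# The worst-case error over the whole parameter grid shrinks as n grows,
# illustrating uniform (not just pointwise) convergence.
```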

Borel's law of large numbers

Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event is expected to occur approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if E denotes the event in question, p its probability of occurrence, and Nn(E) the number of times E occurs in the first n trials, then with probability one,

$$\frac{N_n(E)}{n} \longrightarrow p \quad \text{as } n \to \infty.$$

This theorem makes rigorous the intuitive notion of probability as the expected long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.

Chebyshev's inequality. Let X be a random variable with finite expected value μ and finite non-zero variance σ². Then for any real number k > 0,

$$\Pr\!\left(|X - \mu| \ge k\sigma\right) \le \frac{1}{k^2}.$$
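A quick empirical check of the inequality is sketched below; the Exponential(1) distribution (mean 1, variance 1) is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Empirical check of Chebyshev's inequality for an Exponential(1) variable.
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = 1.0, 1.0

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    bound = 1.0 / k**2
    print(f"k={k}: P(|X - mu| >= k*sigma) = {empirical:.4f} <= {bound:.4f}")
```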

Proof of the weak law

Given X1, X2, ... an infinite sequence of i.i.d. random variables with finite expected value $\operatorname{E}(X_1) = \operatorname{E}(X_2) = \cdots = \mu < \infty$, we are interested in the convergence of the sample average

$$\bar{X}_n = \frac{1}{n}\left(X_1 + \cdots + X_n\right).$$

The weak law of large numbers states:

$$\bar{X}_n \overset{P}{\longrightarrow} \mu \quad \text{as } n \to \infty. \qquad (2)$$

Proof using Chebyshev's inequality assuming finite variance

This proof uses the assumption of finite variance $\operatorname{Var}(X_i) = \sigma^2$ (for all $i$). The independence of the random variables implies no correlation between them, and we have that

$$\operatorname{Var}(\bar{X}_n) = \operatorname{Var}\!\left(\tfrac{1}{n}(X_1+\cdots+X_n)\right) = \frac{1}{n^2}\operatorname{Var}(X_1+\cdots+X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$

The common mean μ of the sequence is the mean of the sample average:

$$\operatorname{E}(\bar{X}_n) = \mu.$$

Using Chebyshev's inequality on $\bar{X}_n$ results in

$$\Pr\!\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2}.$$

This may be used to obtain the following:

$$\Pr\!\left(\left|\bar{X}_n - \mu\right| < \varepsilon\right) = 1 - \Pr\!\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) \ge 1 - \frac{\sigma^2}{n\varepsilon^2}.$$

As n approaches infinity, the right-hand side approaches 1. And by definition of convergence in probability, we have obtained

$$\bar{X}_n \overset{P}{\longrightarrow} \mu \quad \text{as } n \to \infty. \qquad (2)$$

Proof using convergence of characteristic functions

By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as

$$\varphi_X(t) = 1 + it\mu + o(t), \quad t \to 0.$$

All X1, X2, ... have the same characteristic function, so we will simply denote this φX.

Among the basic properties of characteristic functions there are

$$\varphi_{\frac{1}{n}X}(t) = \varphi_X\!\left(\tfrac{t}{n}\right) \quad \text{and} \quad \varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$$

if X and Y are independent.

These rules can be used to calculate the characteristic function of $\bar{X}_n$ in terms of φX:

$$\varphi_{\bar{X}_n}(t) = \left[\varphi_X\!\left(\tfrac{t}{n}\right)\right]^{n} = \left[1 + \frac{i\mu t}{n} + o\!\left(\tfrac{t}{n}\right)\right]^{n} \longrightarrow e^{it\mu}, \quad \text{as } n \to \infty.$$

The limit $e^{it\mu}$ is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, $\bar{X}_n$ converges in distribution to μ:

$$\bar{X}_n \overset{D}{\longrightarrow} \mu \quad \text{for } n \to \infty.$$

μ is a constant, which implies that convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). Therefore,

$$\bar{X}_n \overset{P}{\longrightarrow} \mu \quad \text{as } n \to \infty. \qquad (2)$$

This shows that the sample mean converges in probability to the derivative of the characteristic function at the origin, as long as the latter exists.
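This convergence of characteristic functions can also be observed numerically. In the sketch below (Python/NumPy), the empirical characteristic function of the sample mean of Exponential(1) variables, an arbitrary illustrative choice with μ = 1, approaches e^{itμ} as n grows.

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# Empirical characteristic function of the sample mean, evaluated at t = 1.
t, mu = 1.0, 1.0
reps = 20_000

for n in (1, 10, 100, 1000):
    sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    phi = np.mean(np.exp(1j * t * sample_means))   # empirical E[exp(i t X̄_n)]
    target = np.exp(1j * t * mu)
    print(f"n={n:>4}: |phi(t) - exp(i t mu)| = {abs(phi - target):.4f}")
# The gap shrinks toward zero (up to Monte Carlo noise) as n increases.
```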

Proof of the strong law

We give a relatively simple proof of the strong law under the assumptions that the $X_i$ are iid, $\operatorname{E}[X_i] =: \mu < \infty$, $\operatorname{Var}(X_i) = \sigma^2 < \infty$, and $\operatorname{E}[X_i^4] =: \tau < \infty$.

Let us first note that without loss of generality we can assume that $\mu = 0$ by centering. In this case, the strong law says that

$$\Pr\!\left(\lim_{n\to\infty} \bar{X}_n = 0\right) = 1,$$

or

$$\Pr\!\left(\omega : \lim_{n\to\infty} \frac{S_n(\omega)}{n} = 0\right) = 1, \quad \text{where } S_n = X_1 + \cdots + X_n.$$

It is equivalent to show that

$$\Pr\!\left(\omega : \lim_{n\to\infty} \frac{S_n(\omega)}{n} \ne 0\right) = 0.$$

Note that

$$\left\{\omega : \lim_{n\to\infty} \frac{S_n(\omega)}{n} \ne 0\right\} = \bigcup_{\varepsilon>0}\left\{\omega : \left|\bar{X}_n(\omega)\right| \ge \varepsilon \text{ infinitely often}\right\},$$

and thus to prove the strong law we need to show that for every $\varepsilon > 0$, we have

$$\Pr\!\left(\left|\bar{X}_n\right| \ge \varepsilon \text{ infinitely often}\right) = 0.$$

Define the events $A_n = \{\omega : |\bar{X}_n| \ge \varepsilon\}$, and if we can show that

$$\sum_{n=1}^{\infty} \Pr(A_n) < \infty,$$

then the Borel–Cantelli Lemma implies the result. So let us estimate $\Pr(A_n)$.

We compute

$$\operatorname{E}\!\left[\bar{X}_n^4\right] = \frac{1}{n^4}\operatorname{E}\!\left[\left(\sum_{i=1}^{n} X_i\right)^{4}\right] = \frac{1}{n^4}\sum_{i,j,k,l=1}^{n} \operatorname{E}\!\left[X_i X_j X_k X_l\right].$$

We first claim that every term of the form $\operatorname{E}[X_i^3 X_j]$, $\operatorname{E}[X_i^2 X_j X_k]$, or $\operatorname{E}[X_i X_j X_k X_l]$ where all subscripts are distinct must have zero expectation. This is because $\operatorname{E}[X_i^3 X_j] = \operatorname{E}[X_i^3]\operatorname{E}[X_j]$ by independence, and the last term is zero --- and similarly for the other terms. Therefore the only terms in the sum with nonzero expectation are $\operatorname{E}[X_i^4]$ and $\operatorname{E}[X_i^2 X_j^2]$. Since the $X_i$ are identically distributed, all of these are the same, and moreover $\operatorname{E}[X_i^2 X_j^2] = (\operatorname{E}[X_i^2])^2 \le \operatorname{E}[X_i^4]$.

There are $n$ terms of the form $\operatorname{E}[X_i^4]$ and $3n(n-1)$ terms of the form $\operatorname{E}[X_i^2 X_j^2]$, and so

$$\operatorname{E}\!\left[\bar{X}_n^4\right] \le \frac{1}{n^4}\bigl(n + 3n(n-1)\bigr)\tau.$$

Note that the numerator on the right-hand side is a quadratic polynomial in $n$, and as such there exists a constant $C > 0$ such that $\operatorname{E}[\bar{X}_n^4] \le C/n^2$ for sufficiently large $n$. By Markov's inequality,

$$\Pr\!\left(\left|\bar{X}_n\right| \ge \varepsilon\right) \le \frac{\operatorname{E}[\bar{X}_n^4]}{\varepsilon^4} \le \frac{C}{n^2\varepsilon^4}$$

for sufficiently large $n$, and therefore this series is summable. Since this holds for any $\varepsilon > 0$, we have established the strong LLN.


Other proofs exist in the literature, including one that does not require the added assumption of a finite fourth moment.

Consequences

The law of large numbers not only provides an estimate of the expectation of an unknown distribution from a realization of the sequence, but can also be used to approximate other features of the probability distribution. By applying Borel's law of large numbers, one could easily obtain the probability mass function: for each event, one approximates the probability of its occurrence by the proportion of times that the event occurs, and the larger the number of repetitions, the better the approximation. As for the continuous case, take a small interval $C = (a-h, a+h]$ for small positive h. Thus, for large n:

$$\frac{N_n(C)}{n} \approx \Pr(X \in C) = \int_{a-h}^{a+h} f(x)\,dx \approx 2h\,f(a).$$

With this method, one can cover the whole x-axis with a grid (with grid size 2h) and obtain a bar graph which is called a histogram.
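A minimal sketch of this density-estimation idea, assuming a standard normal distribution purely for illustration, is given below.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Approximate the density of a standard normal (illustrative choice)
# from empirical bin frequencies, as described above.
n, h = 1_000_000, 0.05
x = rng.normal(0.0, 1.0, size=n)

for a in (0.0, 1.0, 2.0):
    in_bin = np.mean((x > a - h) & (x <= a + h))   # N_n(C) / n
    density_estimate = in_bin / (2 * h)
    true_density = np.exp(-a**2 / 2) / np.sqrt(2 * np.pi)
    print(f"a={a}: estimate = {density_estimate:.4f}, true = {true_density:.4f}")
```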

Applications

One application of the LLN is the use of an important method of approximation, the Monte Carlo Method. This method uses a random sampling of numbers to approximate numerical results. The algorithm to compute an integral of f(x) on an interval [a,b] is as follows:

  1. Simulate uniform random variables X1, X2, ..., Xn. This can be done using software, or by using a random number table that gives U1, U2, ..., Un, independent and identically distributed (i.i.d.) random variables on [0,1]. Then let Xi = a + (b − a)Ui for i = 1, 2, ..., n. Then X1, X2, ..., Xn are independent and identically distributed uniform random variables on [a, b].
  2. Evaluate f(X1), f(X2), ..., f(Xn)
  3. Take the average of f(X1), f(X2), ..., f(Xn) by computing $\frac{1}{n}\sum_{i=1}^{n} f(X_i)$; then, by the Strong Law of Large Numbers, this converges to $\operatorname{E}[f(X_1)] = \int_a^b f(x)\,\frac{1}{b-a}\,dx$, so that $(b-a)\cdot\frac{1}{n}\sum_{i=1}^{n} f(X_i)$ approximates $\int_a^b f(x)\,dx$ (a code sketch of the algorithm follows this list).
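The sketch below (Python/NumPy) implements the three steps above; the integrand f(x) = x² on [−1, 2] is a stand-in chosen for illustration, since the integrand of the worked example that follows is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(seed=8)

def monte_carlo_integral(f, a, b, n):
    """Estimate the integral of f on [a, b] using n uniform samples."""
    u = rng.uniform(0.0, 1.0, size=n)   # step 1: U_i ~ Uniform(0, 1)
    x = a + (b - a) * u                 #         X_i ~ Uniform(a, b)
    return (b - a) * f(x).mean()        # steps 2-3: scaled sample average

# Illustrative integrand (not the one from the worked example below):
f = lambda x: x**2                      # exact integral on [-1, 2] is 3

for n in (25, 250, 25_000):
    print(f"n={n:>6}: estimate = {monte_carlo_integral(f, -1.0, 2.0, n):.4f}")
```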

Consider, for example, estimating the integral of a function f(x) on [−1, 2] for which traditional methods of computation are very difficult; the Monte Carlo Method can be used here. Using the above algorithm, we get

an estimate of approximately 0.905 when n = 25

and

an estimate of approximately 1.028 when n = 250.

We observe that as n increases, the estimate moves closer to the actual value of the integral, which is approximately 1.000194. In line with the LLN, the approximation of the integral becomes more accurate and closer to its true value as the number of samples grows.

Another example is the integration of a function f(x) on [0, 1]. Using the Monte Carlo Method and the LLN, we can see that as the number of samples increases, the numerical value gets closer to 0.4180233.

Delayed-choice quantum eraser

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Delayed-choice_quantum_eraser A delayed-cho...