Wednesday, January 25, 2023

Origin of speech

From Wikipedia, the free encyclopedia

The origin of speech refers to the general problem of the origin of language in the context of the physiological development of the human speech organs, such as the tongue, lips, and vocal organs, used to produce phonological units in all spoken languages. It has been studied through many fields, including evolution, anatomy, and the history of linguistics. Although the origin of speech is related to the more general problem of the origin of language, the evolution of distinctively human speech capacities has become a distinct and in many ways separate area of scientific research. The topic is a separate one because language is not necessarily spoken: it can equally be written or signed. Speech is in this sense optional, although it is the default modality for language.

Background

Places of articulation (passive and active):
1. Exo-labial, 2. Endo-labial, 3. Dental, 4. Alveolar, 5. Post-alveolar, 6. Pre-palatal, 7. Palatal, 8. Velar, 9. Uvular, 10. Pharyngeal, 11. Glottal, 12. Epiglottal, 13. Radical, 14. Postero-dorsal, 15. Antero-dorsal, 16. Laminal, 17. Apical, 18. Sub-apical

Many theories offer a framework for how speech originated in humans. Several of them draw on how humans evolved over time.

Monkeys, apes and humans, like many other animals, have evolved specialized mechanisms for producing sound for purposes of social communication. On the other hand, no monkey or ape uses its tongue for such purposes. The human species' unprecedented use of the tongue, lips and other moveable parts seems to place speech in a quite separate category, making its evolutionary emergence an intriguing theoretical challenge in the eyes of many scholars.

Nevertheless, recent insights in human evolution – more specifically, human Pleistocene littoral evolution – help explain how human speech evolved: different biological pre-adaptations to spoken language find their origin in humanity's waterside past, such as a larger brain (thanks to DHA and other brain-specific nutrients in seafoods), voluntary breathing (breath-hold diving for shellfish, etc.), and suction feeding of soft, slippery seafoods. Suction feeding explains why humans, as opposed to other hominoids, evolved hyoidal descent (the tongue-bone descended in the throat), closed tooth-rows (with incisiform canine teeth) and a globular tongue fitting perfectly in a vaulted and smooth palate (without transverse ridges as in apes): all this allowed the pronunciation of consonants. Other, probably older, pre-adaptations to human speech are territorial songs, gibbon-like duetting and vocal learning.

Vocal learning, the ability to imitate sounds – as in many birds and bats and a number of Cetacea and Pinnipedia – is arguably required for relocating offspring or parents (amid the foliage or in the sea). Indeed, independent lines of evidence (comparative, fossil, archeological, paleo-environmental, isotopic, nutritional, and physiological) show that early-Pleistocene "archaic" Homo spread intercontinentally along the Indian Ocean shores (they even reached overseas islands such as Flores) where they regularly dived for littoral foods such as shell- and crayfish, which are extremely rich in brain-specific nutrients, explaining Homo's brain enlargement. Shallow diving for seafoods requires voluntary airway control, a prerequisite for spoken language. Seafood such as shellfish generally requires stone tool use and suction feeding rather than biting and chewing. This finer control of the oral apparatus was arguably another biological pre-adaptation to human speech, especially for the production of consonants.

Modality-independence

Language Areas of the human brain. The angular gyrus is represented in orange, supramarginal gyrus is represented in yellow, Broca's area is represented in blue, Wernicke's area is represented in green and the primary auditory cortex is represented in pink.

The term modality means the chosen representational format for encoding and transmitting information. A striking feature of language is that it is modality-independent. Should an impaired child be prevented from hearing or producing sound, its innate capacity to master a language may equally find expression in signing. Sign languages of the deaf are independently invented and have all the major properties of spoken language except for the modality of transmission. From this it appears that the language centres of the human brain must have evolved to function optimally, irrespective of the selected modality.

"The detachment from modality-specific inputs may represent a substantial change in neural organization, one that affects not only imitation but also communication; only humans can lose one modality (e.g. hearing) and make up for this deficit by communicating with complete competence in a different modality (i.e. signing)."

— Marc Hauser, Noam Chomsky, and W. Tecumseh Fitch, 2002. The Faculty of Language: What Is It, Who Has It, and How Did It Evolve?
Figure 18 from Charles Darwin's The Expression of the Emotions in Man and Animals. Caption reads "Chimpanzee disappointed and sulky. Drawn from life by Mr. Wood".

Animal communication systems routinely combine visible with audible properties and effects, but none is modality-independent. For example, no vocally-impaired whale, dolphin, or songbird could express its song repertoire equally in visual display. Indeed, in the case of animal communication, message and modality are not capable of being disentangled. Whatever message is being conveyed stems from the intrinsic properties of the signal.

Modality independence should not be confused with the ordinary phenomenon of multimodality. Monkeys and apes rely on a repertoire of species-specific "gesture-calls" – emotionally-expressive vocalisations inseparable from the visual displays which accompany them. Humans also have species-specific gesture-calls – laughs, cries, sobs, etc. – together with involuntary gestures accompanying speech. Many animal displays are polymodal in that each appears designed to exploit multiple channels simultaneously.

The human linguistic property of modality independence is conceptually distinct from polymodality. It allows the speaker to encode the informational content of a message in a single channel whilst switching between channels as necessary. Modern city-dwellers switch effortlessly between the spoken word and writing in its various forms – handwriting, typing, email, etc. Whichever modality is chosen, it can reliably transmit the full message content without external assistance of any kind. When talking on the telephone, for example, any accompanying facial or manual gestures, however natural to the speaker, are not strictly necessary. When typing or manually signing, conversely, there is no need to add sounds. In many Australian Aboriginal cultures, a section of the population – perhaps women observing a ritual taboo – traditionally restrict themselves for extended periods to a silent (manually-signed) version of their language. Then, when released from the taboo, these same individuals resume narrating stories by the fireside or in the dark, switching to pure sound without sacrifice of informational content.

Evolution of the speech organs

Human vocal tract

Speaking is the default modality for language in all cultures. Humans' first recourse is to encode our thoughts in sound – a method which depends on sophisticated capacities for controlling the lips, tongue and other components of the vocal apparatus.

The speech organs evolved in the first instance not for speech but for more basic bodily functions such as feeding and breathing. Nonhuman primates have broadly similar organs, but with different neural controls. Nonhuman apes use their highly flexible, maneuverable tongues for eating but not for vocalizing. When an ape is not eating, fine motor control over its tongue is deactivated. Either it is performing gymnastics with its tongue or it is vocalising; it cannot perform both activities simultaneously. Since this applies to mammals in general, Homo sapiens is exceptional in harnessing mechanisms designed for respiration and ingestion for the radically different requirements of articulate speech.

Tongue

Spectrogram of American English vowels [i, u, ɑ] showing the formants f1 and f2

The word "language" derives from the Latin lingua, "tongue". Phoneticians agree that the tongue is the most important speech articulator, followed by the lips. A natural language can be viewed as a particular way of using the tongue to express thought.

The human tongue has an unusual shape. In most mammals, it is a long, flat structure contained largely within the mouth. It is attached at the rear to the hyoid bone, situated below the oral level in the pharynx. In humans, the tongue has an almost circular sagittal (midline) contour, much of it lying vertically down an extended pharynx, where it is attached to a hyoid bone in a lowered position. Partly as a result of this, the horizontal (inside-the-mouth) and vertical (down-the-throat) tubes forming the supralaryngeal vocal tract (SVT) are almost equal in length (whereas in other species, the vertical section is shorter). As we move our jaws up and down, the tongue can vary the cross-sectional area of each tube independently by about 10:1, altering formant frequencies accordingly. That the tubes are joined at a right angle permits pronunciation of the vowels [i], [u] and [a], which nonhuman primates cannot produce. Even when not performed particularly accurately, in humans the articulatory gymnastics needed to distinguish these vowels yield consistent, distinctive acoustic results, illustrating the quantal nature of human speech sounds. It may not be coincidental that [i], [u] and [a] are the most common vowels in the world's languages. Human tongues are considerably shorter and thinner than those of other mammals and are composed of a large number of muscles, which helps shape a variety of sounds within the oral cavity. The diversity of sound production is further increased by the human ability to open and close the airway, allowing varying amounts of air to exit through the nose. The fine motor movements associated with the tongue and the airway make humans more capable of producing a wide range of intricate shapes in order to produce sounds at different rates and intensities.

Lips

In humans, the lips are important for the production of stops and fricatives, in addition to vowels. Nothing, however, suggests that the lips evolved for those reasons. During primate evolution, a shift from nocturnal to diurnal activity in tarsiers, monkeys and apes (the haplorhines) brought with it an increased reliance on vision at the expense of olfaction. As a result, the snout became reduced and the rhinarium or "wet nose" was lost. The muscles of the face and lips consequently became less constrained, enabling their co-option to serve purposes of facial expression. The lips also became thicker, and the oral cavity hidden behind became smaller. Hence, according to Ann MacLarnon, "the evolution of mobile, muscular lips, so important to human speech, was the exaptive result of the evolution of diurnality and visual communication in the common ancestor of haplorhines". It is unclear whether human lips have undergone a more recent adaptation to the specific requirements of speech.

Respiratory control

Compared with nonhuman primates, humans have significantly enhanced control of breathing, enabling exhalations to be extended and inhalations shortened as we speak. Whilst we are speaking, intercostal and interior abdominal muscles are recruited to expand the thorax and draw air into the lungs, and subsequently to control the release of air as the lungs deflate. The muscles concerned are markedly more innervated in humans than in nonhuman primates. Evidence from fossil hominins suggests that the necessary enlargement of the vertebral canal, and therefore spinal cord dimensions, may not have occurred in Australopithecus or Homo erectus but was present in the Neanderthals and early modern humans.

Larynx

Anatomy of the larynx, anterolateral view

The larynx or voice box is an organ in the neck housing the vocal folds, which are responsible for phonation. In humans, the larynx is descended: it is positioned lower than in other primates. This is because the evolution of humans to an upright position shifted the head directly above the spinal cord, forcing everything else downward. The repositioning of the larynx resulted in a longer cavity called the pharynx, which is responsible for increasing the range and clarity of the sound being produced. Other primates have almost no pharynx; therefore, their vocal power is significantly lower. Humans are not unique in having a lowered larynx: goats, dogs, pigs and tamarins lower the larynx temporarily to emit loud calls. Several deer species have a permanently lowered larynx, which may be lowered still further by males during their roaring displays. Lions, jaguars, cheetahs and domestic cats also do this. However, laryngeal descent in nonhumans (according to Philip Lieberman) is not accompanied by descent of the hyoid; hence the tongue remains horizontal in the oral cavity, preventing it from acting as a pharyngeal articulator.

Anterolateral view of head and neck

Despite all this, scholars remain divided as to how "special" the human vocal tract really is. It has been shown that the larynx does descend to some extent during development in chimpanzees, followed by hyoidal descent. As against this, Philip Lieberman points out that only humans have evolved permanent and substantial laryngeal descent in association with hyoidal descent, resulting in a curved tongue and two-tube vocal tract with 1:1 proportions. Uniquely in the human case, simple contact between the epiglottis and velum is no longer possible, disrupting the normal mammalian separation of the respiratory and digestive tracts during swallowing. Since this entails substantial costs – increasing the risk of choking whilst swallowing food – we are forced to ask what benefits might have outweighed those costs. Some claim the clear benefit must have been speech, but others contest this. One objection is that humans are in fact not seriously at risk of choking on food: medical statistics indicate that accidents of this kind are extremely rare. Another objection is that in the view of most scholars, speech as we know it emerged relatively late in human evolution, roughly contemporaneously with the emergence of Homo sapiens. A development as complex as the reconfiguration of the human vocal tract would have required much more time, implying an early date of origin. This discrepancy in timescales undermines the idea that human vocal flexibility was initially driven by selection pressures for speech.

At least one orangutan has demonstrated the ability to control the voice box.

The size exaggeration hypothesis

To lower the larynx is to increase the length of the vocal tract, in turn lowering formant frequencies so that the voice sounds "deeper" – giving an impression of greater size. John Ohala argued that the function of the lowered larynx in humans, especially males, is probably to enhance threat displays rather than speech itself. Ohala pointed out that if the lowered larynx were an adaptation for speech, we would expect adult human males to be better adapted in this respect than adult females, whose larynx is considerably less low. In fact, females invariably outperform males in verbal tests, contradicting the view that the lowered larynx is an adaptation for speech. William Tecumseh Fitch likewise argues that size exaggeration was the original selective advantage of laryngeal lowering in our species. Although, according to Fitch, the initial lowering of the larynx in humans had nothing to do with speech, the increased range of possible formant patterns was subsequently co-opted for speech. Size exaggeration remains the sole function of the extreme laryngeal descent observed in male deer. Consistent with the size exaggeration hypothesis, a second descent of the larynx occurs at puberty in humans, although only in males. In response to the objection that the larynx is descended in human females, Fitch suggests that mothers vocalising to protect their infants would also have benefited from this ability.
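The link between tract length and formant frequencies can be illustrated with a standard textbook idealisation that is not part of the account above: treat the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at F_n = (2n - 1) c / (4 L). The following is a minimal sketch under that assumption. The speed of sound (350 m/s) and the two tract lengths (17 cm and 20 cm) are illustrative values, not measurements from any study cited here.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air in the vocal tract


def tube_formants(tract_length_m, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Idealised quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    if tract_length_m <= 0:
        raise ValueError("tract length must be positive")
    return [(2 * n - 1) * SPEED_OF_SOUND_M_S / (4 * tract_length_m)
            for n in range(1, count + 1)]


# Hypothetical lengths: a 17 cm tract, and the same tract lengthened by 3 cm.
shorter = tube_formants(0.17)
longer = tube_formants(0.20)
```

With these assumed values the first resonance falls from roughly 515 Hz to roughly 438 Hz, and every higher resonance drops in the same proportion. A real vocal tract is not a uniform tube, so the sketch shows only the direction of the effect.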

Neanderthal speech

Hyoid bone – anterior surface, enlarged

Most specialists credit the Neanderthals with speech abilities not radically different from those of modern Homo sapiens. An indirect line of argument is that their toolmaking and hunting tactics would have been difficult to learn or execute without some kind of speech. A recent extraction of DNA from Neanderthal bones indicates that Neanderthals had the same version of the FOXP2 gene as modern humans. This gene, mistakenly described as the "grammar gene", plays a role in controlling the orofacial movements which (in modern humans) are involved in speech.

During the 1970s, it was widely believed that the Neanderthals lacked modern speech capacities. It was claimed that they possessed a hyoid bone so high up in the vocal tract as to preclude the possibility of producing certain vowel sounds.

The hyoid bone is present in many mammals. It allows a wide range of tongue, pharyngeal and laryngeal movements by bracing these structures alongside each other in order to produce variation. It is now realised that its lowered position is not unique to Homo sapiens, whilst its relevance to vocal flexibility may have been overstated: although men have a lower larynx, they do not produce a wider range of sounds than women or two-year-old babies. There is no evidence that the larynx position of the Neanderthals impeded the range of vowel sounds they could produce. The discovery of a modern-looking hyoid bone of a Neanderthal man in the Kebara Cave in Israel led its discoverers to argue that the Neanderthals had a descended larynx, and thus human-like speech capabilities. However, other researchers have claimed that the morphology of the hyoid is not indicative of the larynx's position. It is necessary to take into consideration the skull base, the mandible, the cervical vertebrae and a cranial reference plane.

The morphology of the outer and middle ear of Middle Pleistocene hominins from Atapuerca, Spain, believed to be proto-Neanderthal, suggests they had an auditory sensitivity similar to modern humans and very different from chimpanzees. They were probably able to differentiate between many different speech sounds.

Hypoglossal canal

Hypoglossal nerve (Latin: nervus hypoglossus), cervical plexus, and their branches

The hypoglossal nerve plays an important role in controlling movements of the tongue. In 1998, a research team used the size of the hypoglossal canal in the base of fossil skulls in an attempt to estimate the relative number of nerve fibres, claiming on this basis that Middle Pleistocene hominins and Neanderthals had more fine-tuned tongue control than either Australopithecines or apes. Subsequently, however, it was demonstrated that hypoglossal canal size and nerve sizes are not correlated, and it is now accepted that such evidence is uninformative about the timing of human speech evolution.

Distinctive features theory

According to one influential school, the human vocal apparatus is intrinsically digital on the model of a keyboard or digital computer (see below). Nothing about a chimpanzee's vocal apparatus suggests a digital keyboard, notwithstanding the anatomical and physiological similarities. This poses the question as to when and how, during the course of human evolution, the transition from analog to digital structure and function occurred.

The human supralaryngeal tract is said to be digital in the sense that it is an arrangement of moveable toggles or switches, each of which, at any one time, must be in one state or another. The vocal cords, for example, are either vibrating (producing a sound) or not vibrating (in silent mode). By virtue of simple physics, the corresponding distinctive feature – in this case, "voicing" – cannot be somewhere in between. The options are limited to "off" and "on". Equally digital is the feature known as "nasalisation". At any given moment the soft palate or velum either allows or does not allow sound to resonate in the nasal chamber. In the case of lip and tongue positions, more than two digital states may be allowed.

The theory that speech sounds are composite entities constituted by complexes of binary phonetic features was first advanced in 1938 by the Russian linguist Roman Jakobson. A prominent early supporter of this approach was Noam Chomsky, who went on to extend it from phonology to language more generally, in particular to the study of syntax and semantics. In his 1965 book, Aspects of the Theory of Syntax, Chomsky treated semantic concepts as combinations of binary-digital atomic elements explicitly on the model of distinctive features theory. The lexical item "bachelor", on this basis, would be expressed as [+ Human], [+ Male], [- Married].

Supporters of this approach view the vowels and consonants recognised by speakers of a particular language or dialect at a particular time as cultural entities of little scientific interest. From a natural science standpoint, the units which matter are those common to Homo sapiens by virtue of our biological nature. By combining the atomic elements or "features" with which all humans are innately equipped, anyone may in principle generate the entire range of vowels and consonants to be found in any of the world's languages, whether past, present or future. The distinctive features are in this sense atomic components of a universal language.

Voicing contrast in English fricatives

Articulation                                     | Voiceless          | Voiced
Pronounced with the lower lip against the teeth  | [f] (fan)          | [v] (van)
Pronounced with the tongue against the teeth     | [θ] (thin, thigh)  | [ð] (then, thy)
Pronounced with the tongue near the gums         | [s] (sip)          | [z] (zip)
Pronounced with the tongue bunched up            | [ʃ] (pressure)     | [ʒ] (pleasure)
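
The idea of a sound as a bundle of binary features can be made concrete with a small data structure. The sketch below encodes the eight fricatives from the table above with a single boolean "voiced" feature and recovers the voiceless/voiced pairs. The class name, the informal place labels and the helper function are hypothetical, chosen only for illustration, and do not come from any particular phonological framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fricative:
    symbol: str
    place: str      # informal place label paraphrasing the table above
    voiced: bool    # the binary "voicing" feature


FRICATIVES = [
    Fricative("f", "lip-teeth", False), Fricative("v", "lip-teeth", True),
    Fricative("θ", "tongue-teeth", False), Fricative("ð", "tongue-teeth", True),
    Fricative("s", "tongue-gums", False), Fricative("z", "tongue-gums", True),
    Fricative("ʃ", "tongue-bunched", False), Fricative("ʒ", "tongue-bunched", True),
]


def voicing_pairs(segments):
    """Pair up segments that share a place and differ only in voicing."""
    voiceless = {s.place: s.symbol for s in segments if not s.voiced}
    voiced = {s.place: s.symbol for s in segments if s.voiced}
    return [(voiceless[p], voiced[p]) for p in voiceless if p in voiced]


pairs = voicing_pairs(FRICATIVES)
```

Flipping one boolean is enough to move between the members of each pair, which is the sense in which the feature is said to be digital.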

Criticism

In recent years, the notion of an innate "universal grammar" underlying phonological variation has been called into question. The most comprehensive monograph ever written about speech sounds, The Sounds of the World's Languages, by Peter Ladefoged and Ian Maddieson, found virtually no basis for the postulation of some small number of fixed, discrete, universal phonetic features. Examining 305 languages, for example, they encountered vowels that were positioned basically everywhere along the articulatory and acoustic continuum. Ladefoged concluded that phonological features are not determined by human nature: "Phonological features are best regarded as artifacts that linguists have devised in order to describe linguistic systems".

Self-organisation theory

Birds flocking, an example of self-organization in biology

Self-organisation characterises systems where macroscopic structures are spontaneously formed out of local interactions between the many components of the system. In self-organised systems, global organisational properties are not to be found at the local level. In colloquial terms, self-organisation is roughly captured by the idea of "bottom-up" (as opposed to "top-down") organisation. Examples of self-organised systems range from ice crystals to galaxy spirals in the inorganic world.

A termite mound (Macrotermitinae) in the Okavango Delta just outside Maun, Botswana

According to many phoneticians, the sounds of language arrange and re-arrange themselves through self-organisation. Speech sounds have both perceptual (how one hears them) and articulatory (how one produces them) properties, all with continuous values. Speakers tend to minimise effort, favouring ease of articulation over clarity. Listeners do the opposite, favouring sounds that are easy to distinguish even if difficult to pronounce. Since speakers and listeners are constantly switching roles, the syllable systems actually found in the world's languages turn out to be a compromise between acoustic distinctiveness on the one hand, and articulatory ease on the other.

Agent-based computer models take the perspective of self-organisation at the level of the speech community or population. The two main paradigms are (1) the iterated learning model and (2) the language game model. Iterated learning focuses on transmission from generation to generation, typically with just one agent in each generation. In the language game model, a whole population of agents simultaneously produce, perceive and learn language, inventing novel forms when the need arises.

Several models have shown how relatively simple peer-to-peer vocal interactions, such as imitation, can spontaneously self-organise a system of sounds shared by the whole population, and different in different populations. For example, models elaborated by Berrah et al. (1996) and de Boer (2000), and recently reformulated using Bayesian theory, showed how a group of individuals playing imitation games can self-organise repertoires of vowel sounds which share substantial properties with human vowel systems. In de Boer's model, vowels are initially generated randomly, but agents learn from each other as they interact repeatedly over time. Agent A chooses a vowel from her repertoire and produces it, inevitably with some noise. Agent B hears this vowel and chooses the closest equivalent from her own repertoire. To check whether this truly matches the original, B produces the vowel she thinks she has heard, whereupon A refers once again to her own repertoire to find the closest equivalent. If this matches the one she initially selected, the game is successful; otherwise, it has failed. "Through repeated interactions", according to de Boer, "vowel systems emerge that are very much like the ones found in human languages".
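
A single round of the imitation game described above can be sketched in a few lines. This is a simplified illustration and not de Boer's implementation: vowels are points in an abstract two-dimensional space, noise is Gaussian, and the step in which agents adjust their repertoires after each game is omitted because it is not specified here. The function names and both example repertoires are hypothetical.

```python
import math
import random


def nearest(repertoire, signal):
    """Index of the repertoire vowel closest to a perceived signal."""
    return min(range(len(repertoire)),
               key=lambda i: math.dist(repertoire[i], signal))


def produce(vowel, noise, rng):
    """Articulate a vowel with Gaussian noise added to each dimension."""
    return tuple(x + rng.gauss(0.0, noise) for x in vowel)


def play_round(initiator, imitator, noise, rng):
    """One imitation game; returns True when A recognises her own vowel."""
    chosen = rng.randrange(len(initiator))
    heard_by_b = produce(initiator[chosen], noise, rng)    # A speaks
    b_choice = nearest(imitator, heard_by_b)               # B picks the closest
    heard_by_a = produce(imitator[b_choice], noise, rng)   # B imitates
    return nearest(initiator, heard_by_a) == chosen        # A checks the match


def success_rate(initiator, imitator, noise, rounds=1000, seed=0):
    """Fraction of successful games over a number of rounds."""
    rng = random.Random(seed)
    wins = sum(play_round(initiator, imitator, noise, rng)
               for _ in range(rounds))
    return wins / rounds


# Hypothetical repertoires in an abstract 2-D vowel space (unit square).
well_spread = [(0.1, 0.1), (0.9, 0.1), (0.5, 0.9)]
crowded = [(0.50, 0.50), (0.52, 0.50), (0.50, 0.52)]
```

Calling `success_rate(well_spread, well_spread, noise=0.05)` and `success_rate(crowded, crowded, noise=0.05)` shows why dispersed repertoires are favoured: with the same noise, widely spaced vowels are confused far less often than tightly clustered ones.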

In a different model, the phonetician Björn Lindblom was able to predict, on self-organisational grounds, the favoured choices of vowel systems ranging from three to nine vowels on the basis of a principle of optimal perceptual differentiation.
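
One simple way to make "optimal perceptual differentiation" concrete is to score a candidate vowel system by how crowded it is and to search for the least crowded system of a given size. The sketch below is not Lindblom's actual procedure. It assumes an abstract unit-square vowel space, a crowding score equal to the sum of inverse squared distances between vowels, and an exhaustive search over a coarse grid. All names and values are hypothetical.

```python
import math
from itertools import combinations


def crowding(vowels):
    """Sum of inverse squared distances: lower means better separated."""
    return sum(1.0 / math.dist(a, b) ** 2 for a, b in combinations(vowels, 2))


def best_system(candidates, size):
    """Exhaustively pick the `size` candidates with the least crowding."""
    return min(combinations(candidates, size), key=crowding)


# Hypothetical candidate positions: a 5 x 5 grid over an abstract unit square.
GRID = [(i / 4, j / 4) for i in range(5) for j in range(5)]
three_vowels = best_system(GRID, 3)
```

Exhaustive search is feasible only for small grids and small systems. A larger space would need an optimisation method, and a realistic model would also need a perceptually motivated distance measure in place of the Euclidean distance assumed here.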

Further models studied the role of self-organisation in the origins of phonemic coding and combinatoriality, which is the existence of phonemes and their systematic reuse to build structured syllables. Pierre-Yves Oudeyer developed models which showed that basic neural equipment for adaptive holistic vocal imitation, coupling directly motor and perceptual representations in the brain, can generate spontaneously shared combinatorial systems of vocalisations, including phonotactic patterns, in a society of babbling individuals. These models also characterised how morphological and physiological innate constraints can interact with these self-organised mechanisms to account for both the formation of statistical regularities and diversity in vocalisation systems.

Gestural theory

The gestural theory states that speech was a relatively late development, evolving by degrees from a system that was originally gestural. Our ancestors were unable to control their vocalisation at the time when gestures were used to communicate; however, as they slowly began to control their vocalisations, spoken language began to evolve.

Three types of evidence support this theory:

  1. Gestural language and vocal language depend on similar neural systems. The regions on the cortex that are responsible for mouth and hand movements border each other.
  2. Nonhuman primates minimise vocal signals in favour of manual, facial and other visible gestures in order to express simple concepts and communicative intentions in the wild. Some of these gestures resemble those of humans, such as the "begging posture", with the hands stretched out, which humans share with chimpanzees.
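  3. Mirror neurons in primates fire both when a hand action is performed and when it is observed, suggesting a neural link between gesture and communication.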

Research has found strong support for the idea that spoken language and signing depend on similar neural structures. Patients who used sign language, and who suffered from a left-hemisphere lesion, showed the same disorders in their signing as speaking patients did in their oral language. Other researchers found that the same left-hemisphere brain regions were active during sign language as during the use of vocal or written language.

Humans spontaneously use hand and facial gestures when formulating ideas to be conveyed in speech. There are also, of course, many sign languages in existence, commonly associated with deaf communities; as noted above, these are equal in complexity, sophistication, and expressive power, to any oral language. The main difference is that the "phonemes" are produced on the outside of the body, articulated with hands, body, and facial expression, rather than inside the body articulated with tongue, teeth, lips, and breathing.

Many psychologists and scientists have examined the mirror system in the brain to evaluate this theory as well as other behavioural theories. Evidence cited for mirror neurons as a factor in the evolution of speech includes the presence of mirror neurons in primates, the success of teaching apes to communicate gesturally, and the use of pointing and gesturing to teach young children language. Fogassi and Ferrari (2014) monitored motor cortex activity in monkeys, specifically in area F5, generally regarded as the monkey homologue of Broca's area, where mirror neurons are located. They observed changes in electrical activity in this area when the monkey executed different hand actions or observed them being performed by someone else. Broca's area is a region in the frontal lobe responsible for language production and processing. The discovery of mirror neurons in this region, which fire when a hand action is performed or observed, is taken to support the view that communication was once accomplished with gestures. A similar argument is made for teaching young children language: when one points at a specific object or location, mirror neurons in the child are thought to fire as though the child were performing the action, which results in long-term learning.

Criticism

Critics note that for mammals in general, sound turns out to be the best medium in which to encode information for transmission over distances at speed. Given the probability that this applied also to early humans, it is hard to see why they should have abandoned this efficient method in favour of more costly and cumbersome systems of visual gesturing – only to return to sound at a later stage.

By way of explanation, it has been proposed that at a relatively late stage in human evolution, our ancestors' hands became so much in demand for making and using tools that the competing demands of manual gesturing became a hindrance. The transition to spoken language is said to have occurred only at that point. Since humans throughout evolution have been making and using tools, however, most scholars remain unconvinced by this argument. (For a different approach to this issue – one setting out from considerations of signal reliability and trust – see "from pantomime to speech" below).

Timeline of speech evolution

Little is known about the timing of language's emergence in the human species. Unlike writing, speech leaves no material trace, making it archaeologically invisible. Lacking direct linguistic evidence, specialists in human origins have resorted to the study of anatomical features and genes arguably associated with speech production. Whilst such studies may provide information as to whether pre-modern Homo species had speech capacities, it is still unknown whether they actually spoke. Whilst they may have communicated vocally, the anatomical and genetic data lack the resolution necessary to differentiate proto-language from speech.

Using statistical methods to estimate the time required to achieve the current spread and diversity of modern languages, Johanna Nichols – a linguist at the University of California, Berkeley – argued in 1998 that vocal languages must have begun diversifying in our species at least 100,000 years ago.

More recently – in 2012 – anthropologists Charles Perreault and Sarah Mathew used phonemic diversity to suggest a date consistent with this. "Phonemic diversity" denotes the number of perceptually distinct units of sound – consonants, vowels and tones – in a language. The current worldwide pattern of phonemic diversity potentially contains the statistical signal of the expansion of modern Homo sapiens out of Africa, beginning around 60–70 thousand years ago. Some scholars argue that phonemic diversity evolves slowly and can be used as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. As human populations left Africa and expanded into the rest of the world, they underwent a series of bottlenecks – points at which only a very small population survived to colonise a new continent or region. Allegedly, each such population crash led to a corresponding reduction in genetic, phenotypic and phonemic diversity. African languages today have some of the largest phonemic inventories in the world, whilst the smallest inventories are found in South America and Oceania, some of the last regions of the globe to be colonised. For example, Rotokas, a language of New Guinea, and Pirahã, spoken in South America, both have just 11 phonemes, whilst !Xun, a language spoken in Southern Africa, has 141 phonemes. The authors use a natural experiment – the colonization of mainland Southeast Asia on the one hand, the long-isolated Andaman Islands on the other – to estimate the rate at which phonemic diversity increases through time. Using this rate, they estimate that the world's languages date back to the Middle Stone Age in Africa, sometime between 350 thousand and 150 thousand years ago. This corresponds to the speciation event which gave rise to Homo sapiens.
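
The "clock" reasoning described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that phonemes accumulate at a constant linear rate; the function name, the rate and the phoneme gain used below are hypothetical placeholders and are not the figures or the statistical model used by Perreault and Mathew.

```python
def clock_age_years(phoneme_gain, gain_per_millennium):
    """Estimate elapsed time from accumulated phonemic diversity.

    phoneme_gain: number of phonemes accumulated since the founding
        population (hypothetical input).
    gain_per_millennium: assumed constant rate of phoneme accumulation
        per 1,000 years (hypothetical input, not a published value).
    """
    if gain_per_millennium <= 0:
        raise ValueError("the accumulation rate must be positive")
    # Linear clock: elapsed time = accumulated change / rate of change.
    return 1000.0 * phoneme_gain / gain_per_millennium


if __name__ == "__main__":
    # Illustrative numbers only: a gain of 50 phonemes at an assumed
    # rate of 0.5 phonemes per millennium.
    print(clock_age_years(50, 0.5))
```

The sketch also makes the critics' objection concrete: the estimate depends entirely on the assumed rate being slow and steady, so if languages can gain or lose many phonemes quickly, the computed age carries little meaning.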

These and similar studies have, however, been criticised by linguists who argue that they are based on a flawed analogy between genes and phonemes, since phonemes, unlike genes, are frequently transferred laterally between languages, and on a flawed sampling of the world's languages, since both Oceania and the Americas also contain languages with very high numbers of phonemes, and Africa contains languages with very few. They argue that the actual distribution of phonemic diversity in the world reflects recent language contact and not deep language history, since it is well demonstrated that languages can lose or gain many phonemes over very short periods. In other words, there is no valid linguistic reason to expect genetic founder effects to influence phonemic diversity.

Speculative scenarios

Early speculations

"I cannot doubt that language owes its origin to the imitation and modification, aided by signs and gestures, of various natural sounds, the voices of other animals, and man's own instinctive cries."

— Charles Darwin, 1871. The Descent of Man, and Selection in Relation to Sex.

In 1861, historical linguist Max Müller published a list of speculative theories concerning the origins of spoken language. These theories have been grouped under the category of invention hypotheses. They were all meant to explain how the first language could have developed, and they postulate that human mimicry of natural sounds was how the first meaningful words were derived.

  • Bow-wow. The bow-wow or cuckoo theory, which Müller attributed to the German philosopher Johann Gottfried Herder, saw early words as imitations of the cries of beasts and birds. This theory, believed to be derived from onomatopoeia, relates the meaning of the sound to the actual sound formulated by the speaker.
  • Pooh-pooh. The pooh-pooh theory saw the first words as emotional interjections and exclamations triggered by pain, pleasure, surprise and so on. Such sounds are typically produced on sudden intakes of breath, whereas spoken language is produced on the exhale. Because the sounds involved in emotional reactions are therefore unlike those used in normal speech production, this theory is considered a less plausible account of the origin of language.
  • Ding-dong. Müller suggested what he called the Ding-Dong theory, which states that all things have a vibrating natural resonance, echoed somehow by man in his earliest words. Words are derived from the sound associated with their meaning; for example, “crash became a word for thunder, boom for explosion.” This theory also heavily relies on the concept of onomatopoeia.
  • Yo-he-ho. The yo-he-ho theory saw language emerging out of collective rhythmic labor, the attempt to synchronize muscular effort resulting in sounds such as heave alternating with sounds such as ho. Believed to be derived from the basis of human collaborative efforts, this theory states that humans needed words, which might have started off as chanting, to communicate. This need could have been to ward off predators, or served as a unifying battle cry.
  • Ta-ta. This did not feature in Max Müller's list, having been proposed in 1930 by Sir Richard Paget.[93] According to the ta-ta theory, humans made the earliest words by tongue movements that mimicked manual gestures, rendering them audible.

Several of these theories share the idea that onomatopoeia was the first source of words, but this idea has a problem. Onomatopoeia can explain the first few words derived from natural phenomena, but it offers no explanation of how more complex words without a natural counterpart came to be. Most scholars today consider all such theories not so much wrong – they occasionally offer peripheral insights – as drastically limited. These theories are too narrowly mechanistic to comprehensively explain the origin of language. They assume that once the ancestors of humans had stumbled upon the appropriate ingenious mechanism for linking sounds with meanings, language automatically evolved and changed.

Problems of reliability and deception

From the perspective of modern science, the main obstacle to the evolution of speech-like communication in nature is not a mechanistic one. Rather, it is that symbols – arbitrary associations of sounds with corresponding meanings – are unreliable and may well be false. As the saying goes, "words are cheap". The problem of reliability was not recognised at all by Darwin, Müller or the other early evolutionist theorists.

Animal vocal signals are for the most part intrinsically reliable. When a cat purrs, the signal constitutes direct evidence of the animal's contented state. One can "trust" the signal not because the cat is inclined to be honest, but because it just can't fake that sound. Primate vocal calls may be slightly more manipulable, but they remain reliable for the same reason – because they are hard to fake. Primate social intelligence is Machiavellian – self-serving and unconstrained by moral scruples. Monkeys and apes often attempt to deceive one another, whilst at the same time remaining constantly on guard against falling victim to deception themselves. Paradoxically, it is precisely primates' resistance to deception that blocks the evolution of their vocal communication systems along language-like lines. Language is ruled out because the best way to guard against being deceived is to ignore all signals except those that are instantly verifiable. Words automatically fail this test.

Words are easy to fake. Should they turn out to be lies, listeners will adapt by ignoring them in favour of hard-to-fake indices or cues. For language to work, then, listeners must be confident that those with whom they are on speaking terms are generally likely to be honest. A peculiar feature of language is "displaced reference", which means reference to topics outside the currently perceptible situation. This property prevents utterances from being corroborated in the immediate "here" and "now". For this reason, language presupposes relatively high levels of mutual trust in order to become established over time as an evolutionarily stable strategy. A theory of the origins of language must, therefore, explain why humans could begin trusting cheap signals in ways that other animals apparently cannot (see signalling theory).

"Kin selection"

The "mother tongues" hypothesis was proposed in 2004 as a possible solution to this problem. W. Tecumseh Fitch suggested that the Darwinian principle of "kin selection" – the convergence of genetic interests between relatives – might be part of the answer. Fitch suggests that spoken languages were originally "mother tongues". If speech evolved initially for communication between mothers and their own biological offspring, extending later to include adult relatives as well, the interests of speakers and listeners would have tended to coincide. Fitch argues that shared genetic interests would have led to sufficient trust and cooperation for intrinsically unreliable vocal signals – spoken words – to become accepted as trustworthy and so begin evolving for the first time.

Criticism

Critics of this theory point out that kin selection is not unique to humans. Ape mothers also share genes with their offspring, as do all animals, so why is it only humans who speak? Furthermore, it is difficult to believe that early humans restricted linguistic communication to genetic kin: the incest taboo must have forced men and women to interact and communicate with non-kin. The extension of the posited "mother tongue" networks from relatives to non-relatives remains unexplained.

"Reciprocal altruism"

Ib Ulbæk invokes another standard Darwinian principle – "reciprocal altruism" – to explain the unusually high levels of intentional honesty necessary for language to evolve. 'Reciprocal altruism' can be expressed as the principle that if you scratch my back, I'll scratch yours. In linguistic terms, it would mean that if you speak truthfully to me, I'll speak truthfully to you. Ordinary Darwinian reciprocal altruism, Ulbæk points out, is a relationship established between frequently interacting individuals. For language to prevail across an entire community, however, the necessary reciprocity would have needed to be enforced universally instead of being left to individual choice. Ulbæk concludes that for language to evolve, early society as a whole must have been subject to moral regulation.

Criticism

Critics point out that this theory fails to explain when, how, why or by whom "obligatory reciprocal altruism" could possibly have been enforced. Various proposals have been offered to remedy this defect. A further criticism is that language doesn't work on the basis of reciprocal altruism anyway. Humans in conversational groups don't withhold information from all except listeners likely to offer valuable information in return. On the contrary, they seem to want to advertise to the world their access to socially relevant information, broadcasting it to anyone who will listen without thought of return.

"Gossip and grooming"

Gossip, according to Robin Dunbar, does for group-living humans what manual grooming does for other primates – it allows individuals to service their relationships and so maintain their alliances. As humans began living in larger and larger social groups, the task of manually grooming all one's friends and acquaintances became so time-consuming as to be unaffordable. In response to this problem, humans invented "a cheap and ultra-efficient form of grooming" – vocal grooming. To keep your allies happy, you now needed only to "groom" them with low-cost vocal sounds, servicing multiple allies simultaneously whilst keeping both hands free for other tasks. Vocal grooming (the production of pleasing sounds lacking syntax or combinatorial semantics) then evolved somehow into syntactical speech.

Criticism

Critics of this theory point out that the very efficiency of "vocal grooming" – that words are so cheap – would have undermined its capacity to signal commitment of the kind conveyed by time-consuming and costly manual grooming. A further criticism is that the theory does nothing to explain the crucial transition from vocal grooming – the production of pleasing but meaningless sounds – to the cognitive complexities of syntactical speech.

From pantomime to speech

According to another school of thought, language evolved from mimesis – the "acting out" of scenarios using vocal and gestural pantomime. Charles Darwin, who himself was skeptical, hypothesised that human speech and language are derived from gestures and mouth pantomime. This theory, further elaborated on by various authors, postulates that the genus Homo, unlike our ape ancestors, evolved a new type of cognition. Apes are capable of associational learning. They can tie a sensory cue to a motor response, often trained through classical conditioning. However, in apes, the conditioned sensory cue is necessary for a conditioned response to be observed again. The motor response will not occur without an external cue from an outside agent. A remarkable ability that humans possess is the ability to voluntarily retrieve memories without the need for a cue (e.g. a conditioned stimulus). This is not an ability that has been observed in animals except language-trained apes. There is still much controversy over whether apes, either wild or captive, are capable of pantomime. For as long as utterances needed to be emotionally expressive and convincing, it was not possible to complete the transition to purely conventional signs. On this assumption, pre-linguistic gestures and vocalisations would have been required not just to disambiguate intended meanings, but also to inspire confidence in their intrinsic reliability. If contractual commitments were necessary in order to inspire community-wide trust in communicative intentions, it would follow that these had to be in place before humans could shift at last to an ultra-efficient, high-speed – digital as opposed to analog – signalling format. Vocal distinctive features (sound contrasts) are ideal for this purpose. It is therefore suggested that the establishment of contractual understandings enabled the decisive transition from mimetic gesture to fully conventionalised, digitally encoded speech.

"Ritual/speech coevolution"

The ritual/speech coevolution theory was originally proposed by the distinguished social anthropologist Roy Rappaport before being elaborated by anthropologists such as Chris Knight, Jerome Lewis, Nick Enfield, Camilla Power and Ian Watts. Cognitive scientist and robotics engineer Luc Steels is another prominent supporter of this general approach, as is biological anthropologist/neuroscientist Terrence Deacon.

These scholars argue that there can be no such thing as a "theory of the origins of language". This is because language is not a separate adaptation but an internal aspect of something much wider – namely, human symbolic culture as a whole. Attempts to explain language independently of this wider context have spectacularly failed, say these scientists, because they are addressing a problem with no solution. Can we imagine a historian attempting to explain the emergence of credit cards independently of the wider system of which they are a part? Using a credit card makes sense only if you have a bank account institutionally recognised within a certain kind of advanced capitalist society – one where communications technology has already been invented and fraud can be detected and prevented. In much the same way, language would not work outside a specific array of social mechanisms and institutions. For example, it would not work for an ape communicating with other apes in the wild. Not even the cleverest ape could make language work under such conditions.

"Lie and alternative, inherent in language, ... pose problems to any society whose structure is founded on language, which is to say all human societies. I have therefore argued that if there are to be words at all it is necessary to establish The Word, and that The Word is established by the invariance of liturgy."

Advocates of this school of thought point out that words are cheap. As digital hallucinations, they are intrinsically unreliable. Should an especially clever ape, or even a group of articulate apes, try to use words in the wild, they would carry no conviction. The primate vocalizations that do carry conviction – those they actually use – are unlike words, in that they are emotionally expressive, intrinsically meaningful and reliable because they are relatively costly and hard to fake.

Speech consists of digital contrasts whose cost is essentially zero. As pure social conventions, signals of this kind cannot evolve in a Darwinian social world – they are a theoretical impossibility. Being intrinsically unreliable, language works only if you can build up a reputation for trustworthiness within a certain kind of society – namely, one where symbolic cultural facts (sometimes called "institutional facts") can be established and maintained through collective social endorsement. In any hunter-gatherer society, the basic mechanism for establishing trust in symbolic cultural facts is collective ritual. Therefore, the task facing researchers into the origins of language is more multidisciplinary than is usually supposed. It involves addressing the evolutionary emergence of human symbolic culture as a whole, with language an important but subsidiary component.

Criticism

Critics of the theory include Noam Chomsky, who terms it the "non-existence" hypothesis – a denial of the very existence of language as an object of study for natural science. Chomsky's own theory is that language emerged in an instant and in perfect form, prompting his critics in turn to retort that only something that doesn't exist – a theoretical construct or convenient scientific fiction – could possibly emerge in such a miraculous way. The controversy remains unresolved.

Twentieth century speculations

Festal origins

The essay "The festal origin of human speech", though published in the late nineteenth century, made little impact until the American philosopher Susanne Langer re-discovered and publicised it in 1941.

"In the early history of articulate sounds they could make no meaning themselves, but they preserved and got intimately associated with the peculiar feelings and perceptions that came most prominently into the minds of the festal players during their excitement."

— J. Donovan, 1891. The Festal Origin of Human Speech.

The theory sets out from the observation that primate vocal sounds are above all emotionally expressive. The emotions aroused are socially contagious. Because of this, an extended bout of screams, hoots or barks will tend to express not just the feelings of this or that individual but the mutually contagious ups and downs of everyone within earshot.

Turning to the ancestors of Homo sapiens, the "festal origin" theory suggests that in the "play-excitement" preceding or following a communal hunt or other group activity, everyone might have combined their voices in a comparable way, emphasizing their mood of togetherness with such noises as rhythmic drumming and hand-clapping. Variably pitched voices would have formed conventional patterns, such that choral singing became an integral part of communal celebration.

Although this was not yet speech, according to Langer, it developed the vocal capacities from which speech would later derive. There would be conventional modes of ululating, clapping or dancing appropriate to different festive occasions, each so intimately associated with that kind of occasion that it would tend to collectively uphold and embody the concept of it. Anyone hearing a snatch of sound from such a song would recall the associated occasion and mood. A melodic, rhythmic sequence of syllables conventionally associated with a certain type of celebration would become, in effect, its vocal mark. On that basis, certain familiar sound sequences would become "symbolic".

In support of all this, Langer cites ethnographic reports of tribal songs consisting entirely of "rhythmic nonsense syllables". She concedes that an English equivalent such as "hey-nonny-nonny", although perhaps suggestive of certain feelings or ideas, is neither noun, verb, adjective, nor any other syntactical part of speech. So long as articulate sound served only in the capacity of "hey nonny-nonny", "hallelujah" or "alack-a-day", it cannot yet have been speech. For that to arise, according to Langer, it was necessary for such sequences to be emitted increasingly out of context – outside the total situation that gave rise to them. Extending a set of associations from one cognitive context to another, completely different one is the secret of metaphor. Langer invokes an early version of what is nowadays termed "grammaticalization" theory to show how, from such a point of departure, syntactically complex speech might progressively have arisen.

Langer acknowledges Emile Durkheim as having proposed a strikingly similar theory back in 1912. For recent thinking along broadly similar lines, see Steven Brown on "musilanguage", Chris Knight on "ritual" and "play", Jerome Lewis on "mimicry", Steven Mithen on "Hmmmmm", Bruce Richman on "nonsense syllables" and Alison Wray on "holistic protolanguage".

Mirror neuron hypothesis (MSH) and the Motor Theory of Speech Perception

Mirror neurons, originally found in the macaque monkey, are neurons that are activated both when an individual performs an action and when it observes the same action performed by another. A comparable mechanism has been proposed in humans.

The mirror neuron hypothesis, based on a phenomenon described by Rizzolatti and Fabbri (2008), supports the motor theory of speech perception. The motor theory of speech perception was proposed in 1967 by Liberman, who believed that the motor system and language systems were closely interlinked. This would result in a more streamlined process of generating speech; both cognition and speech formulation could occur simultaneously. Essentially, it is wasteful to have speech decoding and speech encoding processes that are independent of each other. This hypothesis was further supported by the discovery of mirror neurons. Rizzolatti and Fabbri reported that there were specific neurons in the motor cortex of macaque monkeys which were activated when seeing an action. The neurons activated are the same neurons that would be required for the monkeys to perform that action themselves. Mirror neurons fire both when observing an action and when performing it, indicating that these neurons in the motor cortex are necessary for understanding a visual process. The presence of mirror neurons may indicate that non-verbal, gestural communication is far more ancient than previously thought. The motor theory of speech perception relies on the understanding of the motor representations that underlie speech gestures, such as lip movement. There is currently no clear understanding of speech perception, but it is generally accepted that the motor cortex is activated in speech perception in some capacity.

"Musilanguage"

The term "musilanguage" (or "hmmmmm") refers to a pre-linguistic system of vocal communication from which (according to some scholars) both music and language later derived. The idea is that rhythmic, melodic, emotionally expressive vocal ritual helped bond coalitions and, over time, set up selection pressures for enhanced volitional control over the speech articulators. Patterns of synchronized choral chanting are imagined to have varied according to the occasion. For example, "we're setting off to find honey" might sound qualitatively different from "we're setting off to hunt" or "we're grieving over our relative's death". If social standing depended on maintaining a regular beat and harmonizing one's own voice with that of everyone else, group members would have come under pressure to demonstrate their choral skills.

Archaeologist Steven Mithen speculates that the Neanderthals possessed some such system, expressing themselves in a "language" known as "Hmmmmm", standing for Holistic, manipulative, multi-modal, musical and mimetic. In Bruce Richman's earlier version of essentially the same idea, frequent repetition of the same few songs by many voices made it easy for people to remember those sequences as whole units. Activities that a group of people were doing whilst they were vocalizing together – activities that were important or striking or richly emotional – came to be associated with particular sound sequences, so that each time a fragment was heard, it evoked highly specific memories. The idea is that the earliest lexical items (words) started out as abbreviated fragments of what were originally communal songs.

"Whenever people sang or chanted a particular sound sequence they would remember the concrete particulars of the situation most strongly associated with it: ah, yes! we sing this during this particular ritual admitting new members to the group; or, we chant this during a long journey in the forest; or, when a clearing is finished for a new camp, this is what we chant; or these are the keenings we sing during ceremonies over dead members of our group."

— Richman, B. 2000. How music fixed "nonsense" into significant formulas: on rhythm, repetition, and meaning. In N. L. Wallin, B. Merker and S. Brown (eds), The Origins of Music: An introduction to evolutionary musicology. Cambridge, Massachusetts: MIT Press, pp. 301-314.

As group members accumulated an expanding repertoire of songs for different occasions, interpersonal call-and-response patterns evolved along one trajectory to assume linguistic form. Meanwhile, along a divergent trajectory, polyphonic singing and other kinds of music became increasingly specialised and sophisticated.

To explain the establishment of syntactical speech, Richman cites English "I wanna go home". He imagines this to have been learned in the first instance not as a combinatorial sequence of free-standing words, but as a single stuck-together combination – the melodic sound people make to express "feeling homesick". Someone might sing "I wanna go home", prompting other voices to chime in with "I need to go home", "I'd love to go home", "Let's go home" and so forth. Note that one part of the song remains constant, whilst another is permitted to vary. If this theory is accepted, syntactically complex speech began evolving as each chanted mantra allowed for variation at a certain point, allowing for the insertion of an element from some other song. For example, whilst mourning during a funeral rite, someone might want to recall a memory of collecting honey with the deceased, signaling this at an appropriate moment with a fragment of the "we're collecting honey" song. Imagine that such practices became common. Meaning-laden utterances would now have become subject to a distinctively linguistic creative principle – that of recursive embedding.

Hunter-gatherer egalitarianism

Mbendjele hunter-gatherer meat sharing

Many scholars associate the evolutionary emergence of speech with profound social, sexual, political and cultural developments. One view is that primate-style dominance needed to give way to a more cooperative and egalitarian lifestyle of the kind characteristic of modern hunter-gatherers.

Intersubjectivity

According to Michael Tomasello, the key cognitive capacity distinguishing Homo sapiens from our ape cousins is "intersubjectivity". This entails turn-taking and role-reversal: your partner strives to read your mind, you simultaneously strive to read theirs, and each of you makes a conscious effort to assist the other in the process. The outcome is that each partner forms a representation of the other's mind in which their own can be discerned by reflection.

Tomasello argues that this kind of bi-directional cognition is central to the very possibility of linguistic communication. Drawing on his research with both children and chimpanzees, he reports that human infants, from one year old onwards, begin viewing their own mind as if from the standpoint of others. He describes this as a cognitive revolution. Chimpanzees, as they grow up, never undergo such a revolution. The explanation, according to Tomasello, is that their evolved psychology is adapted to a deeply competitive way of life. Wild-living chimpanzees form despotic social hierarchies, with most interactions involving calculations of dominance and submission. An adult chimp will strive to outwit its rivals by guessing at their intentions whilst blocking them from reciprocating. Since bi-directional intersubjective communication is impossible under such conditions, the cognitive capacities necessary for language don't evolve.

Counter-dominance

In the scenario favoured by David Erdal and Andrew Whiten, primate-style dominance provoked equal and opposite coalitionary resistance – counter-dominance. During the course of human evolution, increasingly effective strategies of rebellion against dominant individuals led to a compromise. Whilst abandoning any attempt to dominate others, group members vigorously asserted their personal autonomy, maintaining their alliances to make potentially dominant individuals think twice. Within increasingly stable coalitions, according to this perspective, status began to be earned in novel ways, social rewards accruing to those perceived by their peers as especially cooperative and self-aware.

Reverse dominance

Whilst counter-dominance, according to this evolutionary narrative, culminates in a stalemate, anthropologist Christopher Boehm extends the logic a step further. Counter-dominance tips over at last into full-scale "reverse dominance". The rebellious coalition decisively overthrows the figure of the primate alpha-male. No dominance is allowed except that of the self-organised community as a whole.

As a result of this social and political change, hunter-gatherer egalitarianism is established. As children grow up, they are motivated by those around them to reverse perspective, engaging with other minds on the model of their own. Selection pressures favor such psychological innovations as imaginative empathy, joint attention, moral judgment, project-oriented collaboration and the ability to evaluate one's own behaviour from the standpoint of others. Underpinning enhanced probabilities of cultural transmission and cumulative cultural evolution, these developments culminated in the establishment of hunter-gatherer-style egalitarianism in association with intersubjective communication and cognition. It is in this social and political context that language evolves.

Scenarios involving mother-infant interactions

"Putting the baby down"

According to Dean Falk's "putting the baby down" theory, vocal interactions between early hominin mothers and infants sparked a sequence of events that led, eventually, to our ancestors' earliest words. The basic idea is that evolving human mothers, unlike their monkey and ape counterparts, couldn't move around and forage with their infants clinging onto their backs. Loss of fur in the human case left infants with no means of clinging on. Frequently, therefore, mothers had to put their babies down. As a result, these babies needed reassurance that they were not being abandoned. Mothers responded by developing "motherese" – an infant-directed communicative system embracing facial expressions, body language, touching, patting, caressing, laughter, tickling and emotionally expressive contact calls. The argument is that language somehow developed out of all this.

Criticism

Whilst this theory may explain a certain kind of infant-directed "protolanguage" – known today as "motherese" – it does little to solve the really difficult problem, which is the emergence amongst adults of syntactical speech.

Co-operative breeding

Evolutionary anthropologist Sarah Hrdy observes that only human mothers amongst great apes are willing to let another individual take hold of their own babies; further, we are routinely willing to let others babysit. She identifies lack of trust as the major factor preventing chimpanzee, bonobo or gorilla mothers from doing the same: "If ape mothers insist on carrying their babies everywhere ... it is because the available alternatives are not safe enough". The fundamental problem is that ape mothers (unlike monkey mothers who may often babysit) do not have female relatives nearby. The strong implication is that, in the course of Homo evolution, allocare could develop because Homo mothers did have female kin close by – in the first place, most reliably, their own mothers. Extending the Grandmother hypothesis, Hrdy argues that evolving Homo erectus females necessarily relied on female kin initially; this novel situation in ape evolution of mother, infant and mother's mother as allocarer provided the evolutionary ground for the emergence of intersubjectivity. She relates this onset of "cooperative breeding in an ape" to shifts in life history and slower child development, linked to the change in brain and body size from the 2 million year mark.

Primatologist Klaus Zuberbühler uses these ideas to help explain the emergence of vocal flexibility in the human species. Co-operative breeding would have compelled infants to struggle actively to gain the attention of caregivers, not all of whom would have been directly related. A basic primate repertoire of vocal signals may have been insufficient for this social challenge. Natural selection, according to this view, would have favoured babies with advanced vocal skills, beginning with babbling (which triggers positive responses in care-givers) and paving the way for the elaborate and unique speech abilities of modern humans.

Was "mama" the first word?

These ideas might be linked to those of the renowned structural linguist Roman Jakobson, who claimed that "the sucking activities of the child are accompanied by a slight nasal murmur, the only phonation to be produced when the lips are pressed to the mother's breast ... and the mouth is full". He proposed that later in the infant's development, "this phonatory reaction to nursing is reproduced as an anticipatory signal at the mere sight of food and finally as a manifestation of a desire to eat, or more generally, as an expression of discontent and impatient longing for missing food or absent nurser, and any ungranted wish". So, the action of opening and shutting the mouth, combined with the production of a nasal sound when the lips are closed, yielded the sound sequence "Mama", which may, therefore, count as the very first word. Peter MacNeilage sympathetically discusses this theory in his major book, The Origin of Speech, linking it with Dean Falk's "putting the baby down" theory (see above). Needless to say, other scholars have suggested completely different candidates for Homo sapiens' very first word.

Niche construction theory

A beaver dam in Tierra del Fuego. Beavers adapt to an environmental niche which they shape by their own activities.

Whilst the biological language faculty is genetically inherited, actual languages or dialects are culturally transmitted, as are social norms, technological traditions and so forth. Biologists expect a robust co-evolutionary trajectory linking human genetic evolution with the evolution of culture. Individuals capable of rudimentary forms of protolanguage would have enjoyed enhanced access to cultural understandings, whilst these, conveyed in ways that young brains could readily learn, would, in turn, have become transmitted with increasing efficiency.

In some ways like beavers, as they construct their dams, humans have always engaged in niche construction, creating novel environments to which they subsequently become adapted. Selection pressures associated with prior niches tend to become relaxed as humans depend increasingly on novel environments created continuously by their own productive activities. According to Steven Pinker, language is an adaptation to "the cognitive niche". Variations on the theme of ritual/speech co-evolution – according to which speech evolved for purposes of internal communication within a ritually constructed domain – have attempted to specify more precisely when, why and how this special niche was created by human collaborative activity.

Conceptual frameworks

Structuralism

"Consider a knight in chess. Is the piece by itself an element of the game? Certainly not. For as a material object, separated from its square on the board and the other conditions of play, it is of no significance for the player. It becomes a real, concrete element only when it takes on or becomes identified with its value in the game. Suppose that during a game this piece gets destroyed or lost. Can it be replaced? Of course, it can. Not only by some other knight but even by an object of quite a different shape, which can be counted as a knight, provided it is assigned the same value as the missing piece."

— de Saussure, F. (1983) [1916]. Course in General Linguistics. Translated by R. Harris. London: Duckworth. pp. 108–09.

The Swiss scholar Ferdinand de Saussure founded linguistics as a twentieth-century professional discipline. Saussure regarded a language as a rule-governed system, much like a board game such as chess. In order to understand chess, he insisted, we must ignore such external factors as the weather prevailing during a particular session or the material composition of this or that piece. The game is autonomous with respect to its material embodiments. In the same way, when studying language, it's essential to focus on its internal structure as a social institution. External matters (e.g., the shape of the human tongue) are irrelevant from this standpoint. Saussure regarded 'speaking' (parole) as individual, ancillary and more or less accidental by comparison with "language" (langue), which he viewed as collective, systematic and essential.

Saussure showed little interest in Darwin's theory of evolution by natural selection. Nor did he consider it worthwhile to speculate about how language might originally have evolved. Saussure's assumptions in fact cast doubt on the validity of narrowly conceived origins scenarios. His structuralist paradigm, when accepted in its original form, turns scholarly attention to a wider problem: how our species acquired the capacity to establish social institutions in general.

Behaviourism

"The basic processes and relations which give verbal behavior its special characteristics are now fairly well understood. Much of the experimental work responsible for this advance has been carried out on other species, but the results have proved to be surprisingly free of species restrictions. Recent work has shown that the methods can be extended to human behavior without serious modification."

— Skinner, B.F. (1957). Verbal Behavior. New York: Appleton Century Crofts. p. 3.

In the United States, prior to and immediately following World War II, the dominant psychological paradigm was behaviourism. Within this conceptual framework, language was seen as a certain kind of behaviour – namely, verbal behaviour, to be studied much like any other kind of behaviour in the animal world. Rather as a laboratory rat learns how to find its way through an artificial maze, so a human child learns the verbal behaviour of the society into which it is born. The phonological, grammatical and other complexities of speech are in this sense "external" phenomena, inscribed into an initially unstructured brain. Language's emergence in Homo sapiens, from this perspective, presents no special theoretical challenge. Human behaviour, whether verbal or otherwise, illustrates the malleable nature of the mammalian – and especially the human – brain.

Chomskyan Nativism

The modularity of mind is an idea which was prefigured in some respects by the 19th-century movement of phrenology.
 

Nativism is the theory that humans are born with certain specialised cognitive modules enabling us to acquire highly complex bodies of knowledge such as the grammar of a language.

"There is a long history of study of the origin of language, asking how it arose from calls of apes and so forth. That investigation in my view is a complete waste of time because language is based on an entirely different principle than any animal communication system."

— Chomsky, N. (1988). Language and Problems of Knowledge. Cambridge, Massachusetts: MIT Press. p. 183.

From the mid-1950s onwards, Noam Chomsky, Jerry Fodor and others mounted what they conceptualised as a 'revolution' against behaviourism. Retrospectively, this became labelled 'the cognitive revolution'. Whereas behaviourism had denied the scientific validity of the concept of "mind", Chomsky replied that, in fact, the concept of "body" is more problematic. Behaviourists tended to view the child's brain as a tabula rasa, initially lacking structure or cognitive content. According to B. F. Skinner, for example, richness of behavioural detail (whether verbal or non-verbal) emanated from the environment. Chomsky turned this idea on its head. The linguistic environment encountered by a young child, according to Chomsky's version of psychological nativism, is in fact hopelessly inadequate. No child could possibly acquire the complexities of grammar from such an impoverished source. Far from viewing language as wholly external, Chomsky re-conceptualised it as wholly internal. To explain how a child so rapidly and effortlessly acquires its natal language, he insisted, we must conclude that it comes into the world with the essentials of grammar already pre-installed. No other species, according to Chomsky, is genetically equipped with a language faculty – or indeed with anything remotely like one. The emergence of such a faculty in Homo sapiens, from this standpoint, presents biological science with a major theoretical challenge.

Speech act theory

One way to explain biological complexity is by reference to its inferred function. According to the influential philosopher John Austin, speech's primary function is active in the social world.

Speech acts, according to this body of theory, can be analyzed on three different levels: locutionary, illocutionary and perlocutionary. An act is locutionary when viewed as the production of certain linguistic sounds – for example, practicing correct pronunciation in a foreign language. An act is illocutionary insofar as it constitutes an intervention in the world as jointly perceived or understood. Promising, marrying, divorcing, declaring, stating, authorizing, announcing and so forth are all speech acts in this illocutionary sense. An act is perlocutionary when viewed in terms of its direct psychological effect on an audience. Frightening a baby by saying "Boo!" would be an example of a perlocutionary act.

For Austin, "doing things" with words means, first and foremost, deploying illocutionary force. The secret of this is community participation or collusion. There must be a 'correct' (conventionally agreed) procedure, and all those concerned must accept that it has been properly followed.

"One of our examples was, for instance, the utterance 'I do' (take this woman to be my lawful wedded wife), as uttered in the course of a marriage ceremony. Here we should say that in saying these words we are doing something — namely, marrying, rather than reporting something, namely that we are marrying."

— Austin, J.L. (1962). How To Do Things With Words. Oxford: Oxford University Press. pp. 12–13.

In the case of a priest declaring a couple to be man and wife, his words will have illocutionary force only if he is properly authorised and only if the ceremony is properly conducted, using words deemed appropriate to the occasion. Austin points out that should anyone attempt to baptise a penguin, the act would be null and void. For reasons which have nothing to do with physics, chemistry or biology, baptism cannot appropriately be applied to penguins, irrespective of the verbal formulation used.

This body of theory may have implications for speculative scenarios concerning the origins of speech. "Doing things with words" presupposes shared understandings and agreements pertaining not just to language but to social conduct more generally. Apes might produce sequences of structured sound, influencing one another in that way. To deploy illocutionary force, however, they would need to have entered a non-physical and non-biological realm – one of shared contractual and other intangibles. This novel cognitive domain consists of what philosophers term "institutional facts" – objective facts whose existence, paradoxically, depends on communal faith or belief. Few primatologists, evolutionary psychologists or anthropologists consider that nonhuman primates are capable of the necessary levels of joint attention, sustained commitment or collaboration in pursuit of future goals.

Biosemiotics

The structure of part of a DNA double helix
 

"the deciphering of the genetic code has revealed our possession of a language much older than hieroglyphics, a language as old as life itself, a language that is the most living language of all — even if its letters are invisible and its words are buried in the cells of our bodies."

— Beadle, G.; Beadle, M. (1966). The Language of Life. An introduction to the science of genetics. New York: Doubleday and Co.

Biosemiotics is a relatively new discipline, inspired in large part by the discovery of the genetic code in the early 1960s. Its basic assumption is that Homo sapiens is not alone in its reliance on codes and signs. Language and symbolic culture must have biological roots, hence semiotic principles must apply also in the animal world.

The discovery of the molecular structure of DNA apparently contradicted the idea that life could be explained, ultimately, in terms of the fundamental laws of physics. The letters of the genetic alphabet seemed to have "meaning", yet meaning is not a concept that has any place in physics. The natural science community initially solved this difficulty by invoking the concept of "information", treating information as independent of meaning. But a different solution to the puzzle was to recall that the laws of physics in themselves are never sufficient to explain natural phenomena. To explain, say, the unique physical and chemical characteristics of the planets in our solar system, scientists must work out how the laws of physics became constrained by particular sequences of events following the formation of the Sun.

According to Howard Pattee, the same principle applies to the evolution of life on earth, a process in which certain "frozen accidents" or "natural constraints" have from time to time drastically reduced the number of possible evolutionary outcomes. Codes, when they prove to be stable over evolutionary time, are constraints of this kind. The most fundamental such "frozen accident" was the emergence of DNA as a self-replicating molecule, but the history of life on earth has been characterised by a succession of comparably dramatic events, each of which can be conceptualised as the emergence of a new code. From this perspective, the evolutionary emergence of spoken language was one more event of essentially the same kind.

The handicap principle

A peacock's tail: a classic example of costly signalling

In 1975, the Israeli theoretical biologist Amotz Zahavi proposed a novel theory which, although controversial, has come to dominate Darwinian thinking on how signals evolve. Zahavi's "handicap principle" states that to be effective, signals must be reliable; to be reliable, the bodily investment in them must be so high as to make cheating unprofitable.

Paradoxically, if this logic is accepted, signals in nature evolve not to be efficient but, on the contrary, to be elaborate and wasteful of time and energy. A peacock's tail is the classic illustration. Zahavi's theory is that since peahens are on the look-out for male braggarts and cheats, they insist on a display of quality so costly that only a genuinely fit peacock could afford to pay. Needless to say, not all signals in the animal world are quite as elaborate as a peacock's tail. But if Zahavi is correct, all require some bodily investment – an expenditure of time and energy which "handicaps" the signaller in some way.

Animal vocalizations (according to Zahavi) are reliable because they are faithful reflections of the state of the signaller's body. To switch from an honest to a deceitful call, the animal would have to adopt a different bodily posture. Since every bodily action has its own optimal starting position, changing that position to produce a false message would interfere with the task of carrying out the action really intended. The gains made by cheating would not make up for the losses incurred by assuming an improper posture – and so the phony message turns out to be not worth its price. This may explain, in particular, why ape and monkey vocal signals have evolved to be so strikingly inflexible when compared with the varied speech sounds produced by the human tongue. The apparent inflexibility of chimpanzee vocalizations may strike the human observer as surprising until we realize that being inflexible is necessarily bound up with being perceptibly honest in the sense of "hard-to-fake".

If we accept this theory, the emergence of speech becomes theoretically impossible. Communication of this kind just cannot evolve. The problem is that words are cheap. Nothing about their acoustic features can reassure listeners that they are genuine and not fakes. Any strategy of reliance on someone else's tongue – perhaps the most flexible organ in the body – presupposes unprecedented levels of honesty and trust. To date, Darwinian thinkers have found it difficult to explain the requisite levels of community-wide cooperation and trust.

An influential standard textbook is Animal Signals, by John Maynard Smith and David Harper. These authors divide the costs of communication into two components: (1) the investment necessary to ensure transmission of a discernible signal; and (2) the investment necessary to guarantee that each signal is reliable and not a fake. The authors point out that although costs in the second category may be relatively low, they are not zero. Even in relatively relaxed, cooperative social contexts – for example, when communication is occurring between genetic kin – some investment must be made to guarantee reliability. In short, the notion of super-efficient communication – eliminating all costs except those necessary for successful transmission – is biologically unrealistic. Yet speech comes precisely into this category.

Johnstone's 1997 representation of the handicap principle

The graph shows how signal intensity depends on costs and benefits. If two individuals face different costs but have the same benefits, or have different benefits but the same cost, they will signal at different levels. The higher signal indicates higher quality. The high-quality individual maximises benefits relative to costs at a high signal intensity, whilst the low-quality individual maximises benefits relative to costs at a low signal intensity. The high-quality individual is shown to take more risks (greater cost), which can be understood in terms of honest signals, which are expensive. The stronger an individual is, the more easily it can bear the cost of the signal, making it a more appealing mating partner. Low-quality individuals are less likely to be able to afford a specific signal, and will consequently be less likely to attract a female.
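
The cost-benefit reasoning behind this graph can be illustrated with a small numerical sketch. The functional forms here are assumptions chosen purely for illustration and are not taken from Johnstone: every signaller receives the same linear benefit per unit of signal, whilst the cost of signalling rises with the square of intensity and falls with the signaller's quality. The names net_benefit and optimal_signal are hypothetical.

```python
def net_benefit(signal, quality, benefit_per_unit=1.0):
    """Benefit minus cost of signalling at a given intensity.

    Assumed forms (illustrative only): benefit grows linearly with
    signal intensity and is identical for all signallers; cost grows
    with the square of intensity and is lower for higher quality.
    """
    benefit = benefit_per_unit * signal
    cost = signal ** 2 / (2.0 * quality)
    return benefit - cost


def optimal_signal(quality, benefit_per_unit=1.0, max_signal=10.0, steps=1000):
    """Grid search for the intensity that maximises net benefit."""
    candidates = [max_signal * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda s: net_benefit(s, quality, benefit_per_unit))


# Same benefits, different costs: two signallers of differing quality.
low = optimal_signal(quality=1.0)
high = optimal_signal(quality=3.0)
print("low-quality optimum:", low)
print("high-quality optimum:", high)
```

Under these assumptions the high-quality signaller's best intensity lies above the low-quality signaller's, so intensity tracks quality: matching the higher level would cost the low-quality individual more than it gains.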

Cognitive linguistics

Cognitive linguistics views linguistic structure as arising continuously out of usage. Speakers are forever discovering new ways to convey meanings by producing sounds, and in some cases, these novel strategies become conventionalised. Between the phonological structure and semantic structure, there is no causal relationship. Instead, each novel pairing of sound and meaning involves an imaginative leap.

In their book, Metaphors We Live By, George Lakoff and Mark Johnson helped pioneer this approach, claiming that metaphor is what makes human thought special. All language, they argued, is permeated with metaphor, whose use in fact constitutes distinctively human – that is, distinctively abstract – thought. To conceptualise things which cannot be directly perceived – intangibles such as time, life, reason, mind, society or justice – we have no choice but to set out from more concrete and directly perceptible phenomena such as motion, location, distance, size and so forth. In all cultures across the world, according to Lakoff and Johnson, people resort to such familiar metaphors as ideas are locations, thinking is moving and mind is body. For example, we might express the idea of "arriving at a crucial point in our argument" by proceeding as if literally traveling from one physical location to the next.

Metaphors, by definition, are not literally true. Strictly speaking, they are fictions – from a pedantic standpoint, even falsehoods. But if we couldn't resort to metaphorical fictions, it's doubtful whether we could even form conceptual representations of such nebulous phenomena as "ideas", "thoughts", "minds", and so forth.

The bearing of these ideas on current thinking on speech origins remains unclear. One suggestion is that ape communication tends to resist metaphor for social reasons. Since they inhabit a Darwinian (as opposed to morally regulated) social world, these animals are under strong competitive pressure not to accept patent fictions as valid communicative currency. Ape vocal communication tends to be inflexible, marginalizing the ultra-flexible tongue, precisely because listeners treat with suspicion any signal which might prove to be a fake. Such insistence on perceptible veracity is clearly incompatible with metaphoric usage. An implication is that neither articulate speech nor distinctively human abstract thought could have begun evolving until our ancestors had become more cooperative and trusting of one another's communicative intentions.

Natural science vs social science interpretations

Social reality

When people converse with one another, according to the American philosopher John Searle, they're making moves, not in the real world which other species inhabit, but in a shared virtual realm peculiar to ourselves. Unlike the deployment of muscular effort to move a physical object, the deployment of illocutionary force requires no physical effort (except the movement of the tongue/mouth to produce speech) and produces no effect which any measuring device could detect. Instead, our action takes place on a quite different level – that of social reality. This kind of reality is in one sense hallucinatory, being a product of collective intentionality. It consists, not of "brute facts" – facts which exist anyway, irrespective of anyone's belief – but of "institutional facts", which "exist" only if you believe in them. Government, marriage, citizenship and money are examples of "institutional facts". One can distinguish between "brute" facts and "institutional" ones by applying a simple test. Suppose no one believed in the fact – would it still be true? If the answer is "yes", it's "brute". If the answer is "no", it's "institutional".

"Imagine a group of primitive creatures, more or less like ourselves ... Now imagine that acting as a group, they build a barrier, a wall around the place where they live ... The wall is designed to keep intruders out and keep members of the group in ... Let us suppose that the wall gradually decays. It slowly deteriorates until all that is left is a line of stones. But let us suppose that the inhabitants continue to treat the line of stones as if it could perform the function of the wall. Let us suppose that, as a matter of fact, they treat the line of stones just as if they understood that it was not to be crossed ... This shift is the decisive move in the creation of institutional reality. It is nothing less than the decisive move in the creation of what we think of as distinctive in human, as opposed to animal, societies."

— John R. Searle (1995). The construction of social reality. Free Press. p. 134.

The facts of language in general and of speech, in particular, are, from this perspective, "institutional" rather than "brute". The semantic meaning of a word, for example, is whatever its users imagine it to be. To "do things with words" is to operate in a virtual world which seems real because we share it in common. In this incorporeal world, the laws of physics, chemistry, and biology do not apply. That explains why illocutionary force can be deployed without exerting muscular effort. Apes and monkeys inhabit the "brute" world. To make an impact, they must scream, bark, threaten, seduce or in other ways invest bodily effort. If they were invited to play chess, they would be unable to resist throwing their pieces at one another. Speech is not like that. A few movements of the tongue, under appropriate conditions, can be sufficient to open parliament, annul a marriage, confer a knighthood or declare war. To explain, on a Darwinian basis, how such apparent magic first began to work, we must ask how, when and why Homo sapiens succeeded in establishing the wider domain of institutional facts.

Nature or society?

"Brute facts", in the terminology of speech act philosopher John Searle, are facts which are true anyway, regardless of human belief. For example, a person might not believe in gravity; however, if the person jumped off a cliff, they would still fall. Natural science is the study of facts of this kind. "Institutional facts" are fictions accorded factual status within human social institutions. Monetary and commercial facts are fictions of this kind. The complexities of today's global currency system are facts only whilst society believes in them: suspend the belief and the facts correspondingly dissolve. Yet although institutional facts rest on human belief, that doesn't make them mere distortions or hallucinations. Take a person's confidence that two five-pound banknotes are worth ten pounds. That is not merely a subjective belief: it's an objective, indisputable fact. But now imagine a collapse of public confidence in the currency system. Suddenly, the realities in a person's pocket dissolve.

Scholars who doubt the scientific validity of the notion of "institutional facts" include Noam Chomsky, for whom language is not social. In Chomsky's view, language is a natural object (a component of the individual brain) and its study, therefore, a branch of natural science. In explaining the origin of language, scholars in this intellectual camp invoke non-social developments – in Chomsky's case, a random genetic mutation. Chomsky argues that language might exist inside the brain of a single mutant gorilla even if no one else believed in it, even if no one else existed apart from the mutant – and even if the gorilla in question remained unaware of its existence, never actually speaking. In the opposite philosophical camp are those who, in the tradition of Ferdinand de Saussure, argue that if no one believed in words or rules, they simply would not exist. These scholars, correspondingly, regard language as essentially institutional, concluding that linguistics should be considered a topic within social science. In explaining the evolutionary emergence of language, scholars in this intellectual camp tend to invoke profound changes in social relationships.

Criticism

Darwinian scientists today see little value in the traditional distinction between "natural" and "social" science. Darwinism in its modern form is the study of cooperation and competition in nature – a topic which is intrinsically social. Against this background, there is an increasing awareness amongst evolutionary linguists and Darwinian anthropologists that traditional inter-disciplinary barriers can have damaging consequences for investigations into the origins of speech.

Generative anthropology

Generative anthropology is a field of study based on the hypothesis that the origin of human language happened in a singular event. The discipline of Generative Anthropology centers upon this original event, which Eric Gans calls The Originary Scene. This scene is a kind of origin story that hypothesizes the specific event where language originated. The Originary Scene is powerful because any human ability (our ability to do science, to be ironic, to love, to think, to dominate, etc.) can be carefully explained first by reference to this scene of origin.

Because The Originary Scene was the origin of all things human, Generative Anthropology attempts to understand all cultural phenomena in the simplest terms possible: all things human can be traced back to this hypothetical single origin point.

Eric Gans and the origin of generative anthropology

Generative Anthropology originated with Professor Eric Gans of UCLA who developed his ideas in a series of books and articles beginning with The Origin of Language: A Formal Theory of Representation (1981), which builds on the ideas of René Girard, notably that of mimetic desire. However, in establishing the theory of Generative Anthropology, Gans departs from and goes beyond Girard's work in many ways. Generative Anthropology is therefore an independent and original way of understanding the human species, its origin, culture, history, and development.

Anthropoetics

Gans founded (and edits) the web-based journal Anthropoetics: The Journal of Generative Anthropology as a scholarly forum for research into human culture and origins based on his theories of Generative Anthropology and the closely related theories of fundamental anthropology developed by René Girard. In his online Chronicles of Love and Resentment Gans applies the principles of Generative Anthropology to a wide variety of fields including popular culture, film, post-modernism, economics, contemporary politics, the Holocaust, philosophy, religion, and paleo-anthropology.

The originary hypothesis of human language

The central hypothesis of generative anthropology is that the origin of language was a singular event. Human language is radically different from animal communication systems. It possesses syntax, allowing for unlimited new combinations and content; it is symbolic, and it possesses a capacity for history. Thus it is hypothesized that the origin of language must have been a singular event, and the principle of parsimony requires that it originated only once.

Language makes possible new forms of social organization radically different from animal "pecking order" hierarchies dominated by an alpha male. Thus, the development of language allowed for a new stage in human evolution – the beginning of culture, including religion, art, desire, and the sacred. As language provides memory and history via a record of its own history, language itself can be defined via a hypothesis of its origin based on our knowledge of human culture. As with any scientific hypothesis, its value is in its ability to account for the known facts of human history and culture.

Mimetic behaviour

Mimetic (imitatory) behaviour connects proto-hominid species with humans. Imitation is an adaptive learning behavior, a form of intelligence favored by natural selection. Imitation, however, as René Girard observes, leads to conflict when two individuals imitate each other in their attempt to appropriate a desired object. The problem is to explain the transition from one form of mimesis, imitation, to another, representation. Although many anthropologists have hypothesized that language evolved to help humans describe their world, this ignores the fact that intra-species violence, not the environment, poses the greatest threat to human existence. Human representation, according to Gans, is not merely a "natural" evolutionary development of animal communication systems, but is a radical departure from them. The signifier implies a symbolic dimension that is not reducible to empirical referents.

The originary event

At the event of the origin of language, there was a proto-human hominid species which had gradually become more mimetic, presumably in response to environmental pressures including climate changes and competition for limited resources. Higher primates have dominance hierarchies which serve to limit and prevent destructive conflict within the social group. However, as individuals within the proto-human group became more mimetic, the dominance system broke down and became inadequate to control the threat of violence posed by conflictual mimesis.

Gans asks us to imagine an "originary event" along the following lines: A group of hominids have surrounded a food object, e.g. the body of a large mammal following a hunt. The attraction of the object, however, exceeds the limits of simple appetite due to the operation of group mimesis, essentially an expression of competition or rivalry. The object becomes more attractive simply because each member of the group finds it attractive: each individual in the group observes the attention that his rivals give the object. Actual appetite is artificially inflated through this mutual reinforcement. The power of appetitive mimesis in conjunction with the threat of violence is such that the central object begins to assume a sacred aura – infinitely desirable and infinitely dangerous.

Mimesis thus gives rise to a pragmatic paradox: the double imperative to take the desired object for personal gain, and to refrain from taking it to avoid conflict. In other words, imitating the rival means not imitating the rival, because imitation leads to conflict, the attempt to destroy rather than imitate (Gans, Signs of Paradox 18). Generative Anthropology theorizes that when this mimetic instinct becomes so powerful that it seems to possess a sacred force endangering the survival of the group, the resultant intra-species pressure favours the emergence of the sign.

No member of the group is able to take the sacred object, and at least one member of the group intends this aborted gesture as a sign designating the central object. This meaning is successfully communicated to the group, who follow suit by reading their aborted gestures as signs also. The sign focuses attention on the sacred power of the central object, which is conceived as the source of its own power. The object which compels attention yet prohibits consumption can only be represented. The basic advantage of the sign over the object is that "The sign is an economical substitute for its inaccessible referent. Things are scarce and consequently objects of potential contention; signs are abundant because they can be reproduced at will" (Gans, Originary Thinking 9). The desire for the object is mediated by the sign, which paradoxically both creates desire, by attributing significance to the object, and defers it, by designating the object as sacred or taboo. The mimetic impulse is sublimated, expressed in a different form, as the act of representation. Individual self-consciousness is also born at this moment, in the recognition of alienation from the sacred center. The primary value/function of the sign in this scenario is ethical, as the deferral of violence, but the sign is also referential. What the sign refers to, strictly speaking, is not the physical object, but rather the mediated object of desire as realized in the imagination of each individual.

The emergence of the sign is only a temporary deferral of violence. It is immediately followed by the sparagmos, the discharge of the mimetic tension created by the sign in the violent dismemberment and consumption of the worldly incarnation of the sign, the central appetitive object. The violence of the sparagmos is mediated by the sign and thus directed towards the central object rather than the other members of the group. By including the sparagmos in the originary hypothesis, Gans intends to incorporate Girard's insights into scapegoating and the sacrificial (see Signs of Paradox 131–151).

The "scene of representation" is fundamentally social or interpersonal. The act of representation always implies the presence of another or others. The use of a sign evokes the communal scene of representation, structured by a sacred center and a human periphery. The significance of the sign seems to emerge from the sacred center (in its resistance to appropriation), but the pragmatic significance of the sign is realized in the peace brokered amongst the humans on the periphery.

All signs point to the sacred, that which is significant to the community. The sacred cannot be signified directly, since it is essentially an imaginary or ideal construction of mimetic desire. The significance is realized in the human relationships as mediated by the sign. When an individual refers to an object or idea, the reference is fundamentally to the significance of that object or idea for the human community. Language attempts to reproduce the non-violent presence of the community to itself, even though it may attempt to do so sacrificially, by designating a scapegoat victim.

Generative Anthropology is so called because human culture is understood as a "genetic" development of the originary event. The scene of representation is a true cultural universal, but it must be analyzed in terms of its dialectical development. The conditions for the generation of significance are subject to historical evolution, so that the formal articulation of the sign always includes a dialogical relationship to past forms.

Generative Anthropology Society and Conference

The Generative Anthropology Society & Conference (GASC) is a scholarly association formed for the purpose of facilitating intellectual exchange amongst those interested in fundamental reflection on the human, originary thinking, and Generative Anthropology, including support for regular conferences. GASC was formally organized on June 24, 2010 at Westminster College, Salt Lake City during the 4th Annual Generative Anthropology Summer Conference.

Since 2007, the Generative Anthropology Society & Conference (GASC) has held an annual summer conference on Generative Anthropology.

2007 - Kwantlen University College of University of British Columbia (Vancouver, British Columbia)

2008 - Chapman University (Orange, California)

2009 - University of Ottawa (Ottawa, Ontario)

2010 - Westminster College (Utah) (Salt Lake City) and Brigham Young University (Provo, Utah)

2011 - High Point University (High Point, North Carolina)

2012 - International Christian University (Tokyo, Japan)

2013 - University of California, Los Angeles

2014 - University of Victoria (Greater Victoria, British Columbia), Canada

2015 - High Point University (High Point, North Carolina)

2016 - Kinjo Gakuin University (Nagoya, Japan)

Origins of society

From Wikipedia, the free encyclopedia

The origins of society, meaning the evolutionary emergence of distinctively human social organization, are an important topic within evolutionary biology, anthropology, prehistory and palaeolithic archaeology. While little is known for certain, debates since Hobbes and Rousseau have returned again and again to the philosophical, moral and evolutionary questions posed.

Social origins in nature

Origin of social groups

Thomas Hobbes

Frontispiece of "Leviathan," by Abraham Bosse, with input from Hobbes

Arguably the most influential theory of human social origins is that of Thomas Hobbes, who in his Leviathan argued that without strong government, society would collapse into Bellum omnium contra omnes — "the war of all against all":

In such condition, there is no place for industry; because the fruit thereof is uncertain: and consequently no culture of the earth; no navigation, nor use of the commodities that may be imported by sea; no commodious building; no instruments of moving, and removing, such things as require much force; no knowledge of the face of the earth; no account of time; no arts; no letters; no society; and which is worst of all, continual fear, and danger of violent death; and the life of man, solitary, poor, nasty, brutish, and short.

— "Chapter XIII: Of the Natural Condition of Mankind As Concerning Their Felicity, and Misery.", Leviathan

Hobbes' innovation was to attribute the establishment of society to a founding 'social contract', in which the Crown's subjects surrender some part of their freedom in return for security.

If Hobbes' idea is accepted, it follows that society could not have emerged prior to the state. This school of thought has remained influential to this day. Prominent in this respect is British archaeologist Colin Renfrew (Baron Renfrew of Kaimsthorn), who points out that the state did not emerge until long after the evolution of Homo sapiens. The earliest representatives of our species, according to Renfrew, may well have been anatomically modern, but they were not yet cognitively or behaviourally modern. For example, they lacked political leadership, large-scale cooperation, food production, organised religion, law or symbolic artefacts. Humans were simply hunter-gatherers, who — much like extant apes — ate whatever food they could find in the vicinity. Renfrew controversially suggests that hunter-gatherers to this day think and socialise along lines not radically different from those of their nonhuman primate counterparts. In particular, he says that they do not "ascribe symbolic meaning to material objects" and for that reason "lack fully developed 'mind.'"

However, hunter-gatherer ethnographers emphasise that extant foraging peoples certainly do have social institutions — notably institutionalised rights and duties codified in formal systems of kinship. Elaborate rituals such as initiation ceremonies serve to cement contracts and commitments, quite independently of the state. Other scholars would add that insofar as we can speak of "human revolutions" — "major transitions" in human evolution — the first was not the Neolithic Revolution but the rise of symbolic culture that occurred toward the end of the Middle Stone Age.

Arguing the exact opposite of Hobbes's position, anarchist anthropologist Pierre Clastres views the state and society as mutually incompatible: genuine society is always struggling to survive against the state.

Jean-Jacques Rousseau

Rousseau in 1753

Like Hobbes, Jean-Jacques Rousseau argued that society was born in a social contract. In Rousseau's case, however, sovereignty is vested in the entire populace, who enter into the contract directly with one another. "The problem", he explained, "is to find a form of association which will defend and protect with the whole common force the person and goods of each associate, and in which each, while uniting himself with all, may still obey himself alone, and remain as free as before." This is the fundamental problem of which the Social Contract provides the solution. The contract's clauses, Rousseau continued, may be reduced to one — "the total alienation of each associate, together with all his rights, to the whole community. Each man, in giving himself to all, gives himself to nobody; and as there is no associate over whom he does not acquire the same right as he yields others over himself, he gains an equivalent for everything he loses, and an increase of force for the preservation of what he has". In other words: "Each of us puts his person and all his power in common under the supreme direction of the general will, and, in our corporate capacity, we receive each member as an indivisible part of the whole." At once, in place of the individual personality of each contracting party, this act of association creates a moral and collective body, composed of as many members as the assembly contains votes, and receiving from this act its unity, its common identity, its life and its will. By this means, each member of the community acquires not only the capacities of the whole but also, for the first time, rational mentality:

The passage from the state of nature to the civil state produces a very remarkable change in man, by substituting justice for instinct in his conduct, and giving his actions the morality they had formerly lacked. Then only, when the voice of duty takes the place of physical impulses and right of appetite, does man, who so far had considered only himself, find that he is forced to act on different principles, and to consult his reason before listening to his inclinations.

— Jean-Jacques Rousseau, The Social Contract and Discourses. Trans. G. D. H. Cole. New edition. London & Melbourne: Dent. Book I Ch. 8.

Sir Henry Sumner Maine

In his influential book, Ancient Law (1861), Maine argued that in early times, the basic unit of human social organisation was the patriarchal family:

The effect of the evidence derived from comparative jurisprudence is to establish the view of the primeval condition of the human race which is known as the Patriarchal Theory.

— Maine, H. S. 1861. Ancient Law. London: John Murray. p. 122.

Maine was hostile to French revolutionary and other radical social ideas, and his motives were partly political. He sought to undermine the legacy of Rousseau and other advocates of man's natural rights by asserting that originally, no one had any rights at all: 'every man, living during the greater part of his life under the patriarchal despotism, was practically controlled in all his actions by a regimen not of law but of caprice'. Not only were the patriarch's children subject to what Maine calls his 'despotism': his wife and his slaves were equally affected. The very notion of kinship, according to Maine, was simply a way of categorizing those who were forcibly subjected to the despot's arbitrary rule. Maine later added a Darwinian strand to this argument. In The Descent of Man, Darwin had cited reports that a wild-living male gorilla would monopolise for itself as large a harem of females as it could violently defend. Maine endorsed Darwin's speculation that 'primeval man' probably 'lived in small communities, each with as many wives as he could support and obtain, whom he would have jealously guarded against all other men'. Under pressure to spell out exactly what he meant by the term 'patriarchy', Maine clarified that 'sexual jealousy, indulged through power, might serve as a definition of the Patriarchal Family'.

Lewis Henry Morgan

In his influential book, Ancient Society (1877), its title echoing Maine's Ancient Law, Lewis Henry Morgan proposed a very different theory. Morgan insisted that throughout the earlier periods of human history, neither the state nor the family existed.

It may be here premised that all forms of government are reducible to two general plans, using the word plan in its scientific sense. In their bases the two are fundamentally distinct. The first, in the order of time, is founded upon persons, and upon relations purely personal, and may be distinguished as a society (societas). The gens is the unit of this organization; giving as the successive stages of integration, in the archaic period, the gens, the phratry, the tribe, and the confederacy of tribes, which constituted a people or nation (populus). At a later period a coalescence of tribes in the same area into a nation took the place of a confederacy of tribes occupying independent areas. Such, through prolonged ages, after the gens appeared, was the substantially universal organization of ancient society; and it remained among the Greeks and Romans after civilization supervened. The second is founded upon territory and upon property, and may be distinguished as a state (civitas).

— Morgan, L. H. 1877. Ancient Society. Chicago: Charles H. Kerr, p. 6.

In place of both family and state, according to Morgan, was the gens (nowadays termed the 'clan'), based initially on matrilocal residence and matrilineal descent. This aspect of Morgan's theory, later endorsed by Karl Marx and Friedrich Engels, is nowadays widely considered discredited (but for a critical survey of the current consensus, see Knight 2008, 'Early Human Kinship Was Matrilineal').

Friedrich Engels

Friedrich Engels built on Morgan's ideas in his 1884 essay, The Origin of the Family, Private Property and the State in the light of the researches of Lewis Henry Morgan. His primary interest was the position of women in early society, and — in particular — Morgan's insistence that the matrilineal clan preceded the family as society's fundamental unit. 'The mother-right gens', wrote Engels in his survey of contemporary historical materialist scholarship, 'has become the pivot around which the entire science turns...' Engels argued that the matrilineal clan represented a principle of self-organization so vibrant and effective that it allowed no room for patriarchal dominance or the territorial state.

The first class antagonism which appears in human history coincides with the development of the antagonism between man and woman in monogamian marriage, and the first class oppression with that of the female sex by the male.

— Engels, F. 1940 [1884] The origin of the family, private property and the state. London: Lawrence and Wishart.

Emile Durkheim

Emile Durkheim considered that in order to exist, any human social system must counteract the natural tendency for the sexes to promiscuously conjoin. He argued that social order presupposes sexual morality, which is expressed in prohibitions against sex with certain people or during certain periods — in traditional societies particularly during menstruation.

One first fact is certain: that is, that the entire system of prohibitions must strictly conform to the ideas that primitive man had about menstruation and about menstrual blood. For all these taboos start only with the onset of puberty: and it is only when the first signs of blood appear that they reach their maximum rigour.

— Durkheim, E. 1963 [1898]. La prohibition de l'inceste et ses origines. L'Année Sociologique 1: 1–70. Reprinted as Incest. The nature and origin of the taboo, trans. E. Sagarin. New York: Stuart, p. 81.

The incest taboo, wrote Durkheim in 1898, is no more than a particular example of something more basic and universal: the ritualistic setting apart of 'the sacred' from 'the profane'. This begins as the segregation of the sexes, each of which, at least on important occasions, is 'sacred' or 'set apart' from the other. 'The two sexes', as Durkheim explains, 'must avoid each other with the same care as the profane flees from the sacred and the sacred from the profane.' Women as sisters act out the role of 'sacred' beings invested 'with an isolating power of some sort, a power which holds the masculine population at a distance.' Their menstrual blood in particular sets them in a category apart, exercising a 'type of repulsing action which keeps the other sex far from them'. In this way, the earliest ritual structure emerges, establishing morally regulated 'society' for the first time.

Sigmund Freud

Charles Darwin pictured early human society as resembling that of apes, with one or more dominant males jealously guarding a harem of females. In his myth of the 'Primal Horde', Sigmund Freud later took all this as his starting point but then postulated an insurrection mounted by the tyrant's own sons:

All that we find there is a violent and jealous father who keeps all the females for himself and drives away his sons as they grow up…. One day the brothers who had been driven out came together, killed and devoured their father and so made an end of the patriarchal horde.

— Freud, S. 1965 [1913]. Totem and Taboo. London: Routledge, p. 141.

Following this, the band of brothers were about to take sexual possession of their mothers and sisters when suddenly they were overcome with remorse. In their contradictory emotional state, their dead father now became stronger than the living one had been. In memory of him, the brothers revoked their deed by forbidding the killing and eating of the 'totem' (as their father had now become) and renouncing their claim to the women who had just been set free. In this way, the two fundamental taboos of primitive society – not to eat the totem and not to marry one's sisters – were established for the first time.

Marshall Sahlins

A related but less dramatic version of Freud's 'sexual revolution' idea was proposed in 1960 by American social anthropologist Marshall Sahlins. Somehow, he writes, the world of primate brute competition and sexual dominance was turned upside-down:

The decisive battle between early culture and human nature must have been waged on the field of primate sexuality…. Among subhuman primates sex had organized society; the customs of hunters and gatherers testify eloquently that now society was to organize sex…. In selective adaptation to the perils of the Stone Age, human society overcame or subordinated such primate propensities as selfishness, indiscriminate sexuality, dominance and brute competition. It substituted kinship and co-operation for conflict, placed solidarity over sex, morality over might. In its earliest days it accomplished the greatest reform in history, the overthrow of human primate nature, and thereby secured the evolutionary future of the species.

— Sahlins, M. D. 1960 The origin of society. Scientific American 203(3): 76–87.

Christopher Boehm

Once a prehistoric hunting band institutionalized a successful and decisive rebellion, and did away with the alpha-male role permanently... it is easy to see how this institution would have spread.

— Boehm, C. 2000. Journal of Consciousness Studies 7, 1–2 pp. 79–101; p. 97.

If we accept Rousseau's line of reasoning, no single dominant individual is needed to embody society, to guarantee security, or to enforce social contracts. The people themselves can do these things, combining to enforce the general will. A modern origins theory along these lines is that of evolutionary anthropologist Christopher Boehm. Boehm argues that ape social organisation tends to be despotic, typically with one or more dominant males monopolising access to the locally available females. But wherever there is dominance, we can also expect resistance. In the human case, resistance to being personally dominated intensified as humans used their social intelligence to form coalitions. Eventually, a point was reached when the costs of attempting to impose dominance became so high that the strategy was no longer evolutionarily stable, whereupon social life tipped over into 'reverse dominance' — defined as a situation in which only the entire community, on guard against primate-style individual dominance, is permitted to use force to suppress deviant behaviour.

Ernest Gellner

Human beings, writes social anthropologist Ernest Gellner, are not genetically programmed to be members of this or that social order. You can take a human infant and place it into any kind of social order and it will function acceptably. What makes human society so distinctive is the fabulous range of quite different forms it takes across the world. Yet in any given society, the range of permitted behaviours is quite narrowly constrained. This is not owing to the existence of any externally imposed system of rewards and punishments. The constraints come from within — from certain compulsive moral concepts which members of the social order have internalised. The society installs these concepts in each individual's psyche in the manner first identified by Emile Durkheim, namely, by means of collective rituals such as initiation rites. Therefore, the problem of the origins of society boils down to the problem of the origins of collective ritual.

How is a society established, and a series of societies diversified, whilst each of them is restrained from chaotically exploiting that wide diversity of possible human behaviour? A theory is available concerning how this may be done and it is one of the basic theories of social anthropology. The way in which you restrain people from doing a wide variety of things, not compatible with the social order of which they are members, is that you subject them to ritual. The process is simple: you make them dance around a totem pole until they are wild with excitement, and become jellies in the hysteria of collective frenzy; you enhance their emotional state by any device, by all the locally available audio-visual aids, drugs, music and so on; and once they are really high, you stamp upon their minds the type of concept or notion to which they subsequently become enslaved.

— Gellner, E. 1988. Origins of Society. In A. C. Fabian (ed.), Origins. The Darwin College Lectures. Cambridge: Cambridge University Press, pp.128–140; p. 130.

Gender and origins

Feminist scholars — among them palaeoanthropologists Leslie Aiello and Camilla Power — take similar arguments a step further, arguing that any reform or revolution which overthrew male dominance must have been led by women. Evolving human females, Power and Aiello suggest, actively separated themselves from males on a periodic basis, using their own blood (and/or pigments such as red ochre) to mark themselves as fertile and defiant:

The sexual division of labor entails differentiation of roles in food procurement, with logistic hunting of large game by males, co-operation and exchange of products. Our hypothesis is that symbolism arose in this context. To minimize energetic costs of travel, coalitions of women began to invest in home bases. To secure this strategy, women would have to use their attractive, collective signal of impending fertility in a wholly new way: by signalling refusal of sexual access except to males who returned "home" with provisions. Menstruation — real or artificial — while biologically the wrong time for fertile sex, is psychologically the right moment for focusing men's minds on imminent hunting, since it offers the prospect of fertile sex in the near future.

— Power, C. and L. C. Aiello 1997. Female proto-symbolic strategies. In L. D. Hager (ed.), Women in Human Evolution. New York and London: Routledge, pp. 153–171; p. 159.

In a similar vein, anthropologist Chris Knight argues that Boehm's idea of a 'coalition of everyone' is hard to envisage unless, along the lines of a modern industrial picket line, it was formed to co-ordinate 'sex-strike' action against badly behaving males:

....male dominance had to be overthrown because the unending prioritising of male short-term sexual interests could lead only to the permanence and institutionalisation of behavioural conflict between the sexes, between the generations and also between rival males. If the symbolic, cultural domain was to emerge, what was needed was a political collectivity — an alliance — capable of transcending such conflicts. ... Only the consistent defence and self-defence of mothers with their offspring could produce a collectivity embodying interests of a sufficiently broad, universalistic kind.

— Knight, C. 1991. Blood Relations. Menstruation and the origins of culture. New Haven and London: Yale University Press, p. 514

In virtually all hunter-gatherer ethnographies, according to Knight, a persistent theme is that 'women like meat', and that they determinedly use their collective bargaining power to motivate men to hunt for them and bring home their kills — on pain of exclusion from sex. Arguments about women's crucial role in domesticating males — motivating them to cooperate — have also been advanced by anthropologists Kristen Hawkes, Sarah Hrdy and Bruce Knauft among others. Meanwhile, other evolutionary scientists continue to envisage uninterrupted male dominance, continuity with primate social systems and the emergence of society on a gradualist basis without revolutionary leaps.

Sociobiological theories

Robert Trivers

I consider Trivers one of the great thinkers in the history of Western thought. It would not be too much of an exaggeration to say that he has provided a scientific explanation for the human condition: the intricately complicated and endlessly fascinating relationships that bind us to one another.

In his 1985 book, Social Evolution, Robert Trivers outlines the theoretical framework used today by most evolutionary biologists to understand how and why societies are established. Trivers sets out from the fundamental fact that genes survive beyond the death of the bodies they inhabit, because copies of the same gene may be replicated in multiple different bodies. From this, it follows that a creature should behave altruistically to the extent that those benefiting carry the same genes — 'inclusive fitness', as this source of cooperation in nature is termed. Where animals are unrelated, cooperation should be limited to 'reciprocal altruism' or 'tit-for-tat'. Where previously, biologists took parent-offspring cooperation for granted, Trivers predicted on theoretical grounds both cooperation and conflict — as when a mother needs to wean an existing baby (even against its will) in order to make way for another. Previously, biologists had interpreted male infanticidal behaviour as aberrant and inexplicable or, alternatively, as a necessary strategy for culling excess population. Trivers was able to show that such behaviour was a logical strategy by males to enhance their own reproductive success at the expense of conspecifics including rival males. Ape or monkey females whose babies are threatened have directly opposed interests, often forming coalitions to defend themselves and their offspring against infanticidal males.
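The two conditions for cooperation described above can be made concrete with a small sketch. It assumes the standard formalisation of inclusive fitness known as Hamilton's rule (altruism is favoured when relatedness times benefit exceeds cost), which the text above does not state explicitly, and a bare-bones version of the tit-for-tat strategy. The function names and the numbers are illustrative assumptions, not taken from Trivers.

```python
def favours_altruism(relatedness, benefit, cost):
    """Hamilton's rule (assumed formalisation): altruism is favoured when r * B > C."""
    return relatedness * benefit > cost


def tit_for_tat(partner_history):
    """Reciprocal altruism sketch: cooperate first, then copy the partner's last move.

    partner_history is a list of booleans (True = the partner cooperated).
    """
    if not partner_history:
        return True
    return partner_history[-1]


# Illustrative values: full siblings (r = 0.5) versus first cousins (r = 0.125),
# with a hypothetical benefit of 3 units to the recipient at a cost of 1 unit.
print(favours_altruism(0.5, 3.0, 1.0))    # 0.5 * 3.0 = 1.5 exceeds the cost of 1.0
print(favours_altruism(0.125, 3.0, 1.0))  # 0.125 * 3.0 = 0.375 does not

# An unrelated partner who defected last time is met with defection.
print(tit_for_tat([True, False]))
```

On these assumptions, help flows readily to close kin, while cooperation with non-relatives depends on the partner's previous behaviour.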

Human society, according to Trivers, is unusual in that it involves the male of the species investing parental care in his own offspring, a rare pattern for a primate. Where such cooperation occurs, it is not enough to take it for granted: in Trivers' view we need to explain it using an overarching theoretical framework applicable to humans and nonhumans alike.

Everybody has a social life. All living creatures reproduce and reproduction is a social event, since at its bare minimum it involves the genetic and material construction of one individual by another. In turn, differences between individuals in the number of their surviving offspring (natural selection) is the driving force behind organic evolution. Life is intrinsically social and it evolves through a process of natural selection which is itself social. For these reasons social evolution refers not only to the evolution of social relationships between individuals but also to deeper themes of biological organization stretching from gene to community.

— Robert Trivers, 1985. Social Evolution. Menlo Park, California: Benjamin/Cummings, p. vii.

Robin Dunbar

Robin Dunbar originally studied gelada baboons in the wild in Ethiopia, and has done much to synthesise modern primatological knowledge with Darwinian theory into a comprehensive overall picture. The components of primate social systems 'are essentially alliances of a political nature aimed at enabling the animals concerned to achieve more effective solutions to particular problems of survival and reproduction'. Primate societies are in essence 'multi-layered sets of coalitions'. Although physical fights are ultimately decisive, the social mobilisation of allies usually decides matters and requires skills that go beyond mere fighting ability. The manipulation and use of coalitions demands sophisticated social — more precisely political — intelligence. Usually but not always, males exercise dominance over females. Even where male despotism prevails, females typically gang up with one another to pursue agendas of their own. When a male gelada baboon attacks a previously dominant rival so as to take over his harem, the females concerned may insist on their own say in the outcome. At various stages during the fighting, the females may 'vote' among themselves on whether to accept the provisional outcome. Rejection is signalled by refusing to groom the challenger; acceptance is signalled by going up to him and grooming him. According to Dunbar, the ultimate outcome of an inter-male 'sexual fight' always depends on the female 'vote'.

Dunbar points out that in a primate social system, lower-ranking females will typically suffer the most intense harassment. Consequently, they will be the first to form coalitions in self-defence. But maintaining commitment from coalition allies involves much time-consuming manual grooming, putting pressure on time-budgets. In the case of evolving humans, who were living in increasingly large groups, the costs would soon have outweighed the benefits unless some more efficient way of maintaining relationships could be found. Dunbar argues that 'vocal grooming', using the voice to signal commitment, was the time-saving solution adopted, and that this led eventually to speech. Dunbar goes on to suggest (citing evolutionary anthropologist Chris Knight) that distinctively human society may have evolved under pressure from female ritual and 'gossiping' coalitions established to dissuade males from fighting one another and to induce them instead to cooperate in hunting for the benefit of the whole camp:

If females formed the core of these early groups, and language evolved to bond these groups, it naturally follows that the early human females were the first to speak. This reinforces the suggestion that language was first used to create a sense of emotional solidarity between allies. Chris Knight has argued a passionate case for the idea that language first evolved to allow the females in these early groups to band together to force males to invest in them and their offspring, principally by hunting for meat. This would be consistent with the fact that, among modern humans, women are generally better at verbal skills than men, as well as being more skilful in the social domain.

— Dunbar, R. I. M. 1996. Grooming, Gossip and the Evolution of Language. London: Faber and Faber, p. 149.

Dunbar stresses that this is currently a minority theory among specialists in human origins: most still support the 'bison-down-at-the-lake' theory, which attributes early language and cooperation to the imperatives of men's activities such as hunting. Despite this, he argues that 'female bonding may have been a more powerful force in human evolution than is sometimes supposed'. Although still controversial, the idea that female coalitions may have played a decisive role has subsequently received strong support from a number of anthropologists, including Sarah Hrdy, Camilla Power, Ian Watts, and Jerome Lewis. It is also consistent with recent studies by population geneticists (see Verdu et al. 2013 for Central African Pygmies; Schlebusch 2010 for Khoisan) showing a deep-time tendency to matrilocality among African hunter-gatherers.

Tuesday, January 24, 2023

Sermon on the Mount

From Wikipedia, the free encyclopedia
 
Sermon on the Mount by Carl Bloch (1877)

The Sermon on the Mount (anglicized from the Matthean Vulgate Latin section title: Sermo in monte) is a collection of sayings attributed to Jesus of Nazareth found in the Gospel of Matthew (chapters 5, 6, and 7) that emphasizes his moral teachings. It is the first of five discourses in the Gospel and has been one of the most widely quoted sections of the Gospels.

Background and setting

The Sermon on the Mount is placed relatively early in Matthew's portrayal of Jesus' ministry. It follows his baptism by John in chapter 3 and, in chapter 4, his sojourn and temptation in the desert, his call of four disciples, and his early preaching in Galilee.

The five discourses in the Gospel of Matthew are: the Sermon on the Mount (5-7), the discourse on discipleship (10), the discourse of parables (13), the discourse on the community of faith (18), and the discourse on future events (24-25). Also, like all the other "discourses," this one has Matthew's concluding statement (7:28-29) that distinguishes it from the material that follows. For similar statements at the end of the other discourses, see 11:1; 13:53; 19:1; 26:1.

Where the Sermon on the Mount took place is a matter of speculation. The traditional site, designated the Mount of Beatitudes, is understood to be a hill on the shore of the Lake of Galilee.

Nevertheless, this long "sermon" is one of the most widely quoted sections of the Gospels, including some of the best-known sayings attributed to Jesus, such as the Beatitudes and the commonly recited version of the Lord's Prayer. It also contains what many consider to be the central tenets of Christian discipleship.

The setting for the "sermon" is given in Matthew 5:1-2. There, Jesus is said to see the crowds, to go up the mountain accompanied by his disciples, to sit down, and to begin his speech.

Components

The Lord's Prayer, in Matthew 6:9, 1500, Vienna

Although the issues of Matthew's compositional plan for the Sermon on the Mount remain unresolved among scholars, its structural components are clear.

Matthew 5:3–12 includes the Beatitudes. These describe the character of the people of the Kingdom of Heaven, expressed as "blessings". The Greek word that most versions of the Gospel render as "blessed" can also be translated "happy" (see Matthew 5:3–12 in Young's Literal Translation for an example). In Matthew, there are eight (or nine) blessings, while in Luke there are four, followed by four woes.

In almost all cases, the phrases used in the Beatitudes are familiar from an Old Testament context, but in the sermon Jesus gives them new meaning. Together, the Beatitudes present a new set of ideals that focus on love and humility rather than force and mastery; they echo the highest ideals of Jesus' teachings on spirituality and compassion.

In Christian teachings, the Works of Mercy, which have corporal and spiritual components, have resonated with the theme of the Beatitude for mercy. These teachings emphasize that these acts of mercy provide both temporal and spiritual benefits.

Matthew 5:13–16 presents the metaphors of salt and light. This completes the profile of God's people presented in the beatitudes and acts as the introduction to the next section.

There are two parts in this section, using the terms "salt of the earth" and "light of the world" to refer to the disciples – implying their value. Elsewhere, in John 8:12, Jesus applies "light of the world" to himself.

Jesus preaches about Hell and what Hell is like: "But I say unto you, That whosoever is angry with his brother without a cause shall be in danger of the judgment: and whosoever shall say to his brother, 'Raca' (fool), shall be in danger of the council: but whosoever shall say, 'Thou fool', shall be in danger of hell fire."

A page from Matthew, from Papyrus 1, c. 250 AD

The longest section of the Sermon is Matthew 5:17–48, traditionally referred to as "the Antitheses" or "Matthew's Antitheses". In the section, Jesus fulfils and reinterprets the Old Covenant and in particular its Ten Commandments, contrasting with what "you have heard" from others. For example, he advises turning the other cheek and loving one's enemies, in contrast to taking an eye for an eye. According to most interpretations of Matthew 5:17, 18, 19, and 20, and most Christian views of the Old Covenant, these new interpretations of the Law and Prophets are not opposed to the Old Testament (as Marcion held them to be), but form Jesus' new teachings which bring about salvation, and hence must be adhered to, as emphasized in Matthew 7:24–27 towards the end of the sermon.

In Matthew 6, Jesus condemns doing what would normally be "good works" simply for recognition and not from the heart, such as almsgiving (6:1–4), prayer (6:5–15), and fasting (6:16–18). The discourse goes on to condemn the superficiality of materialism and calls the disciples not to worry about material needs, but to "seek" God's kingdom first. Within the discourse on ostentation, Matthew presents an example of correct prayer. Luke places this in a different context. The Lord's Prayer (6:9–13) contains parallels to 1 Chronicles 29:10–18.

The first part of Matthew 7 (Matthew 7:1–6) deals with judging. Jesus condemns those who judge others before first judging themselves: "Judge not, that ye be not judged." Jesus concludes the sermon in Matthew 7:15–29, beginning with a warning against false prophets.

Teachings and theology

Plaque of the 8 Beatitudes, St. Cajetan Church, Lindavista, Mexico

The teachings of the Sermon on the Mount have been a key element of Christian ethics, and for centuries the sermon has acted as a fundamental recipe for the conduct of the followers of Jesus. Various religious and moral thinkers (e.g. Leo Tolstoy and Mahatma Gandhi) have admired its message, and it has been one of the main sources of Christian pacifism.

In the 5th century, Saint Augustine began his book Our Lord's Sermon on the Mount by stating:

If anyone will piously and soberly consider the sermon which our Lord Jesus Christ spoke on the mount, as we read it in the Gospel according to Matthew, I think that he will find in it, so far as regards the highest morals, a perfect standard of the Christian life.

The last verse of chapter 5 of Matthew (Matthew 5:48) is a focal point of the Sermon that summarizes its teachings by advising the disciples to seek perfection. The Greek word teleios used to refer to perfection also implies an end, or destination, advising the disciples to seek the path towards perfection and the Kingdom of God. It teaches that God's children are those who act like God.

The teachings of the sermon are often referred to as the "Ethics of the Kingdom": they place a high level of emphasis on "purity of the heart" and embody the basic standard of Christian righteousness.

Theological structure

The theological structure of the Sermon on the Mount is widely discussed. One group of theologians, ranging from Saint Augustine in the 5th century to Michael Goulder in the 20th century, sees the Beatitudes as the central element of the Sermon. Others, such as Günther Bornkamm, see the Sermon arranged around the Lord's Prayer, while Daniel Patte, closely followed by Ulrich Luz, sees a chiastic structure in the sermon. Dale Allison and Glen Stassen have proposed a structure based on triads. Jack Kingsbury and Hans Dieter Betz see the sermon as composed of theological themes, e.g. righteousness or way of life.

Interpretation

The Sermon on the Mount as depicted by Louis Comfort Tiffany in a stained glass window at Arlington Street Church in Boston

The high ethical standards of the Sermon have been interpreted in a wide variety of ways by different Christian groups.

North American Biblical scholar Craig S. Keener finds at least 36 different interpretations of the message of the Sermon which he groups into eight views:

  1. The predominant medieval view, "reserving a higher ethic for clergy, especially in monastic orders"
  2. A view associated with Martin Luther that it represents an impossible demand, but serves to educate Christians on the ideals of their faith
  3. The Anabaptist literal view, which directly applies the teachings
  4. The Social Gospel view
  5. The Christian existentialism view
  6. Schweitzer's view of an imminent eschatology referring to an interim ethic
  7. Dispensational eschatology which refers to the future Kingdom of God
  8. Inaugurated eschatology in which the Sermon's ethics remain a goal to be approached, yet realized later

Comparison with the Sermon on the Plain

While Matthew groups Jesus' teachings into sets of similar material, the same material is scattered when found in Luke. The Sermon on the Mount may be compared with the similar but shorter Sermon on the Plain as recounted by the Gospel of Luke (Luke 6:17–49), which occurs at the same moment in Luke's narrative, and also features Jesus heading up a mountain, but giving the sermon on the way down at a level spot. Some scholars believe that they are the same sermon, while others hold that Jesus frequently preached similar themes in different places.
