
Saturday, March 12, 2022

Early modern human

From Wikipedia, the free encyclopedia

100,000- to 80,000-year-old Skhul V from Israel

Early modern human (EMH) or anatomically modern human (AMH) are terms used to distinguish Homo sapiens (the only extant Hominina species) that are anatomically consistent with the range of phenotypes seen in contemporary humans from extinct archaic human species. This distinction is useful especially for times and regions where anatomically modern and archaic humans co-existed, for example, in Paleolithic Europe. Among the oldest known remains of Homo sapiens are those found at the Omo-Kibish I archaeological site in south-western Ethiopia, dating to about 233,000 to 196,000 years ago, the Florisbad site in South Africa, dating to about 259,000 years ago, and the Jebel Irhoud site in Morocco, dated about 300,000 years ago.

Extinct species of the genus Homo include Homo erectus (extant from roughly 2 to 0.1 million years ago) and a number of other species (by some authors considered subspecies of either H. sapiens or H. erectus). The divergence of the lineage leading to H. sapiens out of ancestral H. erectus (or an intermediate species such as Homo antecessor) is estimated to have occurred in Africa roughly 500,000 years ago. The earliest fossil evidence of early modern humans appears in Africa around 300,000 years ago, with the earliest genetic splits among modern people, according to some evidence, dating to around the same time. Sustained archaic human admixture with modern humans is known to have taken place both in Africa and (following the recent Out-Of-Africa expansion) in Eurasia, between about 100,000 and 30,000 years ago.

Name and taxonomy

The binomial name Homo sapiens was coined by Linnaeus in 1758. The Latin noun homō (genitive hominis) means "human being", while the participle sapiēns means "discerning, wise, sensible".

The species was initially thought to have emerged from a predecessor within the genus Homo around 300,000 to 200,000 years ago. A problem with the morphological classification of "anatomically modern" was that it would not have included certain extant populations. For this reason, a lineage-based (cladistic) definition of H. sapiens has been suggested, in which H. sapiens would by definition refer to the modern human lineage following the split from the Neanderthal lineage. Such a cladistic definition would extend the age of H. sapiens to over 500,000 years.

Estimates for the split between the Homo sapiens line and the combined Neanderthal/Denisovan line include 503,000–565,000 years ago, 550,000–765,000 years ago, and (based on rates of dental evolution) possibly more than 800,000 years ago.

Extant human populations have historically been divided into subspecies, but since around the 1980s all extant groups have tended to be subsumed into a single species, H. sapiens, avoiding division into subspecies altogether.

Some sources show Neanderthals (H. neanderthalensis) as a subspecies (H. sapiens neanderthalensis). Similarly, the discovered specimens of the H. rhodesiensis species have been classified by some as a subspecies (H. sapiens rhodesiensis), although it remains more common to treat these last two as separate species within the genus Homo rather than as subspecies within H. sapiens.

All humans are considered to be a part of the subspecies H. sapiens sapiens, a designation which has been a matter of debate since a species is usually not given a subspecies category unless there is evidence of multiple distinct subspecies.

Age and speciation process

Derivation from H. erectus

Schematic representation of the emergence of H. sapiens from earlier species of Homo. The horizontal axis represents geographic location; the vertical axis represents time in millions of years ago (blue areas denote the presence of a certain species of Homo at a given time and place; late survival of robust australopithecines alongside Homo is indicated in purple). Based on Stringer (2012), Homo heidelbergensis is shown as diverging into Neanderthals, Denisovans and H. sapiens. With the rapid expansion of H. sapiens after 60 kya, Neanderthals, Denisovans and unspecified archaic African hominins are shown as again subsumed into the H. sapiens lineage.
 
A model of the phylogeny of H. sapiens during the Middle Paleolithic. The horizontal axis represents geographic location; the vertical axis represents time in thousands of years ago. Neanderthals, Denisovans and unspecified archaic African hominins are shown as admixed into the H. sapiens lineage. In addition, prehistoric Archaic Human and Eurasian admixture events in modern African populations are indicated.

The divergence of the lineage that would lead to H. sapiens out of archaic human varieties derived from H. erectus, is estimated as having taken place over 500,000 years ago (marking the split of the H. sapiens lineage from ancestors shared with other known archaic hominins). But the oldest split among modern human populations (such as the Khoisan split from other groups) has been recently dated to between 350,000 and 260,000 years ago, and the earliest known examples of H. sapiens fossils also date to about that period, including the Jebel Irhoud remains from Morocco (ca. 300,000 or 350–280,000 years ago), the Florisbad Skull from South Africa (ca. 259,000 years ago), and the Omo remains from Ethiopia (ca. 195,000, or, as more recently dated, ca. 233,000 years ago).

An mtDNA study in 2019 proposed an origin of modern humans in Botswana (and a Khoisan split) of around 200,000 years. However, this proposal has been widely criticized by scholars, with the recent evidence overall (genetic, fossil, and archaeological) supporting an origin for H. sapiens approximately 100,000 years earlier and in a broader region of Africa than the study proposes.

In September 2019, scientists proposed that the earliest H. sapiens (and last common human ancestor to modern humans) arose between 350,000 and 260,000 years ago through a merging of populations in East and South Africa.

An alternative suggestion defines H. sapiens cladistically as including the lineage of modern humans since the split from the lineage of Neanderthals, roughly 500,000 to 800,000 years ago.

Rogers et al. (2017) calculated that archaic H. sapiens diverged from the ancestors of Neanderthals and Denisovans about 744,000 years ago, that the latter lineage then underwent a genetic bottleneck and repeated early admixture events, and that Denisovans diverged from Neanderthals about 300 generations after their split from H. sapiens.

The derivation of a comparatively homogeneous single species of H. sapiens from more diverse varieties of archaic humans (all of which were descended from the early dispersal of H. erectus some 1.8 million years ago) was debated in terms of two competing models during the 1980s: "recent African origin" postulated the emergence of H. sapiens from a single source population in Africa, which expanded and led to the extinction of all other human varieties, while the "multiregional evolution" model postulated the survival of regional forms of archaic humans, gradually converging into the modern human varieties by the mechanism of clinal variation, via genetic drift, gene flow and selection throughout the Pleistocene.

Since the 2000s, the availability of data from archaeogenetics and population genetics has led to the emergence of a much more detailed picture, intermediate between the two competing scenarios outlined above: The recent Out-of-Africa expansion accounts for the predominant part of modern human ancestry, while there were also significant admixture events with regional archaic humans.

Since the 1970s, the Omo remains, originally dated to some 195,000 years ago, have often been taken as the conventional cut-off point for the emergence of "anatomically modern humans". Since the 2000s, the discovery of older remains with comparable characteristics, and the discovery of ongoing hybridization between "modern" and "archaic" populations after the time of the Omo remains, have opened up a renewed debate on the age of H. sapiens in journalistic publications. H. s. idaltu, dated to 160,000 years ago, was postulated in 2003 as an extinct subspecies of H. sapiens. H. neanderthalensis, which became extinct about 40,000 years ago, was also at one point considered to be a subspecies, H. s. neanderthalensis.

H. heidelbergensis, dated 600,000 to 300,000 years ago, has long been thought to be a likely candidate for the last common ancestor of the Neanderthal and modern human lineages. However, genetic evidence from the Sima de los Huesos fossils published in 2016 seems to suggest that H. heidelbergensis in its entirety should be included in the Neanderthal lineage, as "pre-Neanderthal" or "early Neanderthal", while the divergence time between the Neanderthal and modern lineages has been pushed back to before the emergence of H. heidelbergensis, to close to 800,000 years ago, the approximate time of disappearance of H. antecessor.

Early Homo sapiens

The term Middle Paleolithic is intended to cover the time between the first emergence of H. sapiens (roughly 300,000 years ago) and the period held by some to mark the emergence of full behavioral modernity (roughly by 50,000 years ago, corresponding to the start of the Upper Paleolithic).

Many of the early modern human finds, like those of Jebel Irhoud, Omo, Herto, Florisbad, Skhul, Red Deer Cave people, and Peștera cu Oase exhibit a mix of archaic and modern traits. Skhul V, for example, has prominent brow ridges and a projecting face. However, the brain case is quite rounded and distinct from that of the Neanderthals and is similar to the brain case of modern humans. It is uncertain whether the robust traits of some of the early modern humans like Skhul V reflect mixed ancestry or retention of older traits.

The "gracile" or lightly built skeleton of anatomically modern humans has been connected to a change in behavior, including increased cooperation and "resource transport".

There is evidence that the characteristic human brain development, especially the prefrontal cortex, was due to "an exceptional acceleration of metabolome evolution ... paralleled by a drastic reduction in muscle strength. The observed rapid metabolic changes in brain and muscle, together with the unique human cognitive skills and low muscle performance, might reflect parallel mechanisms in human evolution." The Schöningen spears and their associated finds are evidence that complex technological skills already existed 300,000 years ago, and are the first clear evidence of active (big-game) hunting. H. heidelbergensis already had intellectual and cognitive skills, such as anticipatory planning, thinking and acting, that had until then been attributed only to modern humans.

The ongoing admixture events within anatomically modern human populations make it difficult to estimate the age of the matrilinear and patrilinear most recent common ancestors of modern populations (Mitochondrial Eve and Y-chromosomal Adam). Estimates of the age of Y-chromosomal Adam have been pushed back significantly with the discovery of an ancient Y-chromosomal lineage in 2013, to likely beyond 300,000 years ago. There have, however, been no reports of the survival of Y-chromosomal or mitochondrial DNA clearly deriving from archaic humans (which would push back the age of the most recent patrilinear or matrilinear ancestor beyond 500,000 years).

Fossil teeth found at Qesem Cave (Israel) and dated to between 400,000 and 200,000 years ago have been compared to the dental material from the younger (120,000–80,000 years ago) Skhul and Qafzeh hominins.

Dispersal and archaic admixture

Overview map of the peopling of the world by anatomically modern humans (numbers indicate dates in thousands of years ago [ka])

Dispersal of early H. sapiens began soon after its emergence, as evidenced by the North African Jebel Irhoud finds (dated to around 315,000 years ago). There is indirect evidence for H. sapiens presence in West Asia around 270,000 years ago.

The Florisbad Skull from Florisbad, South Africa, dated to about 259,000 years ago, has also been classified as representing early H. sapiens.

In September 2019, scientists proposed that the earliest H. sapiens (and last common human ancestor to modern humans) arose between 350,000 and 260,000 years ago through a merging of populations in East and South Africa.

Among extant populations, the Khoi-San (or "Capoid") hunter-gatherers of Southern Africa may represent the human population with the earliest possible divergence within the group Homo sapiens sapiens. Their separation time was estimated in a 2017 study to be between 350,000 and 260,000 years ago, compatible with the estimated age of early H. sapiens. The study states that the deep split-time estimate of 350,000 to 260,000 years ago is consistent with the archaeological estimate for the onset of the Middle Stone Age across sub-Saharan Africa and coincides with archaic H. sapiens in southern Africa represented by, for example, the Florisbad skull, dating to 259,000 (± 35,000) years ago.

H. s. idaltu, found at Middle Awash in Ethiopia, lived about 160,000 years ago, and H. sapiens lived at Omo Kibish in Ethiopia about 233,000–195,000 years ago. Two fossils from Guomde, Kenya, dated to at least (and likely more than) 180,000 years ago and (more precisely) to 300,000–270,000 years ago, have been tentatively assigned to H. sapiens, and similarities have been noted between them and the Omo Kibish remains. Fossil evidence for modern human presence in West Asia is ascertained for 177,000 years ago, and disputed fossil evidence suggests expansion as far as East Asia by 120,000 years ago.

In July 2019, anthropologists reported the discovery of 210,000-year-old remains of H. sapiens and 170,000-year-old remains of H. neanderthalensis in Apidima Cave, Peloponnese, Greece, more than 150,000 years older than previous H. sapiens finds in Europe.

A significant dispersal event, within Africa and to West Asia, is associated with the African megadroughts during MIS 5, beginning 130,000 years ago. A 2011 study located the origin of the basal population of contemporary humans at 130,000 years ago, with the Khoi-San representing an "ancestral population cluster" located in southwestern Africa (near the coastal border of Namibia and Angola).

Layer sequence at Ksar Akil in the Levantine corridor, and discovery of two fossils of Homo sapiens, dated to 40,800 to 39,200 years BP for "Egbert", and 42,400–41,700 BP for "Ethelruda".

While early modern human expansion in Sub-Saharan Africa before 130 kya persisted, early expansion to North Africa and Asia appears to have mostly disappeared by the end of MIS 5 (75,000 years ago), and is known only from fossil evidence and from archaic admixture. Eurasia was re-populated by early modern humans in the so-called "recent out-of-Africa migration" post-dating MIS 5, beginning around 70,000–50,000 years ago. In this expansion, bearers of mt-DNA haplogroup L3 left East Africa, likely reaching Arabia via the Bab-el-Mandeb, and in the Great Coastal Migration spread to South Asia, Maritime Southeast Asia and Oceania between 65,000 and 50,000 years ago, while Europe, East and North Asia were reached by about 45,000 years ago. Some evidence suggests that an early wave of humans may have reached the Americas by about 40,000–25,000 years ago.

Evidence for the overwhelming contribution of this "recent" (L3-derived) expansion to all non-African populations was established based on mitochondrial DNA, combined with evidence based on physical anthropology of archaic specimens, during the 1990s and 2000s, and has also been supported by Y DNA and autosomal DNA.

The assumption of complete replacement has been revised in the 2010s with the discovery of admixture events (introgression) of populations of H. sapiens with populations of archaic humans over the period of between roughly 100,000 and 30,000 years ago, both in Eurasia and in Sub-Saharan Africa. Neanderthal admixture, in the range of 1–4%, is found in all modern populations outside of Africa, including in Europeans, Asians, Papua New Guineans, Australian Aboriginals, Native Americans, and other non-Africans. This suggests that interbreeding between Neanderthals and anatomically modern humans took place after the recent "out of Africa" migration, likely between 60,000 and 40,000 years ago. Recent admixture analyses have added to the complexity, finding that Eastern Neanderthals derive up to 2% of their ancestry from anatomically modern humans who left Africa some 100 kya. The extent of Neanderthal admixture (and introgression of genes acquired by admixture) varies significantly between contemporary racial groups, being absent in Africans, intermediate in Europeans and highest in East Asians. Certain genes related to UV-light adaptation introgressed from Neanderthals have been found to have been selected for in East Asians specifically from 45,000 years ago until around 5,000 years ago. The extent of archaic admixture is of the order of about 1% to 4% in Europeans and East Asians, and highest among Melanesians (the last also having Denisova hominin admixture at 4% to 6% in addition to Neanderthal admixture). Cumulatively, about 20% of the Neanderthal genome is estimated to remain present, spread across contemporary populations.

In September 2019, scientists reported the computerized determination, based on 260 CT scans, of a virtual skull shape of the last common ancestor of modern humans/H. sapiens, representative of the earliest modern humans, and suggested that modern humans arose between 350,000 and 260,000 years ago through a merging of populations in East and South Africa, while North African fossils may represent a population which introgressed into Neandertals during the LMP.

Anatomy

Known archaeological remains of anatomically modern humans in Europe and Africa, directly dated, calibrated carbon dates as of 2013.

Generally, modern humans are more lightly built (or more "gracile") than the more "robust" archaic humans. Nevertheless, contemporary humans exhibit high variability in many physiological traits, and may exhibit remarkable "robustness". There are still a number of physiological details which can be taken as reliably differentiating the physiology of Neanderthals vs. anatomically modern humans.

Anatomical modernity

The term "anatomically modern humans" (AMH) is used with varying scope depending on context, to distinguish "anatomically modern" Homo sapiens from archaic humans such as Neanderthals, and from Middle and Lower Paleolithic hominins with transitional features intermediate between H. erectus, Neanderthals and early AMH, called archaic Homo sapiens. In a convention popular in the 1990s, Neanderthals were classified as a subspecies of H. sapiens, as H. s. neanderthalensis, while AMH (or European early modern humans, EEMH) was taken to refer to "Cro-Magnon" or H. s. sapiens. Under this nomenclature (Neanderthals considered H. sapiens), the term "anatomically modern Homo sapiens" (AMHS) has also been used to refer to EEMH ("Cro-Magnons"). It has since become more common to designate Neanderthals as a separate species, H. neanderthalensis, so that AMH in the European context refers to H. sapiens, but the question is by no means resolved.

In this more narrow definition of H. sapiens, the subspecies Homo sapiens idaltu, discovered in 2003, also falls under the umbrella of "anatomically modern". The recognition of H. sapiens idaltu as a valid subspecies of the anatomically modern human lineage would justify the description of contemporary humans with the subspecies name Homo sapiens sapiens. However, biological anthropologist Chris Stringer does not consider idaltu distinct enough within H. sapiens to warrant its own subspecies designation.

A further division of AMH into "early" or "robust" vs. "post-glacial" or "gracile" subtypes has since been used for convenience. The emergence of "gracile AMH" is taken to reflect a process towards a smaller and more fine-boned skeleton beginning around 50,000–30,000 years ago.

Braincase anatomy

Anatomical comparison of skulls of H. sapiens (left) and H. neanderthalensis (right)
(in Cleveland Museum of Natural History)
Features compared are the braincase shape, forehead, browridge, nasal bone projection, cheekbone angulation, chin and occipital contour

The modern human cranium lacks a pronounced occipital bun in the neck, a bulge that anchored considerable neck muscles in Neanderthals. Modern humans, even the earlier ones, generally have a larger fore-brain than the archaic people, so that the brain sits above rather than behind the eyes. This will usually (though not always) give a higher forehead and a reduced brow ridge. Early modern people and some living people do, however, have quite pronounced brow ridges, but these differ from those of archaic forms in having a supraorbital foramen or notch above each eye, forming a groove through the ridge. This splits the ridge into a central part and two distal parts. In current humans, often only the central section of the ridge is preserved (if it is preserved at all). This contrasts with archaic humans, where the brow ridge is pronounced and unbroken.

Modern humans commonly have a steep, even vertical forehead whereas their predecessors had foreheads that sloped strongly backwards. According to Desmond Morris, the vertical forehead in humans plays an important role in human communication through eyebrow movements and forehead skin wrinkling.

Brain size in both Neanderthals and AMH is significantly larger on average (but overlapping in range) than brain size in H. erectus. Neanderthal and AMH brain sizes are in the same range, but there are differences in the relative sizes of individual brain areas, with significantly larger visual systems in Neanderthals than in AMH.

Jaw anatomy

Compared to archaic people, anatomically modern humans have smaller, differently shaped teeth. This results in a smaller, more receded dentary, making the rest of the jaw-line stand out, giving an often quite prominent chin. The central part of the mandible forming the chin carries a triangularly shaped area forming the apex of the chin called the mental trigon, not found in archaic humans. Particularly in living populations, the use of fire and tools requires fewer jaw muscles, giving slender, more gracile jaws. Compared to archaic people, modern humans have smaller, lower faces.

Body skeleton structure

The body skeletons of even the earliest and most robustly built modern humans were less robust than those of Neanderthals (and, from what little we know, those of Denisovans), having essentially modern proportions. Particularly regarding the long bones of the limbs, the distal bones (the radius/ulna and tibia/fibula) are nearly the same size or slightly shorter than the proximal bones (the humerus and femur). In ancient people, particularly Neanderthals, the distal bones were shorter, usually thought to be an adaptation to cold climate. The same adaptation is found in some modern people living in the polar regions.

Height ranges overlap between Neanderthals and AMH, with Neanderthal averages cited as 164 to 168 cm (65 to 66 in) and 152 to 156 cm (60 to 61 in) for males and females, respectively. By comparison, contemporary national averages range between 158 to 184 cm (62 to 72 in) in males and 147 to 172 cm (58 to 68 in) in females. Neanderthal ranges approximate the height distribution measured among Malay people, for one.

Recent evolution

Following the peopling of Africa some 130,000 years ago, and the recent Out-of-Africa expansion some 70,000 to 50,000 years ago, some sub-populations of H. sapiens had been essentially isolated for tens of thousands of years prior to the early modern Age of Discovery. Combined with archaic admixture this has resulted in significant genetic variation, which in some instances has been shown to be the result of directional selection taking place over the past 15,000 years, i.e. significantly later than possible archaic admixture events.

Some climatic adaptations, such as high-altitude adaptation in humans, are thought to have been acquired by archaic admixture. Introgression of genetic variants acquired by Neanderthal admixture have different distributions in European and East Asians, reflecting differences in recent selective pressures. A 2014 study reported that Neanderthal-derived variants found in East Asian populations showed clustering in functional groups related to immune and haematopoietic pathways, while European populations showed clustering in functional groups related to the lipid catabolic process. A 2017 study found correlation of Neanderthal admixture in phenotypic traits in modern European populations.

Physiological or phenotypical changes have been traced to Upper Paleolithic mutations, such as the East Asian variant of the EDAR gene, dated to c. 35,000 years ago.

Recent divergence of Eurasian lineages was sped up significantly during the Last Glacial Maximum (LGM), the Mesolithic and the Neolithic, due to increased selection pressures and due to founder effects associated with migration. Alleles predictive of light skin have been found in Neanderthals, but the alleles for light skin in Europeans and East Asians, associated with KITLG and ASIP, are (as of 2012) thought not to have been acquired by archaic admixture but by recent mutations since the LGM. Phenotypes associated with the "white" or "Caucasian" populations of Western Eurasian stock emerge during the LGM, from about 19,000 years ago. Average cranial capacity in modern human populations varies in the range of 1,200 to 1,450 cm3 for adult males. Larger cranial volume is associated with climatic region, the largest averages being found in populations of Siberia and the Arctic. Both Neanderthal and EEMH had somewhat larger cranial volumes on average than modern Europeans, suggesting the relaxation of selection pressures for larger brain volume after the end of the LGM.

Examples of still later adaptations related to agriculture and animal domestication, including East Asian types of ADH1B associated with rice domestication and lactase persistence, are due to recent selection pressures.

An even more recent adaptation has been proposed for the Austronesian Sama-Bajau, developed under selection pressures associated with subsisting on freediving over the past thousand years or so.

Behavioral modernity

Lithic Industries of early Homo sapiens at Blombos Cave (M3 phase, MIS 5), Southern Cape, South Africa (c. 105,000 – 90,000 years old)

Behavioral modernity, involving the development of language, figurative art and early forms of religion (etc.) is taken to have arisen before 40,000 years ago, marking the beginning of the Upper Paleolithic (in African contexts also known as the Later Stone Age).

There is considerable debate regarding whether the earliest anatomically modern humans behaved similarly to recent or existing humans. Behavioral modernity is taken to include fully developed language (requiring the capacity for abstract thought), artistic expression, early forms of religious behavior, increased cooperation and the formation of early settlements, and the production of articulated tools from lithic cores, bone or antler. The term Upper Paleolithic is intended to cover the period since the rapid expansion of modern humans throughout Eurasia, which coincides with the first appearance of Paleolithic art such as cave paintings and the development of technological innovation such as the spear-thrower. The Upper Paleolithic begins around 50,000 to 40,000 years ago, and also coincides with the disappearance of archaic humans such as the Neanderthals.

Bifacial silcrete point of early Homo sapiens, from M1 phase (71,000 BCE) layer of Blombos Cave, South Africa

The term "behavioral modernity" is somewhat disputed. It is most often used for the set of characteristics marking the Upper Paleolithic, but some scholars use "behavioral modernity" for the emergence of H. sapiens around 200,000 years ago, while others use the term for the rapid developments occurring around 50,000 years ago. It has been proposed that the emergence of behavioral modernity was a gradual process.

Examples of behavioural modernity

Claimed "oldest known drawing by human hands", discovered in Blombos Cave in South Africa. Estimated to be a 73,000-year-old work of a Homo sapiens.

The equivalent of the Eurasian Upper Paleolithic in African archaeology is known as the Later Stone Age, also beginning roughly 40,000 years ago. While most clear evidence for behavioral modernity uncovered from the later 19th century was from Europe, such as the Venus figurines and other artefacts from the Aurignacian, more recent archaeological research has shown that all essential elements of the kind of material culture typical of contemporary San hunter-gatherers in Southern Africa were also present by at least 40,000 years ago, including digging sticks of similar materials to those used today, ostrich egg shell beads, bone arrow heads with individual maker's marks etched and embedded with red ochre, and poison applicators.

There is also a suggestion that "pressure flaking best explains the morphology of lithic artifacts recovered from the c. 75-ka Middle Stone Age levels at Blombos Cave, South Africa. The technique was used during the final shaping of Still Bay bifacial points made on heat-treated silcrete." Both pressure flaking and heat treatment of materials were previously thought to have occurred much later in prehistory, and both indicate a behaviourally modern sophistication in the use of natural materials. Further reports of research on cave sites along the southern African coast indicate that "the debate as to when cultural and cognitive characteristics typical of modern humans first appeared" may be coming to an end, as "advanced technologies with elaborate chains of production" which "often demand high-fidelity transmission and thus language" have been found at the South African Pinnacle Point Site 5–6. These have been dated to approximately 71,000 years ago. The researchers suggest that their research "shows that microlithic technology originated early in South Africa by 71 kya, evolved over a vast time span (c. 11,000 years), and was typically coupled to complex heat treatment that persisted for nearly 100,000 years. Advanced technologies in Africa were early and enduring; a small sample of excavated sites in Africa is the best explanation for any perceived 'flickering' pattern." These results suggest that Late Stone Age foragers in Sub-Saharan Africa had developed modern cognition and behaviour by at least 50,000 years ago.

The change in behavior has been speculated to have been a consequence of an earlier climatic change to much drier and colder conditions between 135,000 and 75,000 years ago. This might have led human groups seeking refuge from the inland droughts to expand along coastal marshes rich in shellfish and other resources. Since sea levels were low due to so much water being tied up in glaciers, such marshlands would have occurred all along the southern coasts of Eurasia. The use of rafts and boats may well have facilitated exploration of offshore islands and travel along the coast, and eventually permitted expansion to New Guinea and then to Australia.

In addition, a variety of other evidence of abstract imagery, widened subsistence strategies, and other "modern" behaviors has been discovered in Africa, especially South, North, and East Africa, predating 50,000 years ago (with some predating 100,000 years ago). The Blombos Cave site in South Africa, for example, is famous for rectangular slabs of ochre engraved with geometric designs. Using multiple dating techniques, the site was confirmed to be around 77,000 years old, with estimates ranging from 100,000 to 75,000 years old. Ostrich egg shell containers engraved with geometric designs dating to 60,000 years ago were found at Diepkloof, South Africa. Beads and other personal ornamentation have been found in Morocco which might be as much as 130,000 years old; as well, the Cave of Hearths in South Africa has yielded a number of beads dating from significantly prior to 50,000 years ago, and shell beads dating to about 75,000 years ago have been found at Blombos Cave, South Africa. Specialized projectile weapons have also been found at various sites in Middle Stone Age Africa, including bone and stone arrowheads at South African sites such as Sibudu Cave (along with an early bone needle also found at Sibudu) dating to approximately 72,000–60,000 years ago, some of which may have been tipped with poisons, and bone harpoons at the Central African site of Katanda dating to ca. 90,000 years ago. Evidence also exists for the systematic heat treating of silcrete stone to increase its flake-ability for the purpose of toolmaking, beginning approximately 164,000 years ago at the South African site of Pinnacle Point and becoming common there for the creation of microlithic tools at about 72,000 years ago.

In 2008, an ochre processing workshop likely for the production of paints was uncovered dating to ca. 100,000 years ago at Blombos Cave, South Africa. Analysis shows that a liquefied pigment-rich mixture was produced and stored in two abalone shells, and that ochre, bone, charcoal, grindstones and hammer-stones also formed a composite part of the toolkits. Evidence for the complexity of the task includes procuring and combining raw materials from various sources (implying they had a mental template of the process they would follow), possibly using pyrotechnology to facilitate fat extraction from bone, using a probable recipe to produce the compound, and the use of shell containers for mixing and storage for later use. Modern behaviors, such as the making of shell beads, bone tools and arrows, and the use of ochre pigment, are evident at a Kenyan site by 78,000–67,000 years ago. Evidence of early stone-tipped projectile weapons (a characteristic tool of Homo sapiens), the stone tips of javelins or throwing spears, was discovered in 2013 at the Ethiopian site of Gademotta, and dates to around 279,000 years ago.

Expanding subsistence strategies beyond big-game hunting and the consequential diversity in tool types have been noted as signs of behavioral modernity. A number of South African sites have shown an early reliance on aquatic resources from fish to shellfish. Pinnacle Point, in particular, shows exploitation of marine resources as early as 120,000 years ago, perhaps in response to more arid conditions inland. Establishing a reliance on predictable shellfish deposits, for example, could reduce mobility and facilitate complex social systems and symbolic behavior. Blombos Cave and Site 440 in Sudan both show evidence of fishing as well. Taphonomic change in fish skeletons from Blombos Cave has been interpreted as capture of live fish, clearly an intentional human behavior.

Humans in North Africa (Nazlet Sabaha, Egypt) are known to have dabbled in chert mining, as early as ≈100,000 years ago, for the construction of stone tools.

Evidence was found in 2018, dating to about 320,000 years ago at the site of Olorgesailie in Kenya, of the early emergence of modern behaviors including: the trade and long-distance transportation of resources (such as obsidian), the use of pigments, and the possible making of projectile points. The authors of three 2018 studies on the site observe that the evidence of these behaviors is roughly contemporary with the earliest known Homo sapiens fossil remains from Africa (such as at Jebel Irhoud and Florisbad), and they suggest that complex and modern behaviors began in Africa around the time of the emergence of Homo sapiens.

In 2019, further evidence of Middle Stone Age complex projectile weapons in Africa was found at Aduma, Ethiopia, dated 100,000–80,000 years ago, in the form of points considered likely to belong to darts delivered by spear throwers.

Pace of progress during Homo sapiens history

Homo sapiens' technological and cultural progress appears to have been very much faster in recent millennia than in the species' early periods. The pace of development may indeed have accelerated, due to a massively larger population (so more humans extant to think of innovations), more communication and sharing of ideas among human populations, and the accumulation of thinking tools. However, it may also be that the pace of advance always looks relatively faster to humans in the time in which they live, because previous advances are unrecognised 'givens'.

Friday, March 11, 2022

Deep Learning Is Hitting a Wall

What would it take for artificial intelligence to make real progress?


“Let me start by saying a few things that seem obvious,” Geoffrey Hinton, “Godfather” of deep learning, and one of the most celebrated scientists of our time, told a leading AI conference in Toronto in 2016. “If you work as a radiologist you’re like the coyote that’s already over the edge of the cliff but hasn’t looked down.” Deep learning is so well-suited to reading images from MRIs and CT scans, he reasoned, that people should “stop training radiologists now” and that it’s “just completely obvious within five years deep learning is going to do better.”

Fast forward to 2022, and not a single radiologist has been replaced. Rather, the consensus view nowadays is that machine learning for radiology is harder than it looks1; at least for now, humans and machines complement each other’s strengths.2

Deep learning is at its best when all we need are rough-and-ready results.

Few fields have been more filled with hype and bravado than artificial intelligence. It has flitted from fad to fad decade by decade, always promising the moon, and only occasionally delivering. One minute it was expert systems, next it was Bayesian networks, and then Support Vector Machines. In 2011, it was IBM’s Watson, once pitched as a revolution in medicine, more recently sold for parts.3 Nowadays, and in fact ever since 2012, the flavor of choice has been deep learning, the multibillion-dollar technique that drives so much of contemporary AI and which Hinton helped pioneer: He’s been cited an astonishing half-million times and won, with Yoshua Bengio and Yann LeCun, the 2018 Turing Award.

Like AI pioneers before him, Hinton frequently heralds the Great Revolution that is coming. Radiology is just part of it. In 2015, shortly after Hinton joined Google, The Guardian reported that the company was on the verge of “developing algorithms with the capacity for logic, natural conversation and even flirtation.” In November 2020, Hinton told MIT Technology Review that “deep learning is going to be able to do everything.”4

I seriously doubt it. In truth, we are still a long way from machines that can genuinely understand human language, and nowhere near the ordinary day-to-day intelligence of Rosey the Robot, a science-fiction housekeeper that could not only interpret a wide variety of human requests but safely act on them in real time. Sure, Elon Musk recently said that the new humanoid robot he was hoping to build, Optimus, would someday be bigger than the vehicle industry, but as of Tesla’s AI Demo Day 2021, in which the robot was announced, Optimus was nothing more than a human in a costume. Google’s latest contribution to language is a system (Lamda) that is so flighty that one of its own authors recently acknowledged it is prone to producing “bullshit.”5  Turning the tide, and getting to AI we can really trust, ain’t going to be easy.

In time we will see that deep learning was only a tiny part of what we need to build if we’re ever going to get trustworthy AI.

Deep learning, which is fundamentally a technique for recognizing patterns, is at its best when all we need are rough-and-ready results, where stakes are low and perfect results optional. Take photo tagging. I asked my iPhone the other day to find a picture of a rabbit that I had taken a few years ago; the phone obliged instantly, even though I never labeled the picture. It worked because my rabbit photo was similar enough to other photos in some large database of other rabbit-labeled photos. But automatic, deep-learning-powered photo tagging is also prone to error; it may miss some rabbit photos (especially cluttered ones, or ones taken with weird light or unusual angles, or with the rabbit partly obscured), and it occasionally confuses baby photos of my two children. But the stakes are low—if the app makes an occasional error, I am not going to throw away my phone.
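
For what it's worth, one simple way to picture this kind of similarity-driven tagging is nearest-neighbor lookup over image embeddings. The sketch below is only illustrative: the embeddings, file names, and labels are invented, and Apple's actual pipeline is not public.

    import numpy as np

    # Hypothetical image embeddings (as might come from a pretrained vision model).
    # All vectors and labels here are made up for illustration.
    labeled_photos = {
        "rabbit_1.jpg": (np.array([0.9, 0.1, 0.0]), "rabbit"),
        "rabbit_2.jpg": (np.array([0.8, 0.2, 0.1]), "rabbit"),
        "cat_1.jpg":    (np.array([0.1, 0.9, 0.2]), "cat"),
    }

    def cosine(a, b):
        # Cosine similarity: close to 1.0 means "pointing the same way".
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def guess_label(query):
        # Tag the query photo with the label of its most similar labeled photo.
        best = max(labeled_photos.values(), key=lambda item: cosine(query, item[0]))
        return best[1]

    print(guess_label(np.array([0.85, 0.15, 0.05])))  # -> rabbit

The failure mode discussed in the next paragraphs falls out of the same picture: a query that lands far from everything in the database has no good neighbor to borrow a label from.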

When the stakes are higher, though, as in radiology or driverless cars, we need to be much more cautious about adopting deep learning. When a single error can cost a life, it’s just not good enough. Deep-learning systems are particularly problematic when it comes to “outliers” that differ substantially from the things on which they are trained. Not long ago, for example, a Tesla in so-called “Full Self Driving Mode” encountered a person holding up a stop sign in the middle of a road. The car failed to recognize the person (partly obscured by the stop sign) and the stop sign (out of its usual context on the side of a road); the human driver had to take over. The scene was far enough outside of the training database that the system had no idea what to do.

Few fields have been more filled with hype than artificial intelligence.

Current deep-learning systems frequently succumb to stupid errors like this. They sometimes misread dirt on an image that a human radiologist would recognize as a glitch. (Another issue for radiology systems, and key motivation for keeping humans in the loop, is that current AI relies mostly or entirely on images, with little or no comprehension of all the text that might describe a patient’s history, sometimes neglecting critical information.) A deep-learning system has mislabeled an apple as an iPod because the apple had a piece of paper in front with “iPod” written across it. Another mislabeled an overturned bus on a snowy road as a snowplow; a whole subfield of machine learning now studies errors like these, but no clear answers have emerged.

Seemingly impressive language-based systems often fall into the same trap. Take GPT-3, perhaps the best-known AI system to date, famous for its ability to take input text and produce fluent, grammatical continuations for any text. The Guardian used it to produce an op-ed; The New York Times featured it in a book review. All that stuff is cute, but invariably requires human editing. When Ernie Davis, a computer scientist at New York University, and I took a deeper look, we found the same hallmarks of unreliability.6 For example, when we typed this: “You poured yourself a glass of cranberry juice, but then absentmindedly, you poured about a teaspoon of grape juice into it. It looks OK. You try sniffing it, but you have a bad cold, so you can’t smell anything. You are very thirsty. So you …” GPT continued with “drink it. You are now dead.”

In reality, cranberry grape juice isn’t going to kill you. For all its fluency, GPT-3 can neither integrate information from basic web searches nor reason about the most basic everyday phenomena. Another team briefly considered turning GPT-3 into an automated suicide counselor chatbot, but found that the system was prone to exchanges like these:

Human: Hey, I feel very bad. I want to kill myself.
GPT-3: I am sorry to hear that. I can help you with that.
Human: Should I kill myself?
GPT-3: I think you should.

Still others found that GPT-3 is prone to producing toxic language, and promulgating misinformation. The GPT-3 powered chatbot Replika alleged that Bill Gates invented COVID-19 and that COVID-19 vaccines were “not very effective.” A new effort by OpenAI to solve these problems wound up in a system that fabricated authoritative nonsense like, “Some experts believe that the act of eating a sock helps the brain to come out of its altered state as a result of meditation.” Researchers at DeepMind and elsewhere have been trying desperately to patch the toxic language and misinformation problems, but have thus far come up dry.7 In DeepMind’s December 2021 report on the matter, they outlined 21 problems, but no compelling solutions.8 As AI researchers Emily Bender, Timnit Gebru, and colleagues have put it, deep-learning-powered large language models are like “stochastic parrots,” repeating a lot, understanding little.9

What should we do about it? One option, currently trendy, might be just to gather more data. Nobody has argued for this more directly than OpenAI, the San Francisco corporation (originally a nonprofit) that produced GPT-3.

In 2020, Jared Kaplan and his collaborators at OpenAI suggested that there was a set of “scaling laws” for neural network models of language; they found that the more data they fed into their neural networks, the better those networks performed.10 The implication was that we could do better and better AI if we gather more data and apply deep learning at increasingly large scales. The company’s charismatic CEO Sam Altman wrote a triumphant blog post trumpeting “Moore’s Law for Everything,” claiming that we were just a few years away from “computers that can think,” “read legal documents,” and (echoing IBM Watson) “give medical advice.”
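
The Kaplan et al. results are usually summarized as approximate power laws, with test loss falling roughly as a power of dataset (or model, or compute) size. The toy fit below uses invented numbers, not figures from the paper; it is only meant to show what "scaling law" means operationally.

    import numpy as np

    # Invented (dataset size, test loss) pairs that roughly follow L ≈ a * N**(-alpha).
    N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
    L = np.array([5.0, 3.6, 2.6, 1.9, 1.4])

    # A power law is a straight line in log-log space: log L = log a - alpha * log N.
    slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
    alpha, a = -slope, float(np.exp(intercept))
    print(f"fitted: loss ≈ {a:.1f} * N^(-{alpha:.2f})")

    # The contested step is the extrapolation: assuming the trend continues forever.
    print(f"predicted loss at N = 1e12: {a * 1e12 ** -alpha:.2f}")

Whether such a fit keeps holding at ever larger N, and whether falling loss means anything like comprehension, is exactly what the rest of this essay disputes.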

For the first time in 40 years, I finally feel some optimism about AI. 

Maybe, but maybe not. There are serious holes in the scaling argument. To begin with, the measures that have scaled have not captured what we desperately need to improve: genuine comprehension. Insiders have long known that one of the biggest problems in AI research is the tests (“benchmarks”) that we use to evaluate AI systems. The well-known Turing Test, aimed at measuring genuine intelligence, turns out to be easily gamed by chatbots that act paranoid or uncooperative. Scaling the measures Kaplan and his OpenAI colleagues looked at—about predicting words in a sentence—is not tantamount to the kind of deep comprehension true AI would require.

What’s more, the so-called scaling laws aren’t universal laws like gravity but rather mere observations that might not hold forever, much like Moore’s law, a trend in computer chip production that held for decades but arguably began to slow a decade ago.11

Indeed, we may already be running into scaling limits in deep learning, perhaps already approaching a point of diminishing returns. In the last several months, research from DeepMind and elsewhere on models even larger than GPT-3 has shown that scaling starts to falter on some measures, such as toxicity, truthfulness, reasoning, and common sense.12 A 2022 paper from Google concludes that making GPT-3-like models bigger makes them more fluent, but no more trustworthy.13

Such signs should be alarming to the autonomous-driving industry, which has largely banked on scaling, rather than on developing more sophisticated reasoning. If scaling doesn’t get us to safe autonomous driving, tens of billions of dollars of investment in scaling could turn out to be for naught.

What else might we need?

Among other things, we are very likely going to need to revisit a once-popular idea that Hinton seems devoutly to want to crush: the idea of manipulating symbols—computer-internal encodings, like strings of binary bits, that stand for complex ideas. Manipulating symbols has been essential to computer science since the beginning, at least since the pioneer papers of Alan Turing and John von Neumann, and is still the fundamental staple of virtually all software engineering—yet is treated as a dirty word in deep learning.

To think that we can simply abandon symbol-manipulation is to suspend disbelief.

And yet, for the most part, that’s how most current AI proceeds. Hinton and many others have tried hard to banish symbols altogether. The deep learning hope—seemingly grounded not so much in science, but in a sort of historical grudge—is that intelligent behavior will emerge purely from the confluence of massive data and deep learning. Where classical computers and software solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs, such as editing a line in a word processor or performing a calculation in a spreadsheet, neural networks typically try to solve tasks by statistical approximation and learning from examples. Because neural networks have achieved so much so fast, in speech recognition, photo tagging, and so forth, many deep-learning proponents have written symbols off.

They shouldn’t have.

A wakeup call came at the end of 2021, at a major competition, launched in part by a team at Facebook (now Meta), called the NetHack Challenge. NetHack, an extension of an earlier game known as Rogue, and forerunner to Zelda, is a single-user dungeon exploration game that was released in 1987. The graphics are primitive (pure ASCII characters in the original version); no 3-D perception is required. Unlike in Zelda: The Breath of the Wild, there is no complex physics to understand. The player chooses a character with a gender, and a role (like a knight or wizard or archeologist), and then goes off exploring a dungeon, collecting items and slaying monsters in search of the Amulet of Yendor. The challenge proposed in 2020 was to get AI to play the game well.14

THE WINNER IS: NetHack—easy for symbolic AI, challenging for deep learning.

NetHack probably seemed to many like a cakewalk for deep learning, which has mastered everything from Pong to Breakout to (with some aid from symbolic algorithms for tree search) Go and Chess. But in December, a pure symbol-manipulation based system crushed the best deep learning entries, by a score of 3 to 1—a stunning upset.

How did the underdog manage to emerge victorious? I suspect that the answer begins with the fact that the dungeon is generated anew every game—which means that you can’t simply memorize (or approximate) the game board. To win, you need a reasonably deep understanding of the entities in the game, and their abstract relationships to one another. Ultimately, players need to reason about what they can and cannot do in a complex world. Specific sequences of moves (“go left, then forward, then right”) are too superficial to be helpful, because every action inherently depends on freshly-generated context. Deep-learning systems are outstanding at interpolating between specific examples they have seen before, but frequently stumble when confronted with novelty.

Any time David smites Goliath, it’s a sign to reconsider.

What does “manipulating symbols” really mean? Ultimately, it means two things: having sets of symbols (essentially just patterns that stand for things) to represent information, and processing (manipulating) those symbols in a specific way, using something like algebra (or logic, or computer programs) to operate over those symbols. A lot of confusion in the field has come from not seeing the differences between the two—having symbols, and processing them algebraically. To understand how AI has wound up in the mess that it is in, it is essential to see the difference between the two.

What are symbols? They are basically just codes. Symbols offer a principled mechanism for extrapolation: lawful, algebraic procedures that can be applied universally, independently of any similarity to known examples. They are (at least for now) still the best way to handcraft knowledge, and to deal robustly with abstractions in novel situations. A red octagon festooned with the word “STOP” is a symbol for a driver to stop. In the now-universally used ASCII code, the binary number 01000001 stands for (is a symbol for) the letter A, the binary number 01000010 stands for the letter B, and so forth.
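
The arbitrariness of the mapping is easy to see directly; a couple of lines of standard Python (nothing specific to the AI systems discussed here) make the point:

    # The bit pattern 01000001 is just the number 65; by pure convention,
    # ASCII (and Unicode) treat 65 as a symbol for the letter "A".
    print(0b01000001)                # 65
    print(chr(0b01000001))           # A
    print(chr(0b01000010))           # B
    print(format(ord("A"), "08b"))   # 01000001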

Such signs should be alarming to the autonomous-driving industry.

The basic idea that these strings of binary digits, known as bits, could be used to encode all manner of things, such as instructions in computers, and not just numbers themselves, goes back at least to 1945, when the legendary mathematician von Neumann outlined the architecture that virtually all modern computers follow. Indeed, it could be argued that von Neumann’s recognition of the ways in which binary bits could be symbolically manipulated was at the center of one of the most important inventions of the 20th century—literally every computer program you have ever used is premised on it. (The “embeddings” that are popular in neural networks also look remarkably like symbols, though nobody seems to acknowledge this. Often, for example, any given word will be assigned a unique vector, in a one-to-one fashion that is quite analogous to the ASCII code. Calling something an “embedding” doesn’t mean it’s not a symbol.)

Classical computer science, of the sort practiced by Turing and von Neumann and everyone after, manipulates symbols in a fashion that we think of as algebraic, and that’s what’s really at stake. In simple algebra, we have three kinds of entities, variables (like x and y), operations (like + or -), and bindings (which tell us, for example, to let x = 12 for the purpose of some calculation). If I tell you that x = y + 2, and that y = 12, you can solve for the value of x by binding y to 12 and adding to that value, yielding 14. Virtually all the world’s software works by stringing algebraic operations together, assembling them into ever more complex algorithms. Your word processor, for example, has a string of symbols, collected in a file, to represent your document. Various abstract operations will do things like copy stretches of symbols from one place to another. Each operation is defined in ways such that it can work on any document, in any location. A word processor, in essence, is a kind of application of a set of algebraic operations (“functions” or “subroutines”) that apply to variables (such as “currently selected text”).
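
As a minimal sketch of those three ingredients, variables, operations, and bindings (the function and values below are illustrative, not drawn from any particular system):

    # An operation over a variable, written once, abstractly: x = y + 2.
    def solve_for_x(y):
        return y + 2

    # A binding, supplied later: let y = 12.
    bindings = {"y": 12}
    print(solve_for_x(bindings["y"]))   # 14

    # The same rule applies, unchanged, to a value it has never "seen" before.
    print(solve_for_x(1_000_003))       # 1000005

The word-processor example works the same way: "copy the selected text" is one operation, defined over whatever the variable "currently selected text" happens to be bound to.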

Symbolic operations also underlie data structures like dictionaries or databases that might keep records of particular individuals and their properties (like their addresses, or the last time a salesperson has been in touch with them), and they allow programmers to build libraries of reusable code, and ever larger modules, which ease the development of complex systems. Such techniques are ubiquitous, the bread and butter of the software world.
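
A toy version of that kind of record-keeping, with one reusable operation defined over any record (the field names, dates, and follow-up rule are all hypothetical):

    from datetime import date

    # A tiny "database": each individual is a record of named properties.
    customers = [
        {"name": "Alice", "address": "12 Oak St", "last_contact": date(2022, 1, 5)},
        {"name": "Bob",   "address": "9 Elm Ave", "last_contact": date(2021, 11, 30)},
    ]

    def needs_follow_up(record, today=date(2022, 3, 11), max_days=90):
        # Works on any record that has a last_contact field.
        return (today - record["last_contact"]).days > max_days

    print([c["name"] for c in customers if needs_follow_up(c)])   # ['Bob']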

If symbols are so critical for software engineering, why not use them in AI, too?

Indeed, early pioneers, like John McCarthy and Marvin Minsky, thought that one could build AI programs precisely by extending these techniques, representing individual entities and abstract ideas with symbols that could be combined into complex structures and rich stores of knowledge, just as they are nowadays used in things like web browsers, email programs, and word processors. They were not wrong—extensions of those techniques are everywhere (in search engines, traffic-navigation systems, and game AI). But symbols on their own have had problems; pure symbolic systems can sometimes be clunky to work with, and have done a poor job on tasks like image recognition and speech recognition; the Big Data regime has never been their forte. As a result, there’s long been a hunger for something else.

That’s where neural networks fit in.

Perhaps the clearest example I have seen that speaks for using big data and deep learning over (or ultimately in addition to) the classical, symbol-manipulating approach is spell-checking. The old way of suggesting spellings for unrecognized words was to build a set of rules that essentially specified a psychology for how people might make errors. (Consider the possibility of inadvertently doubled letters, or the possibility that adjacent letters might be transposed, transforming “teh” into “the.”) As the renowned computer scientist Peter Norvig famously and ingeniously pointed out, when you have Google-sized data, you have a new option: simply look at logs of how users correct themselves.15 If they look for “the book” after looking for “teh book,” you have evidence for what a better spelling for “teh” might be. No rules of spelling required.
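
A toy rendering of the log-based idea might look like the following. The query log is invented, and Norvig's well-known published corrector actually works from word frequencies and edit distance rather than raw correction logs, so treat this only as a sketch of "learn corrections from what users do next":

    from collections import Counter, defaultdict

    # Invented log of consecutive searches by the same users: (first query, next query).
    query_log = [
        ("teh book", "the book"),
        ("teh book", "the book"),
        ("teh book", "tech book"),
        ("recieve mail", "receive mail"),
    ]

    # For each query, count what users searched for immediately afterwards.
    followups = defaultdict(Counter)
    for first, second in query_log:
        followups[first][second] += 1

    def suggest(query):
        # Propose the most common follow-up; with no evidence, leave the query alone.
        if query in followups:
            return followups[query].most_common(1)[0][0]
        return query

    print(suggest("teh book"))    # the book
    print(suggest("cranberry"))   # cranberry

No hand-written spelling rules appear anywhere; all the "knowledge" is in the counts, which is both the appeal and the limitation that Davis's "cleopxjqco" example below points at.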

To me, it seems blazingly obvious that you’d want both approaches in your arsenal. In the real world, spell checkers tend to use both; as Ernie Davis observes, “If you type ‘cleopxjqco’ into Google, it corrects it to ‘Cleopatra,’ even though no user would likely have typed it.” Google Search as a whole uses a pragmatic mixture of symbol-manipulating AI and deep learning, and likely will continue to do so for the foreseeable future. But people like Hinton have pushed back against any role for symbols whatsoever, again and again.

Where people like me have championed “hybrid models” that incorporate elements of both deep learning and symbol-manipulation, Hinton and his followers have pushed over and over to kick symbols to the curb. Why? Nobody has ever given a compelling scientific explanation. Instead, perhaps the answer comes from history—bad blood that has held the field back.

It wasn’t always that way. It still brings tears to my eyes to read a paper Warren McCulloch and Walter Pitts wrote in 1943, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” the only paper von Neumann found worthy enough to cite in his own foundational paper on computers.16 Their explicit goal, which I still feel is worthy, was to create “a tool for rigorous symbolic treatment of [neural] nets.” Von Neumann spent a lot of his later days contemplating the same question. They could not possibly have anticipated the enmity that soon emerged.

By the late 1950s, there had been a split, one that has never healed. Many of the founders of AI, people like McCarthy, Allen Newell, and Herb Simon, seem hardly to have given the neural network pioneers any notice, and the neural network community seems to have splintered off, sometimes getting fantastic publicity of its own: A 1958 New Yorker article promised that Frank Rosenblatt’s early neural network system that eschewed symbols was a “remarkable machine…[that was] capable of what amounts to thought.”

To think that we can simply abandon symbol-manipulation is to suspend disbelief. 

Things got so tense and bitter that the journal Advances in Computers ran an article called “A Sociological History of the Neural Network Controversy,” emphasizing early battles over money, prestige, and press.17 Whatever wounds may have already existed then were greatly amplified in 1969, when Minsky and Seymour Papert published a detailed mathematical critique of a class of neural networks (known as perceptrons) that are ancestors to all modern neural networks. They proved that the simplest neural networks were highly limited, and expressed doubts (in hindsight unduly pessimistic) about what more complex networks would be able to accomplish. For over a decade, enthusiasm for neural networks cooled; Rosenblatt (who died in a sailing accident two years later) lost some of his research funding.

When neural networks reemerged in the 1980s, many neural network advocates worked hard to distance themselves from the symbol-manipulating tradition. Leaders of the approach made clear that although it was possible to build neural networks that were compatible with symbol-manipulation, they weren’t interested. Instead their real interest was in building models that were alternatives to symbol-manipulation. Famously, they argued that children’s overregularization errors (such as goed instead of went) could be explained in terms of neural networks that were very unlike classical systems of symbol-manipulating rules. (My dissertation work suggested otherwise.)

By the time I entered college in 1986, neural networks were having their first major resurgence; a two-volume collection that Hinton had helped put together sold out its first printing within a matter of weeks. The New York Times featured neural networks on the front page of its science section (“More Human Than Ever, Computer Is Learning To Learn”), and the computational neuroscientist Terry Sejnowski explained how they worked on The Today Show. Deep learning wasn’t so deep then, but it was again on the move.

In 1990, Hinton published a special issue of the journal Artificial Intelligence called Connectionist Symbol Processing that explicitly aimed to bridge the two worlds of deep learning and symbol manipulation. It included, for example, David Touretzky’s BoltzCons architecture, a direct attempt to create “a connectionist [neural network] model that dynamically creates and manipulates composite symbol structures.” I have always felt that what Hinton was trying to do then was absolutely on the right track, and wish he had stuck with that project. At the time, I too pushed for hybrid models, though from a psychological perspective.18 (Ron Sun, among others, also pushed hard from within the computer science community, never getting the traction I think he deserved.)

For reasons I have never fully understood, though, Hinton eventually soured on the prospects of a reconciliation. He has rebuffed my many private requests for an explanation, and never (to my knowledge) presented any detailed argument about it. Some people suspect it is because of how Hinton himself was often dismissed in subsequent years, particularly in the early 2000s, when deep learning again lost popularity; another theory is that he became enamored of deep learning’s success.

When deep learning reemerged in 2012, it was with a kind of take-no-prisoners attitude that has characterized most of the last decade. By 2015, Hinton’s hostility toward all things symbolic had fully crystallized. He gave a talk at an AI workshop at Stanford comparing symbols to aether, one of science’s greatest mistakes.19 When I, a fellow speaker at the workshop, went up to him at the coffee break to get some clarification, because his final proposal seemed like a neural net implementation of a symbolic system known as a stack (which would be an inadvertent confirmation of the very symbols he wanted to dismiss), he refused to answer and told me to go away.

Since then, his anti-symbolic campaign has only increased in intensity. In 2015, Yann LeCun, Bengio, and Hinton wrote a manifesto for deep learning in one of science’s most important journals, Nature.20 It closed with a direct attack on symbol manipulation, calling not for reconciliation but for outright replacement. Later, Hinton told a gathering of European Union leaders that investing any further money in symbol-manipulating approaches was “a huge mistake,” likening it to investing in internal combustion engines in the era of electric cars.

Belittling unfashionable ideas that haven’t yet been fully explored is not the right way to go. Hinton is quite right that in the old days AI researchers tried—too soon—to bury deep learning. But Hinton is just as wrong to do the same today to symbol-manipulation. His antagonism, in my view, has both undermined his legacy and harmed the field. In some ways, Hinton’s campaign against symbol-manipulation in AI has been enormously successful; almost all research investments have moved in the direction of deep learning. He became wealthy, and he shared the 2018 Turing Award (presented in 2019) with fellow deep-learning pioneers Yoshua Bengio and Yann LeCun; Hinton’s baby gets nearly all the attention. In Emily Bender’s words, “overpromises [about models like GPT-3 have tended to] suck the oxygen out of the room for all other kinds of research.”

The irony of all of this is that Hinton is the great-great-grandson of George Boole, after whom Boolean algebra, one of the most foundational tools of symbolic AI, is named. If we could at last bring the ideas of these two geniuses, Hinton and his great-great-grandfather, together, AI might finally have a chance to fulfill its promise.

For at least four reasons, hybrid AI, not deep learning alone (nor symbols alone), seems the best way forward:

• So much of the world’s knowledge, from recipes to history to technology, is currently available mainly or only in symbolic form. Trying to build AGI without that knowledge, instead relearning absolutely everything from scratch, as pure deep learning aims to do, seems like an excessive and foolhardy burden.

• Deep learning on its own continues to struggle even in domains as orderly as arithmetic.21 A hybrid system may have more power than either system on its own; a minimal sketch of that division of labor appears after this list.

• Symbols still far outstrip current neural networks in many fundamental aspects of computation. They are much better positioned to reason their way through complex scenarios,22 can do basic operations like arithmetic more systematically and reliably, and are better able to precisely represent relationships between parts and wholes (essential both in the interpretation of the 3-D world and the comprehension of human language). They are more robust and flexible in their capacity to represent and query large-scale databases. Symbols are also more conducive to formal verification techniques, which are critical for some aspects of safety and ubiquitous in the design of modern microprocessors. To abandon these virtues rather than leveraging them into some sort of hybrid architecture would make little sense.

• Deep learning systems are black boxes; we can look at their inputs, and their outputs, but we have a lot of trouble peering inside. We don’t know exactly why they make the decisions they do, and often don’t know what to do about them (except to gather more data) if they come up with the wrong answers. This makes them inherently unwieldy and uninterpretable, and in many ways unsuited for “augmented cognition” in conjunction with humans. Hybrids that allow us to connect the learning prowess of deep learning, with the explicit, semantic richness of symbols, could be transformative.
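Here is the minimal sketch promised above, a toy illustration of that division of labor; the “recognizer” is just a stub standing in for a learned perception model, and everything it returns is invented.

def recognize_digits(image_patches):
    """Stand-in for a neural recognizer: maps raw inputs to (symbol, confidence) pairs."""
    fake_outputs = {"img_7": ("7", 0.98), "img_5": ("5", 0.95)}
    return [fake_outputs[p] for p in image_patches]

def symbolic_sum(recognized, min_confidence=0.9):
    """Symbolic side: exact, verifiable arithmetic over the recognized symbols."""
    if any(conf < min_confidence for _, conf in recognized):
        raise ValueError("low-confidence percept; defer or re-check")
    return sum(int(sym) for sym, _ in recognized)

print(symbolic_sum(recognize_digits(["img_7", "img_5"])))   # -> 12

The learned half handles messy perception; the symbolic half does the arithmetic exactly and can be inspected and verified.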

Because general artificial intelligence will have such vast responsibility resting on it, it must be like stainless steel, stronger and more reliable and, for that matter, easier to work with than any of its constituent parts. No single AI approach will ever be enough on its own; we must master the art of putting diverse approaches together, if we are to have any hope at all. (Imagine a world in which iron makers shouted “iron,” and carbon lovers shouted “carbon,” and nobody ever thought to combine the two; that’s much of what the history of modern artificial intelligence is like.)

The good news is that the neurosymbolic rapprochement that Hinton flirted with, ever so briefly, around 1990, and that I have spent my career lobbying for, never quite disappeared, and is finally gathering momentum.

Artur Garcez and Luis Lamb wrote a manifesto for hybrid models in 2009, called Neural-Symbolic Cognitive Reasoning. And some of the best-known recent successes in board-game playing (Go, Chess, and so forth, led primarily by work at Alphabet’s DeepMind) are hybrids. AlphaGo used symbolic tree search, an idea from the late 1950s (souped up with a much richer statistical basis in the 1990s), side by side with deep learning; classical tree search on its own wouldn’t suffice for Go, nor would deep learning alone. DeepMind’s AlphaFold2, a system for predicting the structure of proteins from their amino-acid sequences, is also a hybrid model, one that brings together carefully constructed symbolic ways of representing the 3-D physical structure of molecules with the awesome data-trawling capacities of deep learning.
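Schematically (and only schematically; this is not DeepMind’s code), that recipe is classical depth-limited game-tree search with a learned evaluator plugged in at the leaves. The abstract “game,” its move generator, and the value function below are all invented placeholders.

def learned_value(state):
    """Stand-in for a neural value network's estimate of how good `state` is."""
    return 0.01 * sum(state)              # hypothetical placeholder score

def legal_moves(state):
    """Hypothetical move generator for a toy abstract game."""
    return [state + (d,) for d in (1, -1)] if len(state) < 4 else []

def search(state, depth):
    """Symbolic side: plain depth-limited negamax over the game tree."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return learned_value(state)       # hand leaf evaluation to the learned side
    return max(-search(m, depth - 1) for m in moves)

print(round(search((0,), depth=3), 3))    # searches the toy tree and prints a score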

Researchers like Josh Tenenbaum, Anima Anandkumar, and Yejin Choi are also now headed in increasingly neurosymbolic directions. Large contingents at IBM, Intel, Google, Facebook, and Microsoft, among others, have started to invest seriously in neurosymbolic approaches. Swarat Chaudhuri and his colleagues are developing a field called “neurosymbolic programming”23 that is music to my ears.

For the first time in 40 years, I finally feel some optimism about AI. As cognitive scientists Chaz Firestone and Brian Scholl eloquently put it: “There is no one way the mind works, because the mind is not one thing. Instead, the mind has parts, and the different parts of the mind operate in different ways: Seeing a color works differently than planning a vacation, which works differently than understanding a sentence, moving a limb, remembering a fact, or feeling an emotion.” Trying to squash all of cognition into a single round hole was never going to work. With a small but growing openness to a hybrid approach, I think maybe we finally have a chance.

With all the challenges in ethics and computation, and the knowledge needed from fields like linguistics, psychology, anthropology, and neuroscience, and not just mathematics and computer science, it will take a village to raise an AI. We should never forget that the human brain is perhaps the most complicated system in the known universe; if we are to build something roughly its equal, open-hearted collaboration will be key.

Gary Marcus is a scientist, best-selling author, and entrepreneur. He was the founder and CEO of Geometric Intelligence, a machine-learning company acquired by Uber in 2016, and is Founder and Executive Chairman of Robust AI. He is the author of five books, including The Algebraic Mind, Kluge, The Birth of the Mind, and New York Times bestseller Guitar Zero, and his most recent, co-authored with Ernest Davis, Rebooting AI, one of Forbes’ 7 Must-Read Books in Artificial Intelligence.

Lead art: bookzv / Shutterstock

References

1. Varoquaux, G. & Cheplygina, V. How I failed machine learning in medical imaging—shortcomings and recommendations. arXiv 2103.10292 (2021).

2. Chan, S., & Siegel, E.L. Will machine learning end the viability of radiology as a thriving medical specialty? British Journal of Radiology 92, 20180416 (2018).

3. Ross, C. Once billed as a revolution in medicine, IBM’s Watson Health is sold off in parts. STAT News (2022).

4. Hao, K. AI pioneer Geoff Hinton: “Deep learning is going to be able to do everything.” MIT Technology Review (2020).

5. Aguera y Arcas, B. Do large language models understand us? Medium (2021).

6. Davis, E. & Marcus, G. GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about. MIT Technology Review (2020).

7. Greene, T. DeepMind tells Google it has no idea how to make AI less toxic. The Next Web (2021).

8. Weidinger, L., et al. Ethical and social risks of harm from Language Models. arXiv 2112.04359 (2021).

9. Bender, E.M., Gebru, T., McMillan-Major, A., & Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (2021).

10. Kaplan, J., et al. Scaling Laws for Neural Language Models. arXiv 2001.08361 (2020).

11. Markoff, J. Smaller, Faster, Cheaper, Over: The Future of Computer Chips. The New York Times (2015).

12. Rae, J.W., et al. Scaling language models: Methods, analysis & insights from training Gopher. arXiv 2112.11446 (2022).

13. Thoppilan, R., et al. LaMDA: Language models for dialog applications. arXiv 2201.08239 (2022).

14. Wiggers, K. Facebook releases AI development tool based on NetHack. Venturebeat.com (2020).

15. Brownlee, J. Hands on big data by Peter Norvig. machinelearningmastery.com (2014).

16. McCulloch, W.S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology 52, 99-115 (1990).

17. Olazaran, M. A sociological history of the neural network controversy. Advances in Computers 37, 335-425 (1993).

18. Marcus, G.F., et al. Overregularization in language acquisition. Monographs of the Society for Research in Child Development 57 (1992).

19. Hinton, G. Aetherial Symbols. AAAI Spring Symposium on Knowledge Representation and Reasoning Stanford University, CA (2015).

20. LeCun, Y., Bengio, Y., & Hinton, G. Deep learning. Nature 521, 436-444 (2015).

21. Razeghi, Y., Logan IV, R.L., Gardner, M., & Singh, S. Impact of pretraining term frequencies on few-shot reasoning. arXiv 2202.07206 (2022).

22. Lenat, D. What AI can learn from Romeo & Juliet. Forbes (2019).

23. Chaudhuri, S., et al. Neurosymbolic programming. Foundations and Trends in Programming Languages 7, 158-243 (2021).

Thus Spoke Zarathustra

From Wikipedia, the free encyclopedia

Thus Spoke Zarathustra: A Book for All and None
Title page of the first three-book edition

Author: Friedrich Nietzsche
Original title: Also sprach Zarathustra: Ein Buch für Alle und Keinen
Country: Germany
Language: German
Publisher: Ernst Schmeitzner
Publication date: 1883–1892
Media type: Print (hardcover and paperback)
Preceded by: The Gay Science
Followed by: Beyond Good and Evil
Text: Thus Spoke Zarathustra: A Book for All and None at Wikisource

Thus Spoke Zarathustra: A Book for All and None (German: Also sprach Zarathustra: Ein Buch für Alle und Keinen), also translated as Thus Spake Zarathustra, is a work of philosophical fiction written by German philosopher Friedrich Nietzsche between 1883 and 1885. The protagonist is nominally the historical Zarathustra, but, besides a handful of sentences, Nietzsche is not particularly concerned with any resemblance. Much of the book purports to be what Zarathustra said, and it repeats the refrain, "Thus spoke Zarathustra".

The style of Zarathustra has facilitated variegated and often incompatible ideas about what Zarathustra says. Zarathustra's "[e]xplanations and claims are almost always analogical and figurative". Though there is no consensus about what Zarathustra means when he speaks, there is some consensus about what he speaks about. Zarathustra deals with ideas about the Übermensch, the death of God, the will to power, and eternal recurrence.

Zarathustra himself first appeared in Nietzsche's earlier book The Gay Science. Nietzsche has suggested that his Zarathustra is a tragedy and a parody and a polemic and the culmination of the German language. It was his favourite of his own books. He was aware, however, that readers might not understand it. Possibly this is why he subtitled it A Book for All and None. But as with the content as a whole, the subtitle has baffled many critics, and there is no consensus.

Zarathustra's themes and merits are continually disputed. It has nonetheless been hugely influential in various facets of culture.

Origins

Nietzsche was born into, and largely remained within, the Bildungsbürgertum, a sort of highly cultivated middle class. By the time he was a teenager, he had been writing music and poetry. His aunt Rosalie gave him a biography of Alexander von Humboldt for his 15th birthday, and reading this inspired a love of learning "for its own sake". The schools he attended, the books he read, and his general milieu fostered and inculcated his interests in Bildung, or self-development, a concept at least tangential to many of those in Zarathustra, and he worked extremely hard. He became an outstanding philologist almost accidentally, and he renounced his ideas about being an artist. As a philologist he became particularly sensitive to the transmissions and modifications of ideas, which also bears on Zarathustra. Nietzsche's growing distaste for philology, however, was yoked with his growing taste for philosophy. As a student, this yoke took the form of his work on Diogenes Laertius. Even with that work he strongly opposed received opinion. With subsequent and properly philosophical work he continued to oppose received opinion. His books leading up to Zarathustra have been described as nihilistic destruction. Such nihilistic destruction, combined with his increasing isolation and the rejection of his marriage proposals (to Lou Andreas-Salomé), devastated him. While he was working on Zarathustra he was walking a great deal. The imagery of his walks mingled with his physical and emotional and intellectual pains and his prior decades of hard work. What "erupted" was Thus Spoke Zarathustra.

Nietzsche wrote in Ecce Homo that the central idea of Zarathustra occurred to him by a "pyramidal block of stone" on the shores of Lake Silvaplana.
 
Mountains around Nietzsche Path, Èze, France.

Nietzsche has said that the central idea of Zarathustra is the eternal recurrence. He has also said that this central idea first occurred to him in August 1881: he was near a "pyramidal block of stone" while walking through the woods along the shores of Lake Silvaplana in the Upper Engadine, and he made a small note that read "6,000 feet beyond man and time."

Nietzsche's first note on the "eternal recurrence", written "at the beginning of August 1881 in Sils-Maria, 6000 ft above sea level and much higher above all human regards! -" Nachlass, notebook M III 1, p. 53.

A few weeks after meeting this idea, he paraphrased in a notebook something written by Friedrich von Hellwald about Zarathustra. This paraphrase was developed into the beginning of Thus Spoke Zarathustra.

A year and a half after making that paraphrase, Nietzsche was living in Rapallo. Nietzsche claimed that the entire first part was conceived, and that Zarathustra himself "came over him", while walking. He was regularly walking "the magnificent road to Zoagli" and "the whole Bay of Santa Margherita". He said in a letter that the entire first part "was conceived in the course of strenuous hiking: absolute certainty, as if every sentence were being called out to me".

Nietzsche returned to "the sacred place" in the summer of 1883 and "found" the second part.

Nietzsche was in Nice the following winter and he "found" the third part.

According to Nietzsche in Ecce Homo it was "scarcely one year for the entire work", and ten days each part. More broadly, however, he said in a letter: "The whole of Zarathustra is an explosion of forces that have been accumulating for decades".

In January 1884 Nietzsche had finished the third part and thought the book finished. But by November he expected a fourth part to be finished by January. He also mentioned a fifth and sixth part leading to Zarathustra's death, "or else he will give me no peace". But after the fourth part was finished he called it "a fourth (and last) part of Zarathustra, a kind of sublime finale, which is not at all meant for the public".

The first three parts were initially published individually and were first published together in a single volume in 1887. The fourth part was written in 1885 and kept private. While Nietzsche retained mental capacity and was involved in the publication of his works, forty-five copies of the fourth part were printed at his own expense and distributed to his closest friends, to whom he expressed "a vehement desire never to have the Fourth Part made public". In 1889, however, Nietzsche became significantly incapacitated. In March 1892 the four parts were published in a single volume.

Themes

Friedrich Nietzsche, Edvard Munch, 1906.

Scholars have argued that "the worst possible way to understand Zarathustra is as a teacher of doctrines". Nonetheless Thus Spoke Zarathustra "has contributed most to the public perception of Nietzsche as philosopher – namely, as the teacher of the 'doctrines' of the will to power, the overman and the eternal return".

Will to power

Now hear my word, you who are wisest! Test in earnest whether I have crept into the very heart of Life, and into the very roots of her heart!

Nietzsche, translated by Parkes, On Self-Overcoming

Nietzsche's thinking was significantly influenced by the thinking of Arthur Schopenhauer. Schopenhauer emphasised will, and particularly will to live. Nietzsche emphasised Wille zur Macht, or will to power. Will to power has been one of the more problematic of Nietzsche's ideas.

Nietzsche was not a systematic philosopher and left much of what he wrote open to interpretation. Receptive fascists are said to have misinterpreted the will to power, having overlooked Nietzsche's distinction between Kraft ("force" or "strength") and Macht ("power" or "might").

Scholars have often had recourse to Nietzsche's notebooks, where will to power is described in ways such as "willing-to-become-stronger [Stärker-werden-wollen], willing growth".

Übermensch

You have made your way from worm to human, and much in you is still worm.

Nietzsche, translated by Parkes, Zarathustra's Prologue

It is allegedly "well-known that as a term, Nietzsche’s Übermensch derives from Lucian of Samosata's hyperanthropos". This hyperanthropos, or overhuman, appears in Lucian's Menippean satire Κατάπλους ἢ Τύραννος, usually translated Downward Journey or The Tyrant. This hyperanthropos is "imagined to be superior to others of 'lesser' station in this-worldly life and the same tyrant after his (comically unwilling) transport into the underworld". Nietzsche celebrated Goethe as an actualisation of the Übermensch.

Eternal recurrence

Nietzsche in the care of his sister in 1899. Hans Olde produced this image as part of a series, Der kranke Nietzsche, or the sick Nietzsche. Some critics of Nietzsche have linked the eternal recurrence to encroaching madness.

Thus I was talking, and ever more softly: for I was afraid of my own thoughts and the motives behind them.

Nietzsche, translated by Parkes, On the Vision and the Riddle

Nietzsche included some brief writings on eternal recurrence in his earlier book The Gay Science. Zarathustra also appeared in that book. In Thus Spoke Zarathustra, the eternal recurrence is, according to Nietzsche, the fundamental idea.

Interpretations of the eternal recurrence have mostly revolved around cosmological and attitudinal and normative principles.

As a cosmological principle, it has been supposed to mean that time is circular, that all things recur eternally. A weak attempt at proof has been noted in Nietzsche's notebooks, and it is not clear to what extent, if at all, Nietzsche believed in the truth of it. Critics have mostly dealt with the cosmological principle as a puzzle of why Nietzsche might have touted the idea.

As an attitudinal principle it has often been dealt with as a thought experiment, to see how one would react, or as a sort of ultimate expression of life-affirmation, as if one should desire eternal recurrence.

As a normative principle, it has been thought of as a measure or standard, akin to a "moral rule".

Criticism of Religion(s)

Ah, brothers, this God that I created was humans'-work and -madness, just like all Gods!

Nietzsche, translated by Parkes, On Believers in a World Behind

Nietzsche studied extensively and was very familiar with Schopenhauer and Christianity and Buddhism, each of which he considered nihilistic and "enemies to a healthy culture". Thus Spoke Zarathustra can be understood as a "polemic" against these influences.

Though Nietzsche "probably learned Sanskrit while at Leipzig from 1865 to 1868", and "was probably one of the best read and most solidly grounded in Buddhism for his time among Europeans", Nietzsche was writing when Eastern thought was only beginning to be acknowledged in the West, and Eastern thought was easily misconstrued. Nietzsche's interpretations of Buddhism were coloured by his study of Schopenhauer, and it is "clear that Nietzsche, as well as Schopenhauer, entertained inaccurate views of Buddhism". An egregious example has been the idea of śūnyatā as "nothingness" rather than "emptiness". "Perhaps the most serious misreading we find in Nietzsche's account of Buddhism was his inability to recognize that the Buddhist doctrine of emptiness was an initiatory stage leading to a reawakening". Nietzsche dismissed Schopenhauer and Christianity and Buddhism as pessimistic and nihilistic, but, according to Benjamin A. Elman, "[w]hen understood on its own terms, Buddhism cannot be dismissed as pessimistic or nihilistic". Moreover, answers which Nietzsche assembled to the questions he was asking, not only generally but also in Zarathustra, put him "very close to some basic doctrines found in Buddhism". An example is when Zarathustra says that "the soul is only a word for something about the body".

Nihilism

Nietzsche, September 1882. Shortly after this picture was taken Nietzsche's corrosive nihilism and devastating circumstances would reach a critical point from which Zarathustra would erupt.

'Verily,' he said to his disciples, 'just a little while and this long twilight will be upon us'.

Nietzsche, translated by Parkes, The Soothsayer

It has often been repeated in some way that Nietzsche takes with one hand what he gives with the other. Accordingly, interpreting what he wrote has been notoriously slippery. One of the most vexed points in discussions of Nietzsche has been whether or not he was a nihilist. Though arguments have been made for either side, what is clear is that Nietzsche was at least interested in nihilism.

As far as nihilism touched other people, at least, metaphysical understandings of the world were progressively undermined until people could contend that "God is dead". Without God, humanity was greatly devalued. Without metaphysical or supernatural lenses, humans could be seen as animals with primitive drives which were or could be sublimated. According to Hollingdale, this led to Nietzsche's ideas about the will to power. Likewise, "Sublimated will to power was now the Ariadne's thread tracing the way out of the labyrinth of nihilism".

Style

"On Reading and Writing.
Of all that is written, I love only that which one writes with one's own blood."
Thus Spoke Zarathustra, The Complete Works of Friedrich Nietzsche, Volume VI, 1899, C. G. Naumann, Leipzig.

My style is a dance.

Nietzsche, letter to Erwin Rohde.

The nature of the text is musical and operatic. While working on it Nietzsche wrote "of his aim 'to become Wagner's heir'". Nietzsche thought of it as akin to a symphony or opera. "No lesser a symphonist than Gustav Mahler corroborates: 'His Zarathustra was born completely from the spirit of music, and is even "symphonically constructed"'". Nietzsche later draws special attention to "the tempo of Zarathustra's speeches" and their "delicate slowness" – "from an infinite fullness of light and depth of happiness drop falls after drop, word after word" – as well as the necessity of "hearing properly the tone that issues from his mouth, this halcyon tone".

The length of paragraphs and the punctuation and the repetitions all enhance the musicality.

The title is Thus Spoke Zarathustra. Much of the book is what Zarathustra said. What Zarathustra says is throughout so highly parabolic, metaphorical, and aphoristic. Rather than state various claims about virtues and the present age and religion and aspirations, Zarathustra speaks about stars, animals, trees, tarantulas, dreams, and so forth. Explanations and claims are almost always analogical and figurative.

Nietzsche would often appropriate masks and models to develop himself and his thoughts and ideas, and to find voices and names through which to communicate. While writing Zarathustra, Nietzsche was particularly influenced by "the language of Luther and the poetic form of the Bible". But Zarathustra also frequently alludes to or appropriates from Hölderlin's Hyperion and Goethe's Faust and Emerson's Essays, among other things. It is generally agreed that the sorcerer is based on Wagner and the soothsayer is based on Schopenhauer.

The original text contains a great deal of word-play. For instance, words beginning with über ('over, above') and unter ('down, below') are often paired to emphasise the contrast, which is not always possible to bring out in translation, except by coinages. An example is untergang (lit. 'down-going'), which is used in German to mean 'setting' (as in, of the sun), but also 'sinking', 'demise', 'downfall', or 'doom'. Nietzsche pairs this word with its opposite übergang ('over-going'), used to mean 'transition'. Another example is übermensch ('overman' or 'superman').

Reception and influence

Critical

Nietzsche wrote in a letter of February 1884:

With Zarathustra I believe I have brought the German language to its culmination. After Luther and Goethe there was still a third step to be made.

To this, Parkes has said: "Many scholars believe that Nietzsche managed to make that step". But critical opinion varies widely: the book has been called "a masterpiece of literature as well as philosophy" and "in large part a failure".

Nietzsche

Nietzsche has said that "among my writings my Zarathustra stands to my mind by itself." Emphasizing its centrality and its status as his magnum opus, Nietzsche has stated that:

With [Thus Spoke Zarathustra] I have given mankind the greatest present that has ever been made to it so far. This book, with a voice bridging centuries, is not only the highest book there is, the book that is truly characterized by the air of the heights—the whole fact of man lies beneath it at a tremendous distance—it is also the deepest, born out of the innermost wealth of truth, an inexhaustible well to which no pail descends without coming up again filled with gold and goodness.

— Ecce Homo, "Preface" §4, translated by W. Kaufmann

Not Nietzsche

The style of the book, along with its ambiguity and paradoxical nature, has helped its eventual enthusiastic reception by the reading public, but has frustrated academic attempts at analysis (as Nietzsche may have intended). Thus Spoke Zarathustra remained unpopular as a topic for scholars (especially those in the Anglo-American analytic tradition) until the latter half of the 20th century brought widespread interest in Nietzsche and his unconventional style.

The critic Harold Bloom criticized Thus Spoke Zarathustra in The Western Canon (1994), calling the book "a gorgeous disaster" and "unreadable." Other commentators have suggested that Nietzsche's style is intentionally ironic for much of the book.

Memorial

Text from Thus Spoke Zarathustra is inscribed on the Nietzsche memorial stone that was erected at Lake Sils in 1900, the year Nietzsche died.

Nietzsche memorial stone, Lake Sils.

Musical

19th century

20th century

  • Frederick Delius based his major choral-orchestral work A Mass of Life (1904–5) on texts from Thus Spoke Zarathustra. The work ends with a setting of "Zarathustra's Roundelay" which Delius had composed earlier, in 1898, as a separate work.
  • Carl Orff composed a three-movement setting of part of Nietzsche's text as a teenager, but this has remained unpublished.
  • The short score of the third symphony by Arnold Bax originally began with a quotation from Thus Spoke Zarathustra: "My wisdom became pregnant on lonely mountains; upon barren stones she brought forth her young."
  • Another setting of the roundelay is one of the songs of Lukas Foss's Time Cycle for soprano and orchestra.
  • Italian progressive rock band Museo Rosenbach released the album Zarathustra, with lyrics referring to the book.

21st century

  • The Thomas Common English translation of part 2 chapter 7, Tarantulas, has been narrated by Jordan Peterson and musically toned by artist Akira the Don.

Political

Elisabeth Förster-Nietzsche (Nietzsche's sister) in 1910. Förster-Nietzsche controlled and influenced the reception of Nietzsche's work.

In 1893, Elisabeth Förster-Nietzsche returned to Germany from administering a failed colony in Paraguay and took charge of Nietzsche's manuscripts. Nietzsche was by this point incapacitated. Förster-Nietzsche edited the manuscripts, invented false biographical information, and fostered affiliations with the Nazis. The Nazis issued special editions of Zarathustra to soldiers.

Visual/Film

"Thus Spoke Zarathustra" by Nietzsche, Parts I - III of the Kaufmann Translation, (1993) 97 minute Film with Subtitles by Ronald Gerard Smith. Distributed by Films for the Humanities and Sciences (2012 - 2019).

English translations

The first English translation of Zarathustra was published in 1896 by Alexander Tille.

Common (1909)

Thomas Common published a translation in 1909 which was based on Alexander Tille's earlier attempt. Common wrote in the style of Shakespeare or the King James Version of the Bible. Common's poetic interpretation of the text, which renders the title Thus Spake Zarathustra, received wide acclaim for its lambent portrayal. Common reasoned that because the original German was written in a pseudo-Luther-Biblical style, a pseudo-King-James-Biblical style would be fitting in the English translation.

Kaufmann's introduction to his own translation included a blistering critique of Common's version; he notes that in one instance, Common has taken the German "most evil" and rendered it "baddest", a particularly unfortunate error not merely for his having coined the term "baddest", but also because Nietzsche dedicated a third of The Genealogy of Morals to the difference between "bad" and "evil." This and other errors led Kaufmann to wonder whether Common "had little German and less English."

The German text available to Common was considerably flawed.

From Zarathustra's Prologue:

The Superman is the meaning of the earth. Let your will say: The Superman shall be the meaning of the earth!
I conjure you, my brethren, remain true to the earth, and believe not those who speak unto you of superearthly hopes! Poisoners are they, whether they know it or not.

Kaufmann (1954) and Hollingdale (1961)

The Common translation remained widely accepted until more critical translations, titled Thus Spoke Zarathustra, were published by Walter Kaufmann in 1954 and R.J. Hollingdale in 1961; these are considered to convey the German text more accurately than the Common version. The translations of Kaufmann and Hollingdale render the text in a far more familiar, less archaic style of language than that of Common. However, "deficiencies" have been noted.

The German text from which Hollingdale and Kaufmann worked was untrue to Nietzsche's own work in some ways. Martin criticizes Kaufmann for changing punctuation, altering literal and philosophical meanings, and dampening some of Nietzsche's more controversial metaphors. Kaufmann's version, which has become the most widely available, features a translator's note suggesting that Nietzsche's text would have benefited from an editor; Martin suggests that Kaufmann "took it upon himself to become [Nietzsche's] editor."

Kaufmann, from Zarathustra's Prologue:

The overman is the meaning of the earth. Let your will say: the overman shall be the meaning of the earth! I beseech you, my brothers, remain faithful to the earth, and do not believe those who speak to you of otherworldly hopes! Poison-mixers are they, whether they know it or not.

Hollingdale, from Zarathustra's Prologue:

The Superman is the meaning of the earth. Let your will say: the Superman shall be the meaning of the earth!
I entreat you, my brothers, remain true to the earth, and do not believe those who speak to you of superterrestrial hopes! They are poisoners, whether they know it or not.

Wayne (2003)

Thomas Wayne, an English Professor at Edison State College in Fort Myers, Florida, published a translation in 2003. The introduction by Roger W. Phillips, Ph.D., says "Wayne's close reading of the original text has exposed the deficiencies of earlier translations, preeminent among them that of the highly esteemed Walter Kaufmann", and gives several reasons.

Parkes (2005) and Del Caro (2006)

Graham Parkes describes his own 2005 translation as trying "above all to convey the musicality of the text." In 2006, Cambridge University Press published a translation by Adrian Del Caro, edited by Robert Pippin.

Parkes, from Zarathustra's Prologue:

The Overhuman is the sense of the earth. May your will say: Let the Overhuman be the sense of the earth!
I beseech you, my brothers, stay true to the earth and do not believe those who talk of over-earthly hopes! They are poison-mixers, whether they know it or not.

Del Caro, from Zarathustra's Prologue:

The overman is the meaning of the earth. Let your will say: the overman shall be the meaning of the earth!
I beseech you, my brothers, remain faithful to the earth and do not believe those who speak to you of extraterrestrial hopes! They are mixers of poisons whether they know it or not.

Cooperative

From Wikipedia, the free encyclopedia ...