Macroevolution comprises the evolutionary processes and patterns which occur at and above the species level. In contrast, microevolution
is evolution occurring within the population(s) of a single species. In
other words, microevolution is the scale of evolution that is limited
to intraspecific (within-species) variation, while macroevolution
extends to interspecific (between-species) variation. The evolution of new species (speciation) is an example of macroevolution. This is the common definition for 'macroevolution' used by contemporary scientists. However, the exact usage of the term has varied throughout history.
Macroevolution addresses the evolution of species and higher taxonomic groups (genera, families, orders, etc.) and uses evidence from phylogenetics, the fossil record, and molecular biology to explain why different taxonomic groups exhibit different species diversity and/or morphological disparity.
Origin and changing meaning of the term
After Charles Darwin published his book On the Origin of Species in 1859, evolution was widely accepted to be a real phenomenon. However, many scientists still disagreed with Darwin that natural selection was the primary mechanism to explain evolution. Prior to the modern synthesis, during the period from the 1880s to the 1930s (dubbed the 'Eclipse of Darwinism'), many scientists argued in favor of alternative explanations. These included 'orthogenesis', and among its proponents was the Russian entomologist Yuri A. Filipchenko.
Filipchenko appears to have been the one who coined the term 'macroevolution' in his book Variabilität und Variation (1927). While introducing the concept, he claimed that the field of genetics is insufficient to explain "the origin of higher systematic units" above the species level.
Auf
die Weise hebt die heutige Genetik zweifellos den Schleier von der
Evolution der Biotypen, Jordanone und Linneone (eine Art
Mikroevolution), dagegen jene Evolution der höheren systematischen
Gruppen, welche von jeher die Geister besonders für sich in Anspruch
genommen hat (eine Art Makroevolution), liegt gänzlich außerhalb ihres
Gesichtsfeldes, und dieser Umstand scheint uns die von uns oben
angeführten Erwägungen über das Fehlen einer inneren Beziehung zwischen
der Genetik und der Deszendenzlehre, die sich ja hauptsächlich mit der
Makroevolution befaßt, nur zu unterstreichen. Bei einer solchen Sachlage
muß zugegeben werden, daß die Entscheidung der Frage über die Faktoren
der größeren Züge der Evolution, d. h. dessen, was wir Makroevolution
nennen, unabhängig von den Ergebnissen der gegenwärtigen Genetik
geschehen muß. So vorteilhaft es für uns auch wäre, uns auch in dieser
Frage auf die exakten Resultate der Genetik zu stützen, so sind sie
doch, unserer Meinung nach, zu diesem Zweck ganz unbrauchbar, da die
Frage über die Entstehung der höheren systematischen Einheiten ganz
außerhalb des Forschungsgebietes der Genetik liegt. Infolgedessen ist
letztere auch eine exakte Wissenschaft, während die Dezendenzlehre
heute, ebenso wie auch in XIX. Jahrhundert, einen spekulativen
Charakter trägt.
In this way, modern
genetics undoubtedly lifts the veil from the evolution of biotypes,
Jordanones and Linneones [i.e. variations within a species] (a kind of microevolution), but that evolution of the higher systematic
groups, which has always particularly occupied the minds of men (a kind
of macroevolution), lies entirely outside its field of vision, and this
circumstance seems to us only to emphasize the considerations we have
given above about the lack of an inner relationship between genetics and
the theory of descent, which is mainly concerned with macroevolution.
In such a state of affairs, it must be admitted that the decision of the
question about the factors of the larger features of evolution, i.e. of
what we call macroevolution, must occur independently of the results of
current genetics. As advantageous as it would be for us to rely on the
exact results of genetics in this question, they are, in our opinion,
completely useless for this purpose, since the question about the origin
of the higher systematic units lies entirely outside the research
area of genetics. As a result, the latter is also an exact science,
while the doctrine of descent today, as well as in the 19th century, has
a speculative character.
— Yuri Filipchenko, Variabilität und Variation (1927), pages 93-94
Filipchenko also claimed that a new taxon cannot evolve from an
older one with a lower rank; e.g. a species cannot evolve into a family.
It must originate from a preceding family. Furthermore, the evolution
of a new family must require the sudden appearance of new traits which
are different in greater magnitude compared to the new traits required
for the evolution of a genus or species.
Hier
scheint uns ein wesentliches Mißverständnis obzuwalten. Davon schon gar
nicht zu reden, daß es kaum richtig ist, in den Jardanonen
Spaltungsprodukte eines Linneone zu sehen, ist es noch unrichtiger
anzunehmen, daß nach den heutigen Anschauungen ein Jordanon sich im
Evolutionsprozeß in ein neues Linneon verwandeln kann oder muß. Im
Gegenteil, uns scheint, daß sich bei der Evolution die verschiedenen
taxonomischen Einheiten so verhalten, daß Gleiches Gleiches erzeugt. Aus
einem Biotyp entsteht durch Mutation ein neuer Biotypus, aus einem
Jordanon bildet sich - durch eine Neugruppierung der ihn bildenden
Biotypen, sowie durch das Auftreten einiger neuer - ein zweites
Jordanon; endlich zerfällt ein aus mehreren Jordanonen bestehendes
Linneon infolge des Verschwindens einiger von ihnen in zwei selbständige
Linneone. Es ist vollkommen richtig, daß niemand eine Umwandlung der
Rassen in eine Art beobachtet hat, aber das braucht auch nicht zu sein,
da im Prozeß der Evolution eine neue Art oder Arten gewöhnlich aus einer
alten Art, eine neue Gattung aus einer anderen Gattung usw. entstehen.
There seems to be a fundamental misunderstanding here. Not to mention that it is hardly correct to see the Jordanones as products of the fission of a Linneone, it is even more incorrect to assume that, according to modern views, a
Jordanone can or must transform into a new Linneone in the process of
evolution. On the contrary, it seems to us that in evolution the various
taxonomic units behave in such a way that like produces like. A new
biotype arises from one biotype through mutation; a Jordanone forms a second
Jordanone through a regrouping of the biotypes that make it up and the
appearance of some new ones; finally, a Linneone consisting of several
Jordanones splits into two independent Linneones as a result of the
disappearance of some of them. It is quite true that no one has observed
a transformation of the races into a species, but that need not be the
case, since in the process of evolution a new species or species usually
arise from an old species, a new genus from another genus, etc.
— Yuri Filipchenko, Variabilität und Variation (1927), page 89
However, Filipchenko's views are not consistent with contemporary understanding of evolution. Furthermore, the Linnaean ranks of 'genus' (and higher) are not real entities but arbitrary concepts. These traditional taxonomic concepts break down when they are applied to common ancestry.
Nevertheless, Filipchenko’s distinction between microevolution
and macroevolution had a major influence on evolutionary biology. The
term macroevolution was adopted by Filipchenko's protégé Theodosius Dobzhansky in his book Genetics and the Origin of Species
(1937), a seminal piece that contributed to the development of the
Modern Synthesis. The term was also used by critics of the Modern
Synthesis. A good example of this is the book The Material Basis of Evolution (1940) by the geneticist Richard Goldschmidt, a close friend of Filipchenko. Goldschmidt suggested saltational evolutionary changes which found a moderate revival in the 'hopeful monster' concept of evolutionary developmental biology (or evo-devo). Occasionally such dramatic changes can lead to novel features that survive.
As an alternative to saltational evolution, Dobzhansky suggested that the difference between macroevolution and microevolution
reflects essentially a difference in time-scales, and that
macroevolutionary changes were simply the sum of microevolutionary
changes over geologic time. This view became broadly accepted in the
middle of the last century but it has been challenged by a number of
scientists who claim that microevolution is necessary but not sufficient
to explain macroevolution. This is the decoupled view (see below).
Microevolution vs Macroevolution
Micro- and macroevolution are both supported by overwhelming evidence. This fact remains uncontroversial within the scientific community. However, there has been considerable debate regarding the connection between microevolution and macroevolution.
Broadly speaking, there are two views regarding this issue. The 'Extrapolation' view holds that macroevolution is merely cumulative microevolution. The 'Decoupled'
view holds that there are separate macroevolutionary processes that
cannot be sufficiently explained by microevolutionary processes alone.
Most scientists who adopt the second viewpoint are not claiming that
macroevolution is incompatible with microevolution. Rather, they see
macroevolution as an autonomous field of study regarding the deep
history of life. For this reason, a full understanding of macroevolution
requires insights that are not limited to microevolution. An example of this argument has been made by Francisco J. Ayala.
"...macroevolutionary processes are
underlain by microevolutionary phenomena and are compatible with
microevolutionary theories, but macroevolutionary studies require the
formulation of autonomous hypotheses and models (which must be tested
using macroevolutionary evidence). In this (epistemologically) very
important sense, macroevolution is decoupled from microevolution:
macroevolution is an autonomous field of evolutionary study."
— Francisco J. Ayala (1983)
Microevolution is characterized by the evolutionary process of
changing heritable characteristics (phenotypes) and changes in allele
frequencies (genotypes) within populations. This involves mechanisms
such as mutation, natural selection, and genetic drift, as studied in the field of population genetics. In contrast, macroevolution concerns how species
and higher taxonomic groups (genera, families, orders, etc.) have evolved across geography and vast spans of geological time. Typical questions include whether speciation is sympatric or allopatric, and whether the common mode of macroevolution is better described by phyletic gradualism or punctuated equilibrium. These and other important questions and topics are researched within
various scientific fields, which makes the study of macroevolution
highly interdisciplinary. Examples of these include:
How different species are related to each other is researched in phylogenetics.
The rates of evolutionary change across time are studied in the fossil record. For example, some groups appear to experience a lot of change while
others remain morphologically stable; the latter are often referred to as living fossils. However, that term has been criticized for wrongly implying that such organisms have not evolved at all.
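The microevolutionary side of this contrast, change in allele frequencies within a population, can be illustrated with a small simulation. The following is a minimal sketch of neutral genetic drift under the standard Wright-Fisher model, which the text does not name; the function name, population size, and seed are hypothetical choices made only for illustration, and mutation and selection are deliberately left out.

```python
import random


def wright_fisher(pop_size, initial_freq, generations, seed=0):
    """Simulate neutral drift at one locus with two alleles.

    Returns the frequency of one allele in every generation,
    starting with the initial frequency.
    """
    rng = random.Random(seed)   # seeded so that runs are reproducible
    n_copies = 2 * pop_size     # diploid population: 2N gene copies
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        # Every gene copy of the next generation is drawn at random
        # from the current generation (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory


trajectory = wright_fisher(pop_size=50, initial_freq=0.5, generations=100)
print(f"start {trajectory[0]:.2f}, end {trajectory[-1]:.2f}")
```

Even without selection the frequency wanders from generation to generation, and an allele that reaches a frequency of 0 or 1 stays there. Processes of this kind are what population genetics studies within a single species.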
According to Hautmann,
speciation has both micro- and macroevolutionary aspects. Specifically,
speciation also involves the classic process of descent with
modification, i.e. morphological transformation observed across many
generations. This is microevolutionary. In contrast, the species
variation produced by speciation, and the rate at which it successfully
occurs, is macroevolutionary. Stephen J. Gould also saw species as the basic unit of macroevolution.
Speciation is the process in which populations within one species change to the extent that they become reproductively isolated,
that is, they can no longer interbreed. However, this classical
concept has been challenged, and more recently a phylogenetic or
evolutionary species concept has been adopted. Its main criterion for a new species is that it be diagnosable and monophyletic, that is, that it forms a clearly defined lineage.
Charles Darwin first discovered that speciation can be extrapolated so that species not only evolve into new species, but also into new genera,
families and other groups of animals. In other words, macroevolution is
reducible to microevolution through selection of traits over long
periods of time. In addition, some scholars have argued that selection at the species level is important as well. The advent of genome sequencing enabled the discovery of gradual
genetic changes both during speciation and across higher taxa. For
instance, the evolution of humans from ancestral primates or other
mammals can be traced to numerous but individual mutations.
According to the Resource-use hypothesis, the diversification of terrestrial species is closely related to global climatic changes, particularly the Cenozoic
alternation of warming and cooling episodes. Global analysis of
terrestrial mammals supports the view that these physical environmental
changes have shaped macroevolutionary patterns by promoting biome
specialisation. This specialization leads to significantly higher rates
of vicariance and speciation in biome specialist (stenobiomic) lineages
compared to generalist lineages.
Evolution of new organs and tissues
One of the main questions in evolutionary biology is how new structures evolve, such as new organs.
Macroevolution is often thought to require the evolution of structures
that are 'completely new'. However, fundamentally novel structures are
not necessary for dramatic evolutionary change. As can be seen in vertebrate evolution,
most "new" organs are actually not new—they are simply modifications of
previously existing organs. For instance, the evolution of mammal diversity in the past 100 million years has not required any major innovation. All of this diversity can be explained by modification of existing organs, such as the evolution of elephant tusks from incisors. Other examples include wings (modified limbs), feathers (modified reptile scales), lungs (modified swim bladders, e.g. found in fish), or even the heart (a muscularized segment of a vein).
The same concept applies to the evolution of "novel" tissues. Even fundamental tissues such as bone can evolve from combining existing proteins (collagen) with calcium phosphate (specifically, hydroxy-apatite). This probably happened when certain cells that make collagen also accumulated calcium phosphate to get a proto-bone cell.
Examples
Evolutionary faunas
A macroevolutionary benchmark study is Sepkoski's work on marine animal diversity through the Phanerozoic. His iconic
diagram of the numbers of marine families from the Cambrian to the
Recent illustrates the successive expansion and dwindling of three "evolutionary faunas"
that were characterized by differences in origination rates and
carrying capacities. Long-term ecological changes and major geological
events are postulated to have played crucial roles in shaping these
evolutionary faunas.
Stanley's rule
Macroevolution is driven by differences between species in
origination and extinction rates. Remarkably, these two factors are
generally positively correlated: taxa that have typically high
diversification rates also have high extinction rates. This observation
was first described by Steven Stanley, who attributed it to a variety of ecological factors. Yet a positive correlation of origination and extinction rates is also a prediction of the Red Queen hypothesis,
which postulates that evolutionary progress (increase in fitness) of
any given species causes a decrease in fitness of other species,
ultimately driving to extinction those species that do not adapt rapidly
enough. High rates of origination must therefore correlate with high rates of extinction. Stanley's rule, which applies to almost all taxa and geologic ages, is
therefore an indication of a dominant role of biotic interactions in
macroevolution.
The evolution of multicellular organisms is one of the major
breakthroughs in evolution. The first step of converting a unicellular
organism into a metazoan (a multicellular organism) is to allow cells to attach to each other. This can be achieved by one or a few mutations. In fact, many bacteria form multicellular assemblies, e.g. cyanobacteria or myxobacteria. Another species of bacteria, Jeongeupia sacculi, forms well-ordered sheets of cells, which ultimately develop into a bulbous structure. Similarly, unicellular yeast cells can become multicellular by a single
mutation in the ACE2 gene, which causes the cells to form a branched
multicellular form.
Evolution of bat wings
The wings of bats have the same structural elements (bones) as any other five-fingered mammal (see periodicity in limb development).
However, the finger bones in bats are dramatically elongated, so the
question is how these bones became so long. It has been shown that
certain growth factors such as bone morphogenetic proteins (specifically Bmp2)
are overexpressed, which stimulates the elongation of certain bones.
The genetic changes in the bat genome that lead to
this phenotype have been identified, and the effect has been recapitulated in mice: when specific bat
DNA carrying these mutations is inserted into the mouse genome, the
bones of mice grow longer.
Limb loss in lizards can be observed in the genus Lerista, which shows many intermediary steps with increasing loss of digits and toes. The species shown here, Lerista cinerea, has no digits and only 1 toe left.
Snakes evolved from lizards. Phylogenetic analysis shows that snakes are actually nested within the phylogenetic tree of lizards, demonstrating that they have a common ancestor. This split happened about 180 million years ago and several intermediary fossils are known to document the origin. In fact, limbs have been lost in numerous clades of reptiles, and there are cases of recent limb loss. For instance, the skink genus Lerista
has lost limbs in multiple cases, with all possible intermediary steps,
that is, there are species which have fully developed limbs, shorter
limbs with 5, 4, 3, 2, 1 or no toes at all.
Human evolution
While human evolution from their primate ancestors did not require
massive morphological changes, our brain has sufficiently changed to
allow human consciousness and intelligence. While the latter involves
relatively minor morphological changes it did result in dramatic changes
to brain function. Thus, macroevolution does not have to be morphological; it can also be functional.
The study of human (brain) evolution benefits from the fact that human and ape genomes are available so that the genomes of our common ancestor can be reconstructed. Even though the precise genetic mechanisms that shaped the human brain
are not known, the mutations involved in human brain evolution are
largely known, given that the genes expressed in the brain are
relatively well understood.
Evolution of viviparity in lizards
The European Common Lizard (Zootoca vivipara)
consists of populations that are egg-laying or live-bearing,
demonstrating that this dramatic difference can even evolve within a
species.
Most lizards are egg-laying and thus need an environment that is warm
enough to incubate their eggs. However, some species have evolved viviparity, that is, they give birth to live young, as almost all mammals
do. In several clades of lizards, egg-laying (oviparous) species have
evolved into live-bearing ones, apparently with very little genetic
change. For instance, the European common lizard, Zootoca vivipara, is viviparous throughout most of its range, but oviparous in the extreme southwest portion. That is, within a single species, a radical change in reproductive
behavior has happened. Similar cases are known from South American
lizards of the genus Liolaemus
which have egg-laying species at lower altitudes, but closely related
viviparous species at higher altitudes, suggesting that the switch from
oviparous to viviparous reproduction does not require many genetic
changes.
Until the turn of the 20th century, the assumption had been that
the three-dimensional geometry of the universe (its description in terms
of locations, shapes, distances, and directions) was distinct from time
(the measurement of when events occur within the universe). However, space and time took on new meanings with the Lorentz transformation and special theory of relativity.
In 1908, Hermann Minkowski
presented a geometric interpretation of special relativity that fused
time and the three spatial dimensions into a single four-dimensional
continuum now known as Minkowski space. This interpretation proved vital to the general theory of relativity, wherein spacetime is curved by mass and energy.
Fundamentals
Definitions
Non-relativistic classical mechanics treats time
as a universal quantity of measurement that is uniform throughout, is
separate from space, and is agreed on by all observers. Classical
mechanics assumes that time has a constant rate of passage, independent
of the observer's state of motion, or anything external. It assumes that space is Euclidean: it assumes that space follows the geometry of common sense.
In the context of special relativity,
time cannot be separated from the three dimensions of space, because
the observed rate at which time passes for an object depends on the
object's velocity relative to the observer. General relativity provides an explanation of how gravitational fields can slow the passage of time for an object as seen by an observer outside the field.
In ordinary space, a position is specified by three numbers, known as dimensions. In the Cartesian coordinate system, these are often called x, y and z. A point in spacetime is called an event,
and requires four numbers to be specified: the three-dimensional
location in space, plus the position in time (Fig. 1-1). An event is
represented by a set of coordinates x, y, z and t. Spacetime is thus four-dimensional.
Unlike the analogies used in popular writings to explain events,
such as firecrackers or sparks, mathematical events have zero duration
and represent a single point in spacetime. Although it is possible to be in motion relative to the popping of a
firecracker or a spark, it is not possible for an observer to be in
motion relative to an event.
The path of a particle through spacetime can be considered to be a
sequence of events. The series of events can be linked together to form
a curve that represents the particle's progress through spacetime. That
path is called the particle's world line.
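As a data structure, an event is just four numbers, and a world line is an ordered sequence of events. A minimal sketch follows, assuming a particle that moves at constant velocity along the x axis; the class name, the function name, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A single point in spacetime: one instant at one location."""
    t: float  # time coordinate, in seconds
    x: float  # spatial coordinates, in metres
    y: float
    z: float


def world_line(x0, velocity, times):
    """Events visited by a particle moving along x at constant velocity."""
    return [Event(t=t, x=x0 + velocity * t, y=0.0, z=0.0) for t in times]


line = world_line(x0=0.0, velocity=2.0, times=[0.0, 1.0, 2.0])
for event in line:
    print(event)
```

Sampling the motion at more closely spaced times approximates the continuous curve that the text calls the world line.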
Mathematically, spacetime is a manifold,
which is to say, it appears locally "flat" near each point in the same
way that, at small enough scales, the surface of a globe appears to be
flat. A scale factor (conventionally called the speed of light)
relates distances measured in space to distances measured in time. The
magnitude of this scale factor (nearly 300,000 kilometres or 190,000
miles in space being equivalent to one second in time), along with the
fact that spacetime is a manifold, implies that at ordinary,
non-relativistic speeds and at ordinary, human-scale distances, there is
little that humans might observe that is noticeably different from what
they might observe if the world were Euclidean. It was only with the
advent of sensitive scientific measurements in the mid-1800s, such as
the Fizeau experiment and the Michelson–Morley experiment,
that puzzling discrepancies began to be noted between observation
and predictions based on the implicit assumption of Euclidean space.
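The size of the scale factor, and why its effects go unnoticed at everyday speeds, can be checked with a few lines of arithmetic. This is a rough sketch: the Lorentz factor used below is the standard special-relativistic measure of deviation from classical behaviour and is not introduced in the text at this point, and the speed of 300 m/s is an arbitrary everyday value.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def time_to_distance(seconds):
    """Distance in metres equivalent to an interval of time."""
    return C * seconds


def lorentz_factor(speed):
    """Standard factor 1 / sqrt(1 - v^2 / c^2); exactly 1 at rest."""
    return 1.0 / math.sqrt(1.0 - (speed / C) ** 2)


print(time_to_distance(1.0) / 1000.0)  # kilometres per second of time
print(lorentz_factor(300.0) - 1.0)     # deviation from 1 at 300 m/s
```

At 300 m/s the factor differs from 1 by less than one part in a trillion, which is why sensitive optical experiments were needed before any discrepancy showed up.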
Figure 1-1. Each location in spacetime is marked by four numbers defined by a frame of reference:
the position in space, and the time, which can be visualized as the
reading of a clock located at each position in space. The 'observer'
synchronizes the clocks according to their own reference frame.
In special relativity, an observer will, in most cases, mean a frame
of reference from which a set of objects or events is being measured.
This usage differs significantly from the ordinary English meaning of
the term. Reference frames are inherently nonlocal constructs, and
according to this usage of the term, it does not make sense to speak of
an observer as having a location.
In Fig. 1-1, imagine that the frame under consideration is
equipped with a dense lattice of clocks, synchronized within this
reference frame, that extends indefinitely throughout the three
dimensions of space. Any specific location within the lattice is not
important. The latticework of clocks is used to determine the time and
position of events taking place within the whole frame. The term observer refers to the whole ensemble of clocks associated with one inertial frame of reference.
In this idealized case, every point in space has a clock
associated with it, and thus the clocks register each event instantly,
with no time delay between an event and its recording. A real observer
will see a delay between the emission of a signal and its detection due
to the speed of light. To synchronize the clocks, in the data reduction
following an experiment, the time when a signal is received will be
corrected to reflect its actual time were it to have been recorded by an
idealized lattice of clocks.
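The correction described above amounts to subtracting the light travel time from the reception time. A minimal sketch, assuming the distance to the event is known in the observer's own frame and the signal travels in vacuum; the function name and the numbers are hypothetical.

```python
C = 299_792_458.0  # speed of light in metres per second


def event_time(reception_time, distance):
    """Time at which an event occurred, given when its light arrived.

    reception_time: clock reading when the signal was detected (s)
    distance: distance from the detector to the event (m)
    """
    return reception_time - distance / C


# A flash detected at t = 10.0 s from a source 1.5e9 m away.
print(event_time(10.0, 1.5e9))
```

The corrected value is what an idealized lattice clock located at the event would have recorded, which is the sense in which physicists speak of measuring rather than seeing.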
In many books on special relativity, especially older ones, the
word "observer" is used in the more ordinary sense of the word. It is
usually clear from context which meaning has been adopted.
Physicists distinguish between what one measures or observes,
after one has factored out signal propagation delays, versus what one
visually sees without such corrections. Failing to understand the difference between what one measures and what one sees is the source of much confusion among students of relativity.
Figure 1-2. Michelson and Morley expected that motion through the aether would
cause a differential phase shift between light traversing the two arms
of their apparatus. The most logical explanation of their negative
result, aether dragging, was in conflict with the observation of stellar
aberration.
By the mid-1800s, various experiments such as the observation of the Arago spot and differential measurements of the speed of light in air versus water were considered to have proven the wave nature of light as opposed to a corpuscular theory. Propagation of waves was then assumed to require the existence of a waving medium; in the case of light waves, this was considered to be a hypothetical luminiferous aether. The various attempts to establish the properties of this hypothetical medium yielded contradictory results. For example, the Fizeau experiment of 1851, conducted by French physicist Hippolyte Fizeau,
demonstrated that the speed of light in flowing water was less than the
sum of the speed of light in still water plus the speed of the water, by an
amount dependent on the water's index of refraction.
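Fizeau's result is usually summarized by the Fresnel drag formula, w = c/n + v(1 - 1/n²), where n is the index of refraction and v the flow speed. The formula is not stated in the text and is given here as standard background. The sketch below compares it with simple addition of the still-water light speed and the flow speed; the refractive index and flow speed are illustrative values, not Fizeau's data.

```python
C = 299_792_458.0  # speed of light in vacuum, metres per second


def light_speed_in_moving_medium(n, v):
    """Fresnel drag formula: light speed in a medium flowing at speed v.

    n: index of refraction of the medium
    v: flow speed along the direction of the light (m/s)
    """
    return C / n + v * (1.0 - 1.0 / n**2)


n_water = 1.333  # approximate index of refraction of water (assumed)
v_flow = 7.0     # hypothetical flow speed in m/s

observed = light_speed_in_moving_medium(n_water, v_flow)
naive_sum = C / n_water + v_flow
print(naive_sum - observed)  # shortfall relative to simple addition
```

The shortfall equals v/n², so it depends on the index of refraction. Because n varies with wavelength, a different amount of dragging would be needed for each color of light.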
Among other issues, the dependence of the partial aether-dragging
implied by this experiment on the index of refraction (which is
dependent on wavelength) led to the unpalatable conclusion that aether simultaneously flows at different speeds for different colors of light. The Michelson–Morley experiment
of 1887 (Fig. 1-2) showed no differential influence of Earth's motions
through the hypothetical aether on the speed of light, and the most
likely explanation, complete aether dragging, was in conflict with the
observation of stellar aberration.
George Francis FitzGerald in 1889, and Hendrik Lorentz
in 1892, independently proposed that material bodies traveling through
the fixed aether were physically affected by their passage, contracting
in the direction of motion by an amount that was exactly what was
necessary to explain the negative results of the Michelson–Morley
experiment. No length changes occur in directions transverse to the
direction of motion.
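The contraction proposed by FitzGerald and Lorentz is conventionally written L = L0 · sqrt(1 - v²/c²). A minimal sketch of that formula follows; it applies only to lengths measured along the direction of motion, and the function name and sample speed are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def contracted_length(rest_length, speed):
    """Length along the direction of motion for a body moving at speed."""
    return rest_length * math.sqrt(1.0 - (speed / C) ** 2)


# A rod of rest length 1 m moving at 60% of the speed of light.
print(contracted_length(1.0, 0.6 * C))
```

Lengths transverse to the motion are left unchanged, so no corresponding function is needed for them.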
By 1904, Lorentz had expanded his theory such that he had arrived
at equations formally identical with those that Einstein was to derive
later, i.e. the Lorentz transformation. As a theory of dynamics
(the study of forces and torques and their effect on motion), his
theory assumed actual physical deformations of the physical constituents
of matter. Lorentz's equations predicted a quantity that he called local time, with which he could explain the aberration of light, the Fizeau experiment and other phenomena.
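In modern notation, the Lorentz transformation for relative motion at speed v along the x axis reads t' = γ(t - vx/c²) and x' = γ(x - vt), with γ = 1/sqrt(1 - v²/c²). The sketch below implements this standard form; it is not a reconstruction of Lorentz's own 1904 notation, and the function name and sample values are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def lorentz_transform(t, x, v):
    """Coordinates (t', x') of an event in a frame moving at speed v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    t_prime = gamma * (t - v * x / C**2)  # time depends on position too
    x_prime = gamma * (x - v * t)
    return t_prime, x_prime


# An event at the origin's position, one second after the frames coincide.
t_prime, x_prime = lorentz_transform(t=1.0, x=0.0, v=0.6 * C)
print(t_prime, x_prime)
```

The position-dependent term v·x/c² in the expression for t' corresponds, to first order in v/c, to the offset that Lorentz called local time.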
Henri Poincaré was the first to combine space and time into spacetime. He argued in 1898 that the simultaneity of two events is a matter of convention. In 1900, he recognized that Lorentz's "local time" is actually what is indicated by moving clocks by applying an explicitly operational definition of clock synchronization assuming constant light speed. In 1900 and 1904, he suggested the inherent undetectability of the aether by emphasizing the validity of what he called the principle of relativity. In 1905/1906 he mathematically perfected Lorentz's theory of electrons in order to
bring it into accordance with the postulate of relativity.
While discussing various hypotheses on Lorentz invariant
gravitation, he introduced the innovative concept of a 4-dimensional
spacetime by defining various four-vectors, namely four-position, four-velocity, and four-force. He did not pursue the 4-dimensional formalism in subsequent papers,
however, stating that this line of research seemed to "entail great pain
for limited profit", ultimately concluding "that three-dimensional
language seems the best suited to the description of our world". Even as late as 1909, Poincaré continued to describe the dynamical interpretation of the Lorentz transform.
In 1905, Albert Einstein analyzed special relativity in terms of kinematics
(the study of moving bodies without reference to forces) rather than
dynamics. His results were mathematically equivalent to those of Lorentz
and Poincaré. He obtained them by recognizing that the entire theory
can be built upon two postulates: the principle of relativity and the
principle of the constancy of light speed. His work was filled with
vivid imagery involving the exchange of light signals between clocks in
motion, careful measurements of the lengths of moving rods, and other
such examples.
Einstein in 1905 superseded previous attempts of an electromagnetic mass–energy relation by introducing the general equivalence of mass and energy, which was instrumental for his subsequent formulation of the equivalence principle
in 1907, which declares the equivalence of inertial and gravitational
mass. By using the mass–energy equivalence, Einstein showed that the
gravitational mass of a body is proportional to its energy content,
which was one of the early results in developing general relativity. While it would appear that he did not at first think geometrically about spacetime, in the further development of general relativity, Einstein fully incorporated the spacetime formalism.
When Einstein published in 1905, another of his competitors, his former mathematics professor Hermann Minkowski, had also arrived at most of the basic elements of special relativity. Max Born recounted a meeting he had with Minkowski, seeking to be Minkowski's student/collaborator:
I went to Cologne, met Minkowski
and heard his celebrated lecture 'Space and Time' delivered on 2
September 1908. [...] He told me later that it came to him as a great
shock when Einstein published his paper in which the equivalence of the
different local times of observers moving relative to each other was
pronounced; for he had reached the same conclusions independently but
did not publish them because he wished first to work out the
mathematical structure in all its splendor. He never made a priority
claim and always gave Einstein his full share in the great discovery.
Minkowski had been concerned with the state of electrodynamics after
Michelson's disruptive experiments at least since the summer of 1905,
when Minkowski and David Hilbert
led an advanced seminar attended by notable physicists of the time to
study the papers of Lorentz, Poincaré et al. Minkowski saw Einstein's
work as an extension of Lorentz's, and was most directly influenced by
Poincaré.
Figure 1–4. Hand-colored transparency presented by Minkowski in his 1908 Raum und Zeit lecture
On 5 November 1907 (a little more than a year before his death),
Minkowski introduced his geometric interpretation of spacetime in a
lecture to the Göttingen Mathematical society with the title, The Relativity Principle (Das Relativitätsprinzip). On 21 September 1908, Minkowski presented his talk, Space and Time (Raum und Zeit), to the German Society of Scientists and Physicians. The opening words of Space and Time
include Minkowski's statement that "Henceforth, space for itself, and
time for itself shall completely reduce to a mere shadow, and only some
sort of union of the two shall preserve independence." Space and Time
included the first public presentation of spacetime diagrams
(Fig. 1-4), and included a remarkable demonstration that the concept of
the invariant interval (discussed below),
along with the empirical observation that the speed of light is finite,
allows derivation of the entirety of special relativity.
Einstein, for his part, was initially dismissive of Minkowski's geometric interpretation of special relativity, regarding it as überflüssige Gelehrsamkeit
(superfluous learnedness). However, in order to complete his search for
general relativity that started in 1907, the geometric interpretation
of relativity proved to be vital. In 1916, Einstein fully acknowledged
his indebtedness to Minkowski, whose interpretation greatly facilitated
the transition to general relativity.
Since there are other types of spacetime, such as the curved spacetime
of general relativity, the spacetime of special relativity is today
known as Minkowski spacetime.
Although two viewers may measure the x, y, and z
positions of two points in three-dimensional space using different
coordinate systems, the distance between the points will be the same
for both, assuming that they are measuring using the same units. The
distance is "invariant".
In special relativity, however, the distance between two points
is no longer the same if measured by two different observers, when one
of the observers is moving, because of Lorentz contraction.
The situation is even more complicated if the two points are separated
in time as well as in space. For example, if one observer sees two
events occur at the same place, but at different times, a person moving
with respect to the first observer will see the two events occurring at
different places, because the moving point of view sees itself as
stationary, and the position of the event as receding or approaching.
Thus, a different measure must be used to measure the effective
"distance" between two events.
In four-dimensional spacetime, the analog to distance is the
interval. Although time comes in as a fourth dimension, it is treated
differently than the spatial dimensions. Minkowski space hence differs
in important respects from four-dimensional Euclidean space.
The fundamental reason for merging space and time into spacetime is
that space and time are separately not invariant, which is to say that,
under the proper conditions, different observers will disagree on the
length of time between two events (because of time dilation) or the distance between the two events (because of length contraction). Special relativity provides a new invariant, called the spacetime interval,
which combines distances in space and in time. All observers who
measure the time and distance between any two events will end up
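computing the same spacetime interval. Suppose an observer measures two
events as being separated in time by $\Delta t$ and a spatial distance $\Delta x$. Then the squared spacetime interval $(\Delta s)^2$ between the two events that are separated by a distance $\Delta x$ in space and by $\Delta ct = c\,\Delta t$ in the $ct$-coordinate is:
$(\Delta s)^2 = (\Delta ct)^2 - (\Delta x)^2$
or for three space dimensions,
$(\Delta s)^2 = (\Delta ct)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2$
The constant $c$, the speed of light, converts time units (like seconds) into space units (like meters). The squared interval $(\Delta s)^2$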
is a measure of separation between events A and B that are time
separated and in addition space separated either because there are two
separate objects undergoing events, or because a single object in space
is moving inertially between its events. It is the difference between
the square of the distance traveled by a light signal in the time
interval $\Delta t$ and the square of the spatial distance separating
event B from event A. If the event separation is due to a light signal, then this difference vanishes and $\Delta s = 0$.
When the two events considered are infinitesimally close to each other, we may write
$ds^2 = c^2\,dt^2 - dx^2 - dy^2 - dz^2$
In a different inertial frame, say with coordinates $(t', x', y', z')$, the spacetime interval $ds'$
can be written in the same form as above. Because of the constancy of
the speed of light, light events in all inertial frames have zero
interval, $ds = ds' = 0$. For any other infinitesimal event where $ds \neq 0$, one can prove that
$ds^2 = ds'^2$,
which in turn upon integration leads to $s = s'$. The invariance of the spacetime interval between the same events for
all inertial frames of reference is one of the fundamental results of
the special theory of relativity.
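The invariance can be checked numerically. The following is a minimal sketch, assuming the sign convention s² = (ct)² − x² and the standard Lorentz boost for frames in relative motion along x (the boost formula is not stated in this passage). The event separation and the velocities are hypothetical values chosen for illustration.

```python
import math

def boost(ct, x, beta):
    """Lorentz boost along x with beta = v/c. Returns (ct', x')."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)

def interval_squared(dct, dx):
    """Squared spacetime interval, convention (ct)^2 - x^2."""
    return dct**2 - dx**2

# Hypothetical separation between two events, in meters.
dct, dx = 5.0, 3.0

for beta in (0.0, 0.3, -0.5, 0.9):
    dct_p, dx_p = boost(dct, dx, beta)
    # The coordinates change from frame to frame; the interval does not.
    print(beta, round(dct_p, 4), round(dx_p, 4),
          round(interval_squared(dct_p, dx_p), 9))
```

Each frame reports different time and space separations, but the squared interval stays at 16 m² up to rounding.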
Although for brevity, one frequently sees interval expressions
expressed without deltas, including in most of the following discussion,
it should be understood that in general, $x$ means $\Delta x$, etc. We are always concerned with differences
of spatial or temporal coordinate values belonging to two events, and
since there is no preferred origin, single coordinate values have no
essential meaning.
Figure 2–1. Spacetime diagram illustrating two photons, A and B, originating
at the same event, and a slower-than-light-speed object, C
The equation above is similar to the Pythagorean theorem, except with a minus sign between the $(ct)^2$ and the $x^2$ terms. The spacetime interval is the quantity $s^2$, not $s$
itself. The reason is that unlike distances in Euclidean geometry,
intervals in Minkowski spacetime can be negative. Rather than deal with
square roots of negative numbers, physicists customarily regard $s^2$ as a distinct symbol in itself, rather than the square of something.
Note: There are two sign conventions in use in the relativity literature:
$s^2 = (ct)^2 - x^2 - y^2 - z^2$
and
$s^2 = -(ct)^2 + x^2 + y^2 + z^2$
These sign conventions are associated with the metric signatures (+−−−) and (−+++).
A minor variation is to place the time coordinate last rather than
first. Both conventions are widely used within the field of study.
In the following discussion, we use the first convention.
In general $s^2$ can assume any real number value. If $s^2$ is positive, the spacetime interval is referred to as timelike.
Since the spatial distance traversed by any massive object is always less
than the distance traveled by light in the same time interval, positive
intervals are always timelike. If $s^2$ is negative, the spacetime interval is said to be spacelike. Spacetime intervals are equal to zero when $x = \pm ct$.
In other words, the spacetime interval between two events on the world
line of something moving at the speed of light is zero. Such an interval
is termed lightlike or null. A photon arriving in our eye
from a distant star will not have aged, despite having (from our
perspective) spent years in its passage.
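The three cases can be summarized in a short sketch, assuming the sign convention s² = (ct)² − x² adopted in this discussion. The function name and the tolerance parameter are illustrative choices, not part of any standard library.

```python
def classify_interval(dct, dx, tol=1e-12):
    """Classify the separation between two events.

    dct: time separation multiplied by c (meters)
    dx:  spatial separation (meters)
    tol: hypothetical tolerance for treating s^2 as zero
    """
    s2 = dct**2 - dx**2
    if abs(s2) <= tol:
        return "lightlike"
    return "timelike" if s2 > 0 else "spacelike"

print(classify_interval(5.0, 3.0))   # timelike
print(classify_interval(3.0, 5.0))   # spacelike
print(classify_interval(4.0, -4.0))  # lightlike
```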
A spacetime diagram is typically drawn with only a single space
and a single time coordinate. Fig. 2-1 presents a spacetime diagram
illustrating the world lines
(i.e. paths in spacetime) of two photons, A and B, originating from the
same event and going in opposite directions. In addition, C illustrates
the world line of a slower-than-light-speed object. The vertical time
coordinate is scaled by $c$
so that it has the same units (meters) as the horizontal space
coordinate. Since photons travel at the speed of light, their world
lines have a slope of ±1. In other words, every meter that a photon travels to the left or right requires approximately 3.3 nanoseconds of time.
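The conversion between the ct axis (in meters) and ordinary time can be made explicit with a small sketch. The function name is an illustrative choice; the value of c is the exact SI value.

```python
C = 299_792_458.0  # speed of light in m/s (exact by SI definition)

def light_travel_time_ns(distance_m):
    """Time in nanoseconds for light to cover distance_m meters."""
    return distance_m / C * 1e9

print(round(light_travel_time_ns(1.0), 2))  # 3.34
```

Read in the other direction, the same factor converts a length on the ct axis into seconds: one meter of ct corresponds to roughly 3.3 ns.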
Reference frames
Figure 2-2. Galilean diagram of two frames of reference in standard configuration
Figure 2–3. (a) Galilean diagram of two frames of reference in standard
configuration, (b) spacetime diagram of two frames of reference, (c)
spacetime diagram showing the path of a reflected light pulse
To gain insight into how spacetime coordinates measured by observers in different reference frames compare with each other, it is useful to work with a simplified setup with frames in a standard configuration.
With care, this allows simplification of the math with no loss of
generality in the conclusions that are reached. In Fig. 2-2, two Galilean reference frames
(i.e. conventional 3-space frames) are displayed in relative motion.
Frame S belongs to a first observer O, and frame S′ (pronounced
"S prime") belongs to a second observer O′.
The x, y, z axes of frame S are oriented parallel to the respective primed axes of frame S′.
Frame S′ moves in the x-direction of frame S with a constant velocity v as measured in frame S.
The origins of frames S and S′ are coincident when time t = 0 for frame S and t′ = 0 for frame S′.
Fig. 2-3a redraws Fig. 2-2 in a different orientation. Fig. 2-3b illustrates a relativistic
spacetime diagram from the viewpoint of observer O. Since S and S′ are
in standard configuration, their origins coincide at times t = 0 in frame S and t′ = 0 in frame S′. The ct′ axis passes through the events in frame S′ which have x′ = 0. But the points with x′ = 0 are moving in the x-direction of frame S with velocity v, so that they are not coincident with the ct axis at any time other than zero. Therefore, the ct′ axis is tilted with respect to the ct axis by an angle θ given by
$\tan(\theta) = v/c.$
The x′ axis is also tilted with respect to the x axis.
To determine the angle of this tilt, we recall that the slope of the
world line of a light pulse is always ±1. Fig. 2-3c presents a spacetime
diagram from the viewpoint of observer O′. Event P represents the
emission of a light pulse at x′ = 0, ct′ = −a. The pulse is reflected from a mirror situated at distance a from the light source (event Q), and returns to the light source at x′ = 0, ct′ = a (event R).
The same events P, Q, R are plotted in Fig. 2-3b in the frame of observer O. The light paths have slopes = 1 and −1, so that △PQR forms a right triangle with PQ and QR both at 45 degrees to the x and ct axes. Since OP = OQ = OR, the angle between x′ and x must also be θ.
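As a rough numerical illustration, the tilt of the primed axes can be computed from the relation tan θ = v/c, the standard result for frames in standard configuration. The function names below are illustrative. Because both primed axes tilt toward the light line by the same angle θ, the angle between them is 90° − 2θ.

```python
import math

def axis_tilt_degrees(beta):
    """Tilt of the ct' axis from ct (and of x' from x), with beta = v/c."""
    return math.degrees(math.atan(beta))

def primed_axes_opening_degrees(beta):
    """Angle between the x' and ct' axes as drawn in the unprimed diagram."""
    return 90.0 - 2.0 * axis_tilt_degrees(beta)

for beta in (0.0, 0.5, 0.9):
    print(beta,
          round(axis_tilt_degrees(beta), 2),
          round(primed_axes_opening_degrees(beta), 2))
```

As v approaches c, the tilt approaches 45° and the two primed axes close in on the world line of a light pulse.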
While the rest frame has space and time axes that meet at right
angles, the moving frame is drawn with axes that meet at an acute angle.
The frames are actually equivalent. The asymmetry is due to unavoidable distortions in how spacetime coordinates can map onto a Cartesian plane, and should be considered no stranger than the manner in which, on a Mercator projection
of the Earth, the relative sizes of land masses near the poles
(Greenland and Antarctica) are highly exaggerated relative to land
masses near the Equator.
Figure 2–4. The light cone centered on an event divides the rest of spacetime into the future, the past, and "elsewhere"
In Fig. 2–4, event O is at the origin of a spacetime diagram, and the
two diagonal lines represent all events that have zero spacetime
interval with respect to the origin event. These two lines form what is
called the light cone of the event O, since adding a second spatial dimension (Fig. 2-5) makes the appearance that of two right circular cones meeting with their apices at O. One cone extends into the future (t>0), the other into the past (t<0).
Figure 2–5. Light cone in 2D space plus a time dimension
A light (double) cone divides spacetime into separate regions with
respect to its apex. The interior of the future light cone consists of
all events that are separated from the apex by more time (temporal distance) than necessary to cross their spatial distance at lightspeed; these events comprise the timelike future of the event O. Likewise, the timelike past comprises the interior events of the past light cone. So in timelike intervals Δct is greater than Δx, making timelike intervals positive.
The region exterior to the light cone consists of events that are separated from the event O by more space than can be crossed at lightspeed in the given time. These events comprise the so-called spacelike region of the event O, denoted "Elsewhere" in Fig. 2-4. Events on the light cone itself are said to be lightlike (or null separated)
from O. Because of the invariance of the spacetime interval, all
observers will assign the same light cone to any given event, and thus
will agree on this division of spacetime.
The light cone has an essential role within the concept of causality.
It is possible for a not-faster-than-light-speed signal to travel from
the position and time of O to the position and time of D (Fig. 2-4). It
is hence possible for event O to have a causal influence on event D. The
future light cone contains all the events that could be causally
influenced by O. Likewise, it is possible for a
not-faster-than-light-speed signal to travel from the position and time
of A, to the position and time of O. The past light cone contains all
the events that could have a causal influence on O. In contrast,
assuming that signals cannot travel faster than the speed of light, any
event in the spacelike region (Elsewhere), such as B or C, can neither
affect event O nor be affected by event O by means of such signalling.
Under this assumption any causal relationship between event O and any
events in the spacelike region of a light cone is excluded.
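The division of spacetime by the light cone can be expressed as a simple test on an event's coordinates relative to O. This is a minimal sketch in one space dimension, assuming the convention s² = (ct)² − x². The function name and the sample coordinates are hypothetical and do not correspond to the exact positions of A, B, C, and D in Fig. 2-4.

```python
def causal_relation(ct, x):
    """Region of the event (ct, x) relative to the origin event O.

    Events on the light cone itself are grouped with the future or
    past, since a light signal can still connect them to O.
    """
    s2 = ct**2 - x**2
    if s2 < 0:
        return "elsewhere"   # spacelike: no causal connection with O
    if ct > 0:
        return "future"      # O could influence this event
    if ct < 0:
        return "past"        # this event could influence O
    return "origin"

print(causal_relation(2.0, 1.0))    # future
print(causal_relation(-2.0, 1.0))   # past
print(causal_relation(1.0, -2.0))   # elsewhere
```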
Figure 2–6. Animation illustrating relativity of simultaneity
All observers will agree that for any given event, an event within the given event's future light cone occurs after the given event. Likewise, for any given event, an event within the given event's past light cone occurs before the given event. The before–after relationship observed for timelike-separated events remains unchanged no matter what the reference frame
of the observer, i.e. no matter how the observer may be moving. The
situation is quite different for spacelike-separated events. Fig. 2-4 was drawn from the reference frame of an observer moving at v = 0. From this reference frame, event C is observed to occur after event O, and event B is observed to occur before event O.
From a different reference frame, the orderings of these
non-causally-related events can be reversed. In particular, one notes
that if two events are simultaneous in a particular reference frame,
they are necessarily separated by a spacelike interval and thus
are noncausally related. The observation that simultaneity is not
absolute, but depends on the observer's reference frame, is termed the relativity of simultaneity.
Fig. 2-6 illustrates the use of spacetime diagrams in the
analysis of the relativity of simultaneity. The events in spacetime are
invariant, but the coordinate frames transform as discussed above for
Fig. 2-3. The three events (A, B, C) are simultaneous from the reference frame of an observer moving at v = 0. From the reference frame of an observer moving at v = 0.3c, the events appear to occur in the order C, B, A. From the reference frame of an observer moving at v = −0.5c, the events appear to occur in the order A, B, C. The white line represents a plane of simultaneity
being moved from the past of the observer to the future of the
observer, highlighting events residing on it. The gray area is the light
cone of the observer, which remains invariant.
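The reordering of the three events can be reproduced with the time part of the standard Lorentz transformation, ct′ = γ(ct − βx). This is a sketch under assumptions: the positions assigned to A, B, and C are hypothetical, chosen only so that A lies at the smallest x and C at the largest.

```python
import math

def boosted_time(ct, x, beta):
    """ct' coordinate of an event in a frame moving with beta = v/c."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x)

# Hypothetical positions in meters; all three events occur at ct = 0
# in the frame of the observer moving at v = 0.
events = {"A": -1.0, "B": 0.0, "C": 1.0}

def order(beta):
    """Event names sorted by their time coordinate in the moving frame."""
    return sorted(events, key=lambda name: boosted_time(0.0, events[name], beta))

print(order(0.3))   # ['C', 'B', 'A']
print(order(-0.5))  # ['A', 'B', 'C']
```

Since the events are spacelike separated, neither ordering conflicts with causality.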
A spacelike spacetime interval gives the same distance that an
observer would measure if the events being measured were simultaneous to
the observer. A spacelike spacetime interval hence provides a measure
of proper distance, i.e. the true distance = $\sqrt{-s^2}$.
Likewise, a timelike spacetime interval gives the same measure of time
as would be presented by the cumulative ticking of a clock that moves
along a given world line. A timelike spacetime interval hence provides a
measure of the proper time = $\sqrt{s^2}$.
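A minimal sketch of both quantities, assuming the convention s² = (ct)² − x², so that the proper time is √(s²) for timelike intervals and the proper distance is √(−s²) for spacelike ones. Both results are in length units; dividing a proper time by c gives seconds. The function names are illustrative.

```python
import math

def proper_time(dct, dx):
    """Proper time (in meters of ct) for a timelike or lightlike separation."""
    s2 = dct**2 - dx**2
    if s2 < 0:
        raise ValueError("separation is spacelike; proper time is undefined")
    return math.sqrt(s2)

def proper_distance(dct, dx):
    """Proper distance (meters) for a spacelike or lightlike separation."""
    s2 = dct**2 - dx**2
    if s2 > 0:
        raise ValueError("separation is timelike; proper distance is undefined")
    return math.sqrt(-s2)

print(proper_time(5.0, 3.0))      # 4.0
print(proper_distance(3.0, 5.0))  # 4.0
```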
Invariant hyperbola
Figure 2–7. (a) Families of invariant hyperbolae, (b) Hyperboloids of two sheets and one sheet
In Euclidean space (having spatial dimensions only), the set of
points equidistant (using the Euclidean metric) from some point form a
circle (in two dimensions) or a sphere (in three dimensions). In (1+1)-dimensional
Minkowski spacetime (having one temporal and one spatial dimension),
the points at some constant spacetime interval away from the origin
(using the Minkowski metric) form curves given by the two equations
$(ct)^2 - x^2 = \pm s^2$,
with $s^2$ some positive real constant. These equations describe two families of hyperbolae in an x–ct spacetime diagram, which are termed invariant hyperbolae.
In Fig. 2-7a, each magenta hyperbola connects all events having
some fixed spacelike separation from the origin, while the green
hyperbolae connect events of equal timelike separation.
The magenta hyperbolae, which cross the x axis, are
timelike curves, which is to say that these hyperbolae represent actual
paths that can be traversed by (constantly accelerating) particles in
spacetime: Between any two events on one hyperbola a causality relation
is possible, because the inverse of the slope (representing the necessary
speed) for all secants is less than $c$. On the other hand, the green hyperbolae, which cross the ct axis, are spacelike curves because all intervals along
these hyperbolae are spacelike intervals: No causality is possible
between any two points on one of these hyperbolae, because all secants
represent speeds larger than $c$.
Fig. 2-7b reflects the situation in (1+2)-dimensional
Minkowski spacetime (one temporal and two spatial dimensions) with the
corresponding hyperboloids. The invariant hyperbolae displaced by
spacelike intervals from the origin generate hyperboloids
of one sheet, while the invariant hyperbolae displaced by timelike
intervals from the origin generate hyperboloids of two sheets.
The (1+2)-dimensional boundary between space- and time-like
hyperboloids, established by the events forming a zero spacetime
interval to the origin, is made up by degenerating the hyperboloids to
the light cone. In (1+1)-dimensions the hyperbolae degenerate to the two
grey 45°-lines depicted in Fig. 2-7a.
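Points on the invariant hyperbolae can be generated with hyperbolic functions. The sketch below assumes the convention s² = (ct)² − x² and uses a hyperbolic-angle parameter, here called rapidity, which is a choice of parametrization not introduced in the text above.

```python
import math

def timelike_hyperbola_point(s, rapidity):
    """Event (ct, x) with (ct)^2 - x^2 = +s^2; the curve crosses the ct axis."""
    return s * math.cosh(rapidity), s * math.sinh(rapidity)

def spacelike_hyperbola_point(s, rapidity):
    """Event (ct, x) with (ct)^2 - x^2 = -s^2; the curve crosses the x axis."""
    return s * math.sinh(rapidity), s * math.cosh(rapidity)

for phi in (-1.0, 0.0, 0.5, 2.0):
    ct, x = timelike_hyperbola_point(5.0, phi)
    print(round(ct, 3), round(x, 3), round(ct**2 - x**2, 6))
```

Every point printed has the same squared interval from the origin, 25 up to rounding, which is what makes the curve invariant.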
Time dilation and length contraction
Figure 2–8. The invariant hyperbola comprises the points that can be reached
from the origin in a fixed proper time by clocks traveling at different
speeds
Fig. 2-8 illustrates the invariant hyperbola for all events that can
be reached from the origin in a proper time of 5 meters (approximately 1.67×10⁻⁸ s).
Different world lines represent clocks moving at different speeds. A
clock that is stationary with respect to the observer has a world line
that is vertical, and the elapsed time measured by the observer is the
same as the proper time. For a clock traveling at 0.3 c, the elapsed time measured by the observer is 5.24 meters (1.75×10⁻⁸ s), while for a clock traveling at 0.7 c, the elapsed time measured by the observer is 7.00 meters (2.34×10⁻⁸ s).
This illustrates the phenomenon known as time dilation.
Clocks that travel faster take longer (in the observer frame) to tick
out the same amount of proper time, and they travel further along the
x–axis within that proper time than they would have without time
dilation. The measurement of time dilation by two observers in different inertial
reference frames is mutual. If observer O measures the clocks of
observer O′ as running slower in his frame, observer O′ in turn will
measure the clocks of observer O as running slower.
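The figures quoted for Fig. 2-8 can be reproduced with the standard time dilation factor γ = 1/√(1 − v²/c²), which is not written out in the text above. The function name is an illustrative choice.

```python
import math

def observer_time(proper_time, beta):
    """Elapsed time in the observer frame for a clock moving at beta = v/c.

    proper_time and the result are in the same units (here, meters of ct).
    """
    return proper_time / math.sqrt(1.0 - beta**2)

for beta in (0.0, 0.3, 0.7):
    print(beta, round(observer_time(5.0, beta), 2))
```

For a proper time of 5 meters this gives 5.0, 5.24, and 7.0 meters, matching the values stated for the stationary clock and the clocks moving at 0.3 c and 0.7 c.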
Figure 2–9. In this spacetime diagram, the 1 m length of the moving rod, as
measured in the primed frame, is the foreshortened distance OC when
projected onto the unprimed frame.
Length contraction,
like time dilation, is a manifestation of the relativity of
simultaneity. Measurement of length requires measurement of the
spacetime interval between two events that are simultaneous in one's
frame of reference. But events that are simultaneous in one frame of
reference are, in general, not simultaneous in other frames of
reference.
Fig. 2-9 illustrates the motions of a 1 m rod that is traveling at 0.5 c along the x
axis. The edges of the blue band represent the world lines of the rod's
two endpoints. The invariant hyperbola illustrates events separated
from the origin by a spacelike interval of 1 m. The endpoints O and B
measured when t′ = 0
are simultaneous events in the S′ frame. But to an observer in frame S,
events O and B are not simultaneous. To measure length, the observer in
frame S measures the endpoints of the rod as projected onto the x-axis along their world lines. The projection of the rod's world sheet onto the x axis yields the foreshortened length OC.
(not illustrated) Drawing a vertical line through A so that it intersects the x′
axis demonstrates that, even as OB is foreshortened from the point of
view of observer O, OA is likewise foreshortened from the point of view
of observer O′. In the same way that each observer measures the other's
clocks as running slow, each observer measures the other's rulers as
being contracted.
With regard to mutual length contraction, Fig. 2-9 illustrates that the primed and unprimed frames are mutually rotated by a hyperbolic angle (analogous to ordinary angles in Euclidean geometry). Because of this rotation, the projection of a primed meter-stick onto the unprimed x-axis is foreshortened, while the projection of an unprimed meter-stick onto the primed x′-axis is likewise foreshortened.
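The size of the foreshortening in Fig. 2-9 can be estimated with the standard length contraction formula L = L₀√(1 − v²/c²), which is not stated explicitly in the text above. The function name is an illustrative choice.

```python
import math

def contracted_length(rest_length, beta):
    """Length of a rod measured in a frame where it moves at beta = v/c."""
    return rest_length * math.sqrt(1.0 - beta**2)

# The 1 m rod of Fig. 2-9, traveling at 0.5 c.
print(round(contracted_length(1.0, 0.5), 3))  # 0.866
```

Because only the relative speed enters, the same function describes how observer O′ measures a meter-stick at rest in frame S.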
Mutual time dilation and length contraction tend to strike beginners
as inherently self-contradictory concepts. If an observer in frame S
measures a clock, at rest in frame S′, as running slower than his,
while S′ is moving at speed v in S, then the principle of
relativity requires that an observer in frame S′ likewise measures a
clock in frame S, moving at speed −v in S′, as running slower than hers. How two clocks can each run slower than the other is an important question that "goes to the heart of understanding special relativity."
This apparent contradiction stems from not correctly taking into
account the different settings of the necessary, related measurements.
Once these settings are taken into account, the contradiction proves to
be only apparent. It is not about the abstract ticking of two identical
clocks, but about how to measure in one frame the temporal distance of
two ticks of a moving clock. It turns out that in mutually observing the
duration between ticks of clocks, each moving in the respective frame,
different sets of clocks must be involved. In order to measure in frame S
the tick duration of a moving clock W′ (at rest in S′), one uses two additional, synchronized clocks W1 and W2 at rest in two arbitrarily fixed points in S with the spatial distance d.
Two events can be defined by the condition "two clocks are simultaneously at one place", i.e., when W′ passes each W1 and W2. For both events the two readings of the collocated clocks are recorded. The difference of the two readings of W1 and W2 is the temporal distance of the two events in S, and their spatial distance is d.
The difference of the two readings of W′ is the temporal distance of
the two events in S′. In S′ these events are only separated in time,
they happen at the same place in S′. Because of the invariance of the
spacetime interval spanned by these two events, and the nonzero spatial
separation d in S, the temporal distance in S′ must be smaller than the one in S: the smaller temporal distance between the two events, resulting from the readings of the moving clock W′, belongs to the slower running clock W′.
Conversely, for judging in frame S′ the temporal distance of two
events on a moving clock W (at rest in S), one needs two clocks at rest
in S′.
In this comparison the clock W is moving by with velocity −v.
Recording again the four readings for the events, defined by "two
clocks simultaneously at one place", results in the analogous temporal
distances of the two events, now temporally and spatially separated in
S′, and only temporally separated but collocated in S. To keep the
spacetime interval invariant, the temporal distance in S must be smaller
than in S′, because of the spatial separation of the events in S′: now
clock W is observed to run slower.
The necessary recordings for the two judgements, with "one moving
clock" and "two clocks at rest" in respectively S or S′, involve two
different sets, each with three clocks. Since there are different sets
of clocks involved in the measurements, there is no inherent necessity
that the measurements be reciprocally "consistent" such that, if one
observer measures the moving clock to be slow, the other observer
measures the other clock to be fast.
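The three-clock measurement can be sketched numerically using only the invariance of the spacetime interval. In the frame where the two synchronized clocks rest a distance d apart, the moving clock needs ct = d/β to pass from one to the other, while in its own frame the two passing events are collocated, so its reading is the square root of the squared interval. The function names and the values d = 3 m and β = 0.6 are hypothetical.

```python
import math

def rest_clocks_elapsed(d, beta):
    """ct between the two passing events, from the two clocks at rest."""
    return d / abs(beta)

def moving_clock_elapsed(d, beta):
    """ct shown by the single moving clock between the same two events."""
    dct = rest_clocks_elapsed(d, beta)
    # Events are collocated in the moving clock's frame, so its elapsed
    # time equals sqrt((ct)^2 - d^2) by invariance of the interval.
    return math.sqrt(dct**2 - d**2)

# Measurement in S: W' moves at +0.6c past W1 and W2.
print(round(rest_clocks_elapsed(3.0, 0.6), 6), round(moving_clock_elapsed(3.0, 0.6), 6))
# Measurement in S': W moves at -0.6c past the two clocks at rest in S'.
print(round(rest_clocks_elapsed(3.0, -0.6), 6), round(moving_clock_elapsed(3.0, -0.6), 6))
```

In both measurements the single moving clock shows less elapsed time than the pair of clocks at rest, even though the roles of the frames are swapped.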
Figure 2-10. Mutual time dilation
Fig. 2-10 illustrates the previous discussion of mutual time dilation with Minkowski diagrams.
The upper picture reflects the measurements as seen from frame S "at
rest" with unprimed, rectangular axes, and frame S′ "moving with v > 0",
coordinatized by primed, oblique axes, slanted to the right; the lower
picture shows frame S′ "at rest" with primed, rectangular coordinates,
and frame S "moving with −v < 0", with unprimed, oblique axes, slanted to the left.
Each line drawn parallel to a spatial axis (x, x′) represents a line of simultaneity. All events on such a line have the same time value (ct, ct′). Likewise, each line drawn parallel to a temporal axis (ct, ct′) represents a line of equal spatial coordinate values (x, x′).
One may designate in both pictures the origin O (= O′)
as the event, where the respective "moving clock" is collocated with
the "first clock at rest" in both comparisons. Obviously, for this event
the readings on both clocks in both comparisons are zero. As a
consequence, the worldlines of the moving clocks are the ct′-axis, slanted to the
right (upper picture, clock W′), and the ct-axis, slanted to the left (lower picture, clock W). The worldlines of W1 and W′1 are the corresponding vertical time axes (ct in the upper picture, and ct′ in the lower picture).
In the upper picture the place for W2 is taken to be $A_x > 0$, and thus the worldline (not shown in the pictures) of this clock intersects the worldline of the moving clock (the ct′-axis) in the event labelled A, where "two clocks are simultaneously at one place". In the lower picture the place for W′2 is taken to be $C_{x'} < 0$, and so in this measurement the moving clock W passes W′2 in the event C.
In the upper picture the ct-coordinate $A_t$ of the event A (the reading of W2) is labeled B, thus giving the elapsed time between the two events, measured with W1 and W2, as OB. For a comparison, the length of the time interval OA, measured with W′, must be transformed to the scale of the ct-axis. This is done by the invariant hyperbola (see also Fig. 2-8) through A, connecting all events with the same spacetime interval from the origin as A. This yields the event C on the ct-axis, and obviously: OC < OB, the "moving" clock W′ runs slower.
To show the mutual time dilation immediately in the upper picture, the event D may be constructed as the event at x′ = 0 (the location of clock W′ in S′), that is simultaneous to C (OC has equal spacetime interval as OA) in S′. This shows that the time interval OD is longer than OA, showing that the "moving" clock runs slower.
In the lower picture the frame S is moving with velocity −v in the frame S′ at rest. The worldline of clock W is the ct-axis (slanted to the left), the worldline of W′1 is the vertical ct′-axis, and the worldline of W′2 is the vertical through event C, with ct′-coordinate D. The invariant hyperbola through event C scales the time interval OC to OA, which is shorter than OD; also, B is constructed (similar to D in the upper picture) as simultaneous to A in S, at x = 0. The result OB > OC again agrees with the result above.
The word "measure" is important. In classical physics an observer
cannot affect an observed object, but the object's state of motion can affect the observer's observations of the object.
Twin paradox
Many introductions to special relativity illustrate the differences
between Galilean relativity and special relativity by posing a series of
"paradoxes". These paradoxes are, in fact, ill-posed problems,
resulting from our unfamiliarity with velocities comparable to the speed
of light. The remedy is to solve many problems in special relativity
and to become familiar with its so-called counter-intuitive predictions.
The geometrical approach to studying spacetime is considered one of the
best methods for developing a modern intuition.
The twin paradox is a thought experiment
involving identical twins, one of whom makes a journey into space in a
high-speed rocket, returning home to find that the twin who remained on
Earth has aged more. This result appears puzzling because each twin
observes the other twin as moving, and so at first glance, it would
appear that each should find the other to have aged less. The twin
paradox sidesteps the justification for mutual time dilation presented
above by avoiding the requirement for a third clock. Nevertheless, the twin paradox is not a true paradox because it is easily understood within the context of special relativity.
The impression that a paradox exists stems from a
misunderstanding of what special relativity states. Special relativity
does not declare all frames of reference to be equivalent, only inertial
frames. The traveling twin's frame is not inertial during periods when
she is accelerating. Furthermore, the difference between the twins is
observationally detectable: the traveling twin needs to fire her rockets
to be able to return home, while the stay-at-home twin does not.
Figure 2–11. Spacetime explanation of the twin paradox
These distinctions should result in a difference in the twins' ages.
The spacetime diagram of Fig. 2-11 presents the simple case of a twin
going straight out along the x axis and immediately turning back. From
the standpoint of the stay-at-home twin, there is nothing puzzling about
the twin paradox at all. The proper time measured along the traveling
twin's world line from O to C, plus the proper time measured from C to
B, is less than the stay-at-home twin's proper time measured from O to A
to B. More complex trajectories require integrating the proper time
between the respective events along the curve (i.e. the path integral) to calculate the total amount of proper time experienced by the traveling twin.
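For world lines made of straight segments, the comparison reduces to summing √((Δct)² − (Δx)²) over the segments. The sketch below assumes hypothetical coordinates for the events of Fig. 2-11, in light-years as measured in the stay-at-home frame, with the traveling twin moving at 0.6 c on each leg.

```python
import math

def proper_time_along(path):
    """Total proper time along a piecewise-straight world line.

    path: list of (ct, x) events in one inertial frame, in order.
    The result has the same units as ct.
    """
    total = 0.0
    for (ct0, x0), (ct1, x1) in zip(path, path[1:]):
        total += math.sqrt((ct1 - ct0)**2 - (x1 - x0)**2)
    return total

# Hypothetical coordinates: O = (0, 0), A = (5, 0), C = (5, 3), B = (10, 0).
stay_at_home = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]   # O -> A -> B
traveler = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]       # O -> C -> B

print(proper_time_along(stay_at_home))  # 10.0
print(proper_time_along(traveler))      # 8.0
```

With these values the stay-at-home twin ages 10 years and the traveling twin 8 years. A curved trajectory could be approximated by supplying many short segments.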
Complications arise if the twin paradox is analyzed from the traveling twin's point of view.
Weiss's nomenclature, designating the stay-at-home twin as Terence and the traveling twin as Stella, is hereafter used.
Stella is not in an inertial frame. Given this fact, it is
sometimes incorrectly stated that full resolution of the twin paradox
requires general relativity:
A pure SR analysis would be as
follows: Analyzed in Stella's rest frame, she is motionless for the
entire trip. When she fires her rockets for the turnaround, she
experiences a pseudo force which resembles a gravitational force. Figs. 2-6 and 2-11 illustrate the concept of lines (planes) of simultaneity: Lines parallel to the observer's x-axis (xy-plane)
represent sets of events that are simultaneous in the observer frame.
In Fig. 2-11, the blue lines connect events on Terence's world line
which, from Stella's point of view, are simultaneous with events
on her world line. (Terence, in turn, would observe a set of horizontal
lines of simultaneity.) Throughout both the outbound and the inbound
legs of Stella's journey, she measures Terence's clocks as running
slower than her own. But during the turnaround (i.e. between the
bold blue lines in the figure), a shift takes place in the angle of her
lines of simultaneity, corresponding to a rapid skip-over of the events
in Terence's world line that Stella considers to be simultaneous with
her own. Therefore, at the end of her trip, Stella finds that Terence
has aged more than she has.
Although general relativity is not required to analyze the twin paradox, application of the Equivalence Principle
of general relativity does provide some additional insight into the
subject. Stella is not stationary in an inertial frame. Analyzed in
Stella's rest frame, she is motionless for the entire trip. When she is
coasting her rest frame is inertial, and Terence's clock will appear to
run slow. But when she fires her rockets for the turnaround, her rest
frame is an accelerated frame and she experiences a force which is
pushing her as if she were in a gravitational field. Terence will appear
to be high up in that field and because of gravitational time dilation,
his clock will appear to run fast, so much so that the net result will
be that Terence has aged more than Stella when they are back together. The theoretical arguments predicting gravitational time dilation are
not exclusive to general relativity. Any theory of gravity will predict
gravitational time dilation if it respects the principle of equivalence,
including Newton's theory.
Gravitation
This introductory section has focused on the spacetime of special
relativity, since it is the easiest to describe. Minkowski spacetime is
flat, takes no account of gravity, is uniform throughout, and serves as
nothing more than a static background for the events that take place in
it. The presence of gravity greatly complicates the description of
spacetime. In general relativity, spacetime is no longer a static
background, but actively interacts with the physical systems that it
contains. Spacetime curves in the presence of matter, can propagate
waves, bends light, and exhibits a host of other phenomena. A few of these phenomena are described in the later sections of this article.
A basic goal is to be able to compare measurements made by observers
in relative motion. Suppose an observer O in frame S has
measured the time and space coordinates of an event, assigning this
event three Cartesian coordinates and the time as measured on his
lattice of synchronized clocks (x, y, z, t) (see Fig. 1-1).
A second observer O′ in a different frame S′ measures the same event in
her coordinate system and her lattice of synchronized clocks (x′, y′, z′, t′). With inertial frames, neither observer is under acceleration, and a simple set of equations allows us to relate coordinates (x, y, z, t) to (x′, y′, z′, t′). Given that the two coordinate systems are in standard configuration, meaning that they are aligned with parallel (x, y, z) coordinates, that S′ moves at velocity v along the x-axis of S, and that t = 0 when t′ = 0, the coordinate transformation is as follows:
$$x' = x - vt, \qquad y' = y, \qquad z' = z, \qquad t' = t$$
Figure 3–1. Galilean spacetime and composition of velocities
Fig. 3-1 illustrates that in Newton's theory, time is universal, not the velocity of light.
Consider the following thought experiment: The red arrow illustrates a
train that is moving at 0.4 c with respect to the platform. Within the
train, a passenger shoots a bullet with a speed of 0.4 c in the frame of
the train. The blue arrow illustrates that a person standing on the
train tracks measures the bullet as traveling at 0.8 c. This is in
accordance with our naive expectations.
More generally, assume that frame S′ is moving at velocity v with respect to frame S, and that within frame S′, observer O′ measures an object moving with velocity u′. If u is the object's velocity with respect to frame S, then since x = ut, x′ = x − vt, and t = t′, we can write x′ = ut − vt = (u − v)t = (u − v)t′. This leads to u′ = x′/t′ and ultimately
$$u' = u - v$$
or
$$u = u' + v$$
which is the common-sense Galilean law for the addition of velocities.
Figure 3–2. Relativistic composition of velocities
The composition of velocities is quite different in relativistic
spacetime. To reduce the complexity of the equations slightly, we
introduce a common shorthand for the ratio of the speed of an object
relative to light,
$$\beta = v/c$$
Fig. 3-2a illustrates a red train that is moving forward at a speed given by v/c = β = s/a. From the primed frame of the train, a passenger shoots a bullet with a speed given by u′/c = β′ = n/m, where the distance is measured along a line parallel to the red x′ axis rather than parallel to the black x axis. What is the composite velocity u of the bullet relative to the platform, as represented by the blue arrow? Referring to Fig. 3-2b:
1. From the platform, the composite speed of the bullet is given by u = c(s + r)/(a + b).
2. The two yellow triangles are similar because they are right triangles that share a common angle α. In the large yellow triangle, the ratio s/a = v/c = β.
3. The ratios of corresponding sides of the two yellow triangles are constant, so that r/a = b/s = n/m = β′. So b = u′s/c and r = u′a/c.
4. Substitute the expressions for b and r into the expression for u in step 1 to yield Einstein's formula for the addition of velocities:
$$u = \frac{v + u'}{1 + vu'/c^2}$$
The relativistic formula for addition of velocities presented above exhibits several important features:
If u′ and v are both very small compared with the speed of light, then the product vu′/c²
becomes vanishingly small, and the overall result becomes
indistinguishable from the Galilean formula (Newton's formula) for the
addition of velocities: u = u′ + v. The Galilean formula is a special case of the relativistic formula applicable to low velocities.
If u′ is set equal to c, then the formula yields u = c regardless of the starting value of v. The velocity of light is the same for all observers regardless of their motion relative to the emitting source.
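Both features, and the train-and-bullet thought experiment from the Galilean discussion, can be checked numerically. The following is a minimal Python sketch; the function names are illustrative assumptions, and speeds are expressed as fractions of c (so c = 1 by default).

```python
def add_galilean(u_prime, v):
    """Galilean composition: velocities simply add."""
    return u_prime + v


def add_relativistic(u_prime, v, c=1.0):
    """Einstein's composition law, u = (u' + v) / (1 + v u' / c^2)."""
    return (u_prime + v) / (1.0 + v * u_prime / c ** 2)


# Train at 0.4 c, bullet fired at 0.4 c in the train frame.
u_galilean = add_galilean(0.4, 0.4)       # the naive 0.8 c
u_einstein = add_relativistic(0.4, 0.4)   # 0.8 / 1.16, a little under 0.69 c

# Limiting cases discussed in the text.
u_light = add_relativistic(1.0, 0.4)      # a light signal still moves at c
u_slow = add_relativistic(1e-6, 1e-6)     # practically the Galilean sum

print(u_galilean, u_einstein, u_light, u_slow)
```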
Figure 3-3. Spacetime diagrams illustrating time dilation and length contraction
It is straightforward to obtain quantitative expressions for time
dilation and length contraction. Fig. 3-3 is a composite image
containing individual frames taken from two previous animations,
simplified and relabeled for the purposes of this section.
To reduce the complexity of the equations slightly, there are a variety of different shorthand notations for ct:
T = ct and w = ct are common.
One also sees very frequently the use of the convention c = 1.
Figure 3–4. Lorentz factor as a function of velocity
In Fig. 3-3a, segments OA and OK represent equal spacetime intervals. Time dilation is represented by the ratio OB/OK. The invariant hyperbola has the equation w = √(x² + k²) where k = OK, and the red line representing the world line of a particle in motion has the equation w = x/β = xc/v. A bit of algebraic manipulation yields
$$OB = \frac{OK}{\sqrt{1 - v^2/c^2}}$$
The expression involving the square root symbol appears very
frequently in relativity, and one over the expression is called the
Lorentz factor, denoted by the Greek letter gamma (γ):
$$\gamma = \frac{1}{\sqrt{1 - v^2/c^2}} = \frac{1}{\sqrt{1 - \beta^2}}$$
If v is greater than or equal to c, the expression for γ becomes physically meaningless, implying that c is the maximum possible speed in nature. For any v
greater than zero, the Lorentz factor will be greater than one,
although the shape of the curve is such that for low speeds, the Lorentz
factor is extremely close to one.
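The behavior of the Lorentz factor at low and high speeds can be tabulated with a few lines of Python. This is a sketch only; the function name `lorentz_factor` and the sample speeds are assumptions chosen for illustration.

```python
import math


def lorentz_factor(beta):
    """Return gamma = 1 / sqrt(1 - beta^2) for a speed beta = v/c."""
    if abs(beta) >= 1.0:
        # The square root is zero or imaginary at or beyond the speed of light.
        raise ValueError("gamma is undefined for |v| >= c")
    return 1.0 / math.sqrt(1.0 - beta * beta)


# Sample speeds as fractions of c (illustrative choices).
for beta in (0.001, 0.1, 0.5, 0.9, 0.99):
    print(f"beta = {beta:<5}  gamma = {lorentz_factor(beta):.6f}")
```

Rejecting |β| ≥ 1 explicitly mirrors the statement that the expression loses physical meaning at or above c.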
In Fig. 3-3b, segments OA and OK represent equal spacetime intervals. Length contraction is represented by the ratio OB/OK. The invariant hyperbola has the equation x = √(w² + k²), where k = OK, and the edges of the blue band representing the world lines of the endpoints of a rod in motion have slope 1/β = c/v. Event A has coordinates
(x, w) = (γk, γβk). Since the tangent line through A and B has the equation w = (x − OB)/β, we have γβk = (γk − OB)/β and
$$\frac{OB}{OK} = \gamma(1 - \beta^2) = \frac{1}{\gamma}$$
The Galilean transformations and their consequent commonsense law of
addition of velocities work well in our ordinary low-speed world of
planes, cars and balls. Beginning in the mid-1800s, however, sensitive
scientific instrumentation began finding anomalies that did not fit well
with the ordinary addition of velocities.
Lorentz transformations are used to transform the coordinates of an event from one frame to another in special relativity.
The Lorentz factor appears in the Lorentz transformations:
$$t' = \gamma\left(t - \frac{vx}{c^2}\right), \qquad x' = \gamma(x - vt), \qquad y' = y, \qquad z' = z$$
The inverse Lorentz transformations are:
$$t = \gamma\left(t' + \frac{vx'}{c^2}\right), \qquad x = \gamma(x' + vt'), \qquad y = y', \qquad z = z'$$
When v ≪ c and x is small enough, the v²/c² and vx/c² terms approach zero, and the Lorentz transformations approximate to the Galilean transformations.
Although for brevity the Lorentz transformation equations are written without deltas, the symbols x, t, etc. most often really mean Δx, Δt, etc. We are, in general, always concerned with the space and time differences between events.
Calling one set of transformations the normal Lorentz
transformations and the other the inverse transformations is misleading,
since there is no intrinsic difference between the frames. Different
authors call one or the other set of transformations the "inverse" set.
The forwards and inverse transformations are trivially related to each
other, since the S frame can only be moving forwards or reverse with respect to S′. So inverting the equations simply entails switching the primed and unprimed variables and replacing v with −v.
Example: Terence and Stella are at an Earth-to-Mars
space race. Terence is an official at the starting line, while Stella
is a participant. At time t = t′ = 0, Stella's spaceship accelerates instantaneously to a speed of 0.5 c. The distance from Earth to Mars is 300 light-seconds (about 90.0×10⁶ km). Terence observes Stella crossing the finish-line clock at t = 600.00 s. But Stella observes the time on her ship chronometer to be
$$t' = t/\gamma = 519.62\ \text{s}$$
as she passes the finish line, and she calculates the distance between
the starting and finish lines, as measured in her frame, to be 259.81
light-seconds (about 77.9×10⁶ km).
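The numbers in this example can be reproduced by applying the Lorentz transformation to the finish-line event. The sketch below works in seconds and light-seconds, so that c = 1; the function and variable names are illustrative assumptions.

```python
import math


def lorentz_transform(t, x, beta):
    """Map an event (t, x) in frame S to (t', x') in a frame S' that moves
    at beta = v/c along +x. Units: seconds and light-seconds, so c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma * (t - beta * x), gamma * (x - beta * t)


beta = 0.5                           # Stella's speed as a fraction of c
t_finish, x_finish = 600.0, 300.0    # finish-line event in Terence's frame

t_ship, x_ship = lorentz_transform(t_finish, x_finish, beta)

# Earth-to-Mars distance as measured in Stella's frame (length contraction).
gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
track_length_ship = x_finish / gamma

print(f"ship chronometer at the finish line: {t_ship:.2f} s")
print(f"finish event position in ship frame: {x_ship:.2f} light-seconds")
print(f"track length in ship frame:          {track_length_ship:.2f} light-seconds")
```

The finish event lands at x′ = 0 because it happens at the ship's own location in Stella's frame.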
There have been many dozens of derivations of the Lorentz transformations
since Einstein's original work in 1905, each with its particular focus.
Although Einstein's derivation was based on the invariance of the speed
of light, there are other physical principles that may serve as
starting points. Ultimately, these alternative starting points can be
considered different expressions of the underlying principle of locality, which states that the influence that one particle exerts on another cannot be transmitted instantaneously.
The derivation given here and illustrated in Fig. 3-5 is based on one presented by Bais and makes use of previous results from the Relativistic Composition of
Velocities, Time Dilation, and Length Contraction sections. Event P has
coordinates (w, x) in the black "rest system" and coordinates (w′, x′) in the red frame that is moving with velocity parameter β = v/c. To determine w′ and x′ in terms of w and x (or the other way around) it is easier at first to derive the inverse Lorentz transformation.
1. There can be no such thing as length expansion/contraction in the transverse directions. y′ must equal y and z′ must equal z, otherwise whether a fast moving 1 m ball could fit through a 1 m circular hole would depend on the observer. The first postulate of relativity states that all inertial frames are equivalent, and transverse expansion/contraction would violate this law.
2. From the drawing, w = a + b and x = r + s.
3. From previous results using similar triangles, we know that s/a = b/r = v/c = β.
4. Because of time dilation, a = γw′.
5. Substituting equation (4) into s/a = β yields s = γw′β.
6. Length contraction and similar triangles give us r = γx′ and b = βr = βγx′.
7. Substituting the expressions for s, a, r and b into the equations in Step 2 immediately yields
$$w = \gamma w' + \beta\gamma x', \qquad x = \gamma x' + \beta\gamma w'$$
The above equations are alternate expressions for the t and x
equations of the inverse Lorentz transformation, as can be seen by
substituting ct for w, ct′ for w′, and v/c for β. From the inverse transformation, the equations of the forwards transformation can be derived by solving for t′ and x′.
Linearity of the Lorentz transformations
The Lorentz transformations have a mathematical property called linearity, since x′ and t′ are obtained as linear combinations of x and t,
with no higher powers involved. The linearity of the transformation
reflects a fundamental property of spacetime that was tacitly assumed in
the derivation, namely, that the properties of inertial frames of
reference are independent of location and time. In the absence of
gravity, spacetime looks the same everywhere. All inertial observers will agree on what constitutes accelerating and non-accelerating motion. Any one observer can use her own measurements of space and time, but
there is nothing absolute about them. Another observer's conventions
will do just as well.
A result of linearity is that if two Lorentz transformations are
applied sequentially, the result is also a Lorentz transformation.
Example: Terence observes Stella speeding away from him at 0.500 c, and he can use the Lorentz transformations with β = 0.500 to relate Stella's measurements to his own. Stella, in her frame, observes Ursula traveling away from her at 0.250 c, and she can use the Lorentz transformations with β = 0.250
to relate Ursula's measurements with her own. Because of the linearity
of the transformations and the relativistic composition of velocities,
Terence can use the Lorentz transformations with β ≈ 0.667 to relate Ursula's measurements with his own.
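The example can be verified by writing each transformation as a 2×2 matrix acting on (w, x) and multiplying the matrices. This is a sketch using plain Python lists; the helper names `boost` and `matmul2` are assumptions made for illustration.

```python
import math


def boost(beta):
    """2x2 Lorentz transformation acting on the column vector (w, x), w = ct."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return [[g, -g * beta],
            [-g * beta, g]]


def matmul2(a, b):
    """Product a @ b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Terence -> Stella (beta = 0.5), then Stella -> Ursula (beta = 0.25).
combined = matmul2(boost(0.25), boost(0.5))

# A boost matrix has the form [[g, -g*beta], [-g*beta, g]], so beta can be
# read back from the ratio of two entries.
beta_total = -combined[0][1] / combined[0][0]
print(f"equivalent single boost: beta = {beta_total:.4f}")
```

The recovered β agrees with the relativistic composition (0.5 + 0.25)/(1 + 0.5 × 0.25) = 2/3.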
The Doppler effect
is the change in frequency or wavelength of a wave for a receiver and
source in relative motion. For simplicity, we consider here two basic
scenarios: (1) The motions of the source and/or receiver are exactly
along the line connecting them (longitudinal Doppler effect), and (2)
the motions are at right angles to the said line (transverse Doppler effect). We are ignoring scenarios where they move along intermediate angles.
Longitudinal Doppler effect
The classical Doppler analysis deals with waves that are propagating
in a medium, such as sound waves or water ripples, and which are
transmitted between sources and receivers that are moving towards or
away from each other. The analysis of such waves depends on whether the
source, the receiver, or both are moving relative to the medium. Given
the scenario where the receiver is stationary with respect to the
medium, and the source is moving directly away from the receiver at a
speed of v_s for a velocity parameter of β_s, the wavelength is increased, and the observed frequency f is given in terms of the emitted frequency f_0 by
$$f = \frac{1}{1 + \beta_s} f_0$$
On the other hand, given the scenario where the source is stationary, and
the receiver is moving directly away from the source at a speed of v_r for a velocity parameter of β_r, the wavelength is not changed, but the transmission velocity of the waves relative to the receiver is decreased, and the observed frequency f is given by
$$f = (1 - \beta_r) f_0$$
Figure 3–6. Spacetime diagram of relativistic Doppler effect
Light, unlike sound or water ripples, does not propagate through a
medium, and there is no distinction between a source moving away from
the receiver or a receiver moving away from the source. Fig. 3-6
illustrates a relativistic spacetime diagram showing a source separating
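from the receiver with a velocity parameter β, so that the separation between source and receiver at time w is βw. Because of time dilation, w = γw′, where w′ is the corresponding time on the source's clock. Since the slope of the green light ray is −1, light emitted at that moment reaches the receiver at time T = w + βw = γw′(1 + β). Hence, the relativistic Doppler effect is given by
$$f = \sqrt{\frac{1 - \beta}{1 + \beta}}\, f'$$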
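The three longitudinal cases for a receding source or receiver can be compared side by side: the classical moving-source result f_0/(1 + β_s), the classical moving-receiver result (1 − β_r)f_0, and the relativistic result f′√((1 − β)/(1 + β)). In the Python sketch below the function names and the sample value β = 0.5 are assumptions; in the classical cases β is taken relative to the wave speed in the medium, in the relativistic case relative to c.

```python
import math


def classical_moving_source(f0, beta_s):
    """Source recedes through the medium; receiver at rest in the medium."""
    return f0 / (1.0 + beta_s)


def classical_moving_receiver(f0, beta_r):
    """Receiver recedes through the medium; source at rest in the medium."""
    return f0 * (1.0 - beta_r)


def relativistic_longitudinal(f_source, beta):
    """Light from a receding source; only the relative velocity matters."""
    return f_source * math.sqrt((1.0 - beta) / (1.0 + beta))


f0, beta = 1.0, 0.5    # illustrative values
f_src = classical_moving_source(f0, beta)
f_rcv = classical_moving_receiver(f0, beta)
f_rel = relativistic_longitudinal(f0, beta)

print(f"classical, moving source:   {f_src:.4f}")
print(f"classical, moving receiver: {f_rcv:.4f}")
print(f"relativistic:               {f_rel:.4f}")
```

For equal β the relativistic value is the geometric mean of the two classical values, which follows directly from the three formulas.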
Transverse Doppler effect
Figure 3–7. Transverse Doppler effect scenarios
Suppose that a source and a receiver, both approaching each other in
uniform inertial motion along non-intersecting lines, are at their
closest approach to each other. It would appear that the classical
analysis predicts that the receiver detects no Doppler shift. Due to
subtleties in the analysis, that expectation is not necessarily true.
Nevertheless, when appropriately defined, transverse Doppler shift is a
relativistic effect that has no classical analog. The subtleties are
these:
Fig. 3-7a. What is the frequency measurement when the receiver
is geometrically at its closest approach to the source? This scenario is
most easily analyzed from the frame S′ of the source.
Fig. 3-7b. What is the frequency measurement when the receiver sees the source as being closest to it? This scenario is most easily analyzed from the frame S of the receiver.
Two other scenarios are commonly examined in discussions of transverse Doppler shift:
Fig. 3-7c. If the receiver is moving in a circle around the source, what frequency does the receiver measure?
Fig. 3-7d. If the source is moving in a circle around the receiver, what frequency does the receiver measure?
In scenario (a), the point of closest approach is frame-independent
and represents the moment where there is no change in distance versus
time (i.e. dr/dt = 0 where r is the distance between receiver and
source) and hence no longitudinal Doppler shift. The source observes
the receiver as being illuminated by light of frequency f′, but also observes the receiver as having a time-dilated clock. In frame S, the receiver is therefore illuminated by blueshifted light of frequency
$$f = f'\gamma = \frac{f'}{\sqrt{1 - \beta^2}}$$
In scenario (b) the illustration shows the receiver being illuminated
by light from when the source was closest to the receiver, even though
the source has moved on. Because the source's clocks are time dilated as
measured in frame S, and since dr/dt was equal to zero at this point,
the light from the source, emitted from this closest point, is redshifted with frequency
$$f = \frac{f'}{\gamma} = f'\sqrt{1 - \beta^2}$$
Scenarios (c) and (d) can be analyzed by simple time dilation
arguments. In (c), the receiver observes light from the source as being
blueshifted by a factor of γ,
and in (d), the light is redshifted. The only seeming complication is
that the orbiting objects are in accelerated motion. However, if an
inertial observer looks at an accelerating clock, only the clock's
instantaneous speed is important when computing time dilation. (The
converse, however, is not true.)
Most reports of transverse Doppler shift refer to the effect as a
redshift and analyze the effect in terms of scenarios (b) or (d).
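Because all four scenarios reduce to time dilation, the transverse shifts differ from the source frequency only by a factor of γ in one direction or the other. A minimal Python sketch follows; the function names and the sample speed β = 0.6 are assumptions for illustration.

```python
import math


def gamma(beta):
    """Lorentz factor for a speed beta = v/c."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)


def transverse_blueshift(f_source, beta):
    """Scenarios (a) and (c): the receiver's clock is the time-dilated one."""
    return f_source * gamma(beta)


def transverse_redshift(f_source, beta):
    """Scenarios (b) and (d): the source's clock is the time-dilated one."""
    return f_source / gamma(beta)


f_source, beta = 1.0, 0.6    # illustrative values
f_blue = transverse_blueshift(f_source, beta)
f_red = transverse_redshift(f_source, beta)
print(f"blueshifted: {f_blue:.3f}   redshifted: {f_red:.3f}")
```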
Figure 3–8. Relativistic spacetime momentum vector. The coordinate axes of the rest frame are: momentum, p, and mass * c. For comparison, we have overlaid a spacetime coordinate system with axes: position, and time * c.
In classical mechanics, the state of motion of a particle is characterized by its mass and its velocity. Linear momentum, the product of a particle's mass and velocity, is a vector quantity, possessing the same direction as the velocity: p = mv. It is a conserved quantity, meaning that if a closed system is not affected by external forces, its total linear momentum cannot change.
In relativistic mechanics, the momentum vector is extended to
four dimensions. Added to the momentum vector is a time component that
allows the spacetime momentum vector to transform like the spacetime
position vector (x, t).
In exploring the properties of the spacetime momentum, we start, in
Fig. 3-8a, by examining what a particle looks like at rest. In the rest
frame, the spatial component of the momentum is zero, i.e. p = 0, but the time component equals mc.
We can obtain the transformed components of this vector in the
moving frame by using the Lorentz transformations, or we can read it
directly from the figure because we know that the space component is (βγ)mc = γmv and the time component is γmc,
since the red axes are rescaled by gamma. Fig. 3-8b illustrates the
situation as it appears in the moving frame. It is apparent that the
space and time components of the four-momentum go to infinity as the
velocity of the moving frame approaches c.
We will use this information shortly to obtain an expression for the four-momentum.
Momentum of light
Figure 3–9. Energy and momentum of light in different inertial frames
Light particles, or photons, travel at the speed of c, the constant that is conventionally known as the speed of light.
This statement is not a tautology, since many modern formulations of
relativity do not start with constant speed of light as a postulate.
Photons therefore propagate along a lightlike world line and, in
appropriate units, have equal space and time components for every
observer.
A consequence of Maxwell's theory of electromagnetism is that light carries energy and momentum, and that their ratio is a constant: E/p = c. Rearranging, E/c = p, and since for photons, the space and time components are equal, E/c must therefore be equated with the time component of the spacetime momentum vector.
Photons travel at the speed of light, yet have finite momentum and energy. For this to be so, the mass term in γmc must be zero, meaning that photons are massless particles. Infinity times zero is an ill-defined quantity, but E/c is well-defined.
By this analysis, if the energy of a photon equals E in the rest frame, it equals
$$E' = (1 - \beta)\gamma E$$
in a moving frame. This result can be derived by inspection of Fig. 3-9
or by application of the Lorentz transformations, and is consistent
with the analysis of Doppler effect given previously.
Mass–energy relationship
Consideration of the interrelationships between the various
components of the relativistic momentum vector led Einstein to several
important conclusions.
In the low speed limit as β = v/c approaches zero, γ approaches 1, so the spatial component of the relativistic momentum approaches mv, the classical term for momentum. Following this perspective, γm can be interpreted as a relativistic generalization of m. Einstein proposed that the relativistic mass of an object increases with velocity according to the formula m_rel = γm.
Likewise, comparing the time component of the relativistic momentum with that of the photon, γmc = m_rel c = E/c, so that Einstein arrived at the relationship E = m_rel c². Simplified to the case of zero velocity, this is Einstein's equation relating energy and mass, E = mc².
Another way of looking at the relationship between mass and energy is to consider a series expansion of γmc² at low velocity:
$$E = \gamma mc^2 = \frac{mc^2}{\sqrt{1 - \beta^2}} \approx mc^2 + \frac{1}{2}mv^2 + \cdots$$
The second term is just an expression for the kinetic energy of the particle. Mass indeed appears to be another form of energy.
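How well the first two terms of the expansion reproduce γmc² can be checked numerically. In the Python sketch below the function names and the sample speeds are illustrative assumptions, and units are chosen so that c = 1 by default.

```python
import math


def total_energy(m, v, c=1.0):
    """Exact relativistic energy, E = gamma * m * c^2."""
    return m * c ** 2 / math.sqrt(1.0 - (v / c) ** 2)


def rest_plus_newtonian_kinetic(m, v, c=1.0):
    """First two terms of the low-velocity expansion: m c^2 + (1/2) m v^2."""
    return m * c ** 2 + 0.5 * m * v ** 2


# Sample speeds as fractions of c (illustrative choices).
for v in (0.01, 0.1, 0.5):
    exact = total_energy(1.0, v)
    approx = rest_plus_newtonian_kinetic(1.0, v)
    print(f"v = {v:<4}  exact = {exact:.8f}  approx = {approx:.8f}  "
          f"difference = {exact - approx:.2e}")
```

The difference grows rapidly with speed because the next term of the series is proportional to v⁴/c².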
The concept of relativistic mass that Einstein introduced in 1905, m_rel,
although amply validated every day in particle accelerators around the
globe (or indeed in any instrumentation whose use depends on high
velocity particles, such as electron microscopes, old-fashioned color television sets, etc.), has nevertheless not proven to be a fruitful
concept in physics in the sense that it is not a concept that has
served as a basis for other theoretical development. Relativistic mass,
for instance, plays no role in general relativity.
For this reason, as well as for pedagogical concerns, most
physicists currently prefer a different terminology when referring to
the relationship between mass and energy. "Relativistic mass" is a deprecated term. The term "mass" by itself refers to the rest mass or invariant mass, and is equal to the invariant length of the relativistic momentum vector. Expressed as a formula,
$$E^2 - p^2c^2 = m_\text{rest}^2 c^4$$
This formula applies to all particles, massless as well as massive. For photons where m_rest equals zero, it yields E = ±pc.
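The invariance of the mass can be illustrated by computing E and p for the same particle at several speeds and recovering the mass from E² − p²c² = m²c⁴ each time. This is a Python sketch; the function names and sample values are assumptions, with c = 1 by default.

```python
import math


def energy_momentum(m, beta, c=1.0):
    """Return (E, p) for a particle of rest mass m moving at beta = v/c."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * m * c ** 2, g * m * beta * c


def invariant_mass(energy, momentum, c=1.0):
    """Recover the rest mass from E^2 - (p c)^2 = m^2 c^4.

    Tiny negative values caused by floating-point rounding are clamped to 0.
    """
    m2c4 = energy ** 2 - (momentum * c) ** 2
    return math.sqrt(max(m2c4, 0.0)) / c ** 2


# The same particle (m = 2 in arbitrary units) seen at different speeds.
for beta in (0.0, 0.6, 0.9):
    E, p = energy_momentum(2.0, beta)
    print(f"beta = {beta}: E = {E:.4f}, p = {p:.4f}, m = {invariant_mass(E, p):.4f}")

# A photon has E = pc, so its invariant mass is zero.
photon_mass = invariant_mass(3.0, 3.0)
```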
Four-momentum
Because of the close relationship between mass and energy, the
four-momentum (also called 4-momentum) is also called the
energy–momentum 4-vector. Using an uppercase P to represent the four-momentum and a lowercase p to denote the spatial momentum, the four-momentum may be written as
$$P \equiv (E/c, \vec{p}) = (E/c, p_x, p_y, p_z)$$
In physics, conservation laws state that certain particular
measurable properties of an isolated physical system do not change as
the system evolves over time. In 1915, Emmy Noether discovered that underlying each conservation law is a fundamental symmetry of nature. The fact that physical processes do not care where in space they take place (space translation symmetry) yields conservation of momentum, the fact that such processes do not care when they take place (time translation symmetry) yields conservation of energy,
and so on. In this section, we examine the Newtonian views of
conservation of mass, momentum and energy from a relativistic
perspective.
Total momentum
Figure 3–10. Relativistic conservation of momentum
To understand how the Newtonian view of conservation of momentum
needs to be modified in a relativistic context, we examine the problem
of two colliding bodies limited to a single dimension.
In Newtonian mechanics, two extreme cases of this problem may be distinguished yielding mathematics of minimum complexity:
1. The two bodies rebound from each other in a completely elastic collision.
2. The two bodies stick together and continue moving as a single particle. This second case is the case of completely inelastic collision.
For both cases (1) and (2), momentum, mass, and total energy are
conserved. However, kinetic energy is not conserved in cases of
inelastic collision. A certain fraction of the initial kinetic energy is
converted to heat.
In case (2), two masses with momenta p_1 = m_1 v_1
and p_2 = m_2 v_2 collide to produce a single particle of conserved mass m = m_1 + m_2 traveling at the center of mass velocity of the original system, v_cm = (m_1 v_1 + m_2 v_2)/(m_1 + m_2). The total momentum p = p_1 + p_2 is conserved.
Fig. 3-10 illustrates the inelastic collision of two particles from a relativistic perspective. The time components E_1/c and E_2/c add up to total E/c of the resultant vector, meaning that energy is conserved. Likewise, the space components p_1 and p_2 add up to form p
of the resultant vector. The four-momentum is, as expected, a conserved
quantity. However, the invariant mass of the fused particle, given by
the point where the invariant hyperbola of the total momentum intersects
the energy axis, is not equal to the sum of the invariant masses of the
individual particles that collided. Indeed, it is larger than the sum
of the individual masses: m > m_1 + m_2.
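The mass excess of the fused particle can be illustrated by adding the four-momenta of two colliding particles and computing the invariant length of the sum. In the Python sketch below the function names and the collision parameters (two unit masses approaching head-on at 0.6 c) are assumptions, and c = 1.

```python
import math


def four_momentum(m, beta):
    """Return (E, p) with c = 1 for rest mass m and velocity beta."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * m, g * m * beta


def fuse(*particles):
    """Add four-momenta; return total energy, total momentum, invariant mass."""
    e_total = sum(e for e, _ in particles)
    p_total = sum(p for _, p in particles)
    return e_total, p_total, math.sqrt(e_total ** 2 - p_total ** 2)


# Hypothetical head-on collision: two unit masses at +0.6 c and -0.6 c.
e_total, p_total, m_fused = fuse(four_momentum(1.0, 0.6),
                                 four_momentum(1.0, -0.6))

print(f"total energy = {e_total:.3f}, total momentum = {p_total:.3f}")
print(f"fused mass = {m_fused:.3f} versus sum of rest masses = 2.000")
```

The extra mass is the kinetic energy the two particles carried before they fused.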
Looking at the events of this scenario in reverse sequence, we
see that non-conservation of mass is a common occurrence: when an
unstable elementary particle
spontaneously decays into two lighter particles, total energy is
conserved, but the mass is not. Part of the mass is converted into
kinetic energy.
Choice of reference frames
Figure 3-11. (above) Lab Frame. (below) Center of Momentum Frame.
The freedom to choose any frame in which to perform an analysis
allows us to pick one which may be particularly convenient. For analysis
of momentum and energy problems, the most convenient frame is usually
the "center-of-momentum frame"
(also called the zero-momentum frame, or COM frame). This is the frame
in which the space component of the system's total momentum is zero.
Fig. 3-11 illustrates the breakup of a high speed particle into two
daughter particles. In the lab frame, the daughter particles are
preferentially emitted in a direction oriented along the original
particle's trajectory. In the COM frame, however, the two daughter
particles are emitted in opposite directions, although their masses and
the magnitude of their velocities are generally not the same.
Energy and momentum conservation
In a Newtonian analysis of interacting particles, transformation
between frames is simple because all that is necessary is to apply the
Galilean transformation to all velocities. Since v′ = v − u, the momentum p′ = p − mu.
If the total momentum of an interacting system of particles is observed
to be conserved in one frame, it will likewise be observed to be
conserved in any other frame.
Conservation of momentum in the COM frame amounts to the requirement that p = 0 both before and after collision. In the Newtonian analysis, conservation of mass dictates that m = m_1 + m_2.
In the simplified, one-dimensional scenarios that we have been
considering, only one additional constraint is necessary before the
outgoing momenta of the particles can be determined—an energy condition.
In the one-dimensional case of a completely elastic collision with no
loss of kinetic energy, the outgoing velocities of the rebounding
particles in the COM frame will be precisely equal and opposite to their
incoming velocities. In the case of a completely inelastic collision
with total loss of kinetic energy, the outgoing velocities of the
particles will be zero.
Newtonian momenta, calculated as p = mv, fail to behave properly under Lorentzian transformation. The linear transformation of velocities v′ = v − u is replaced by the highly nonlinear
$$v' = \frac{v - u}{1 - vu/c^2}$$
so that a calculation demonstrating conservation of momentum in one
frame will be invalid in other frames. Einstein was faced with either
having to give up conservation of momentum, or to change the definition
of momentum. This second option was what he chose.
Figure 3-12a. Energy–momentum diagram for decay of a charged pion.
Figure 3-12b. Graphing calculator analysis of charged pion decay.
The relativistic conservation law for energy and momentum replaces
the three classical conservation laws for energy, momentum and mass.
Mass is no longer conserved independently, because it has been subsumed
into the total relativistic energy. This makes the relativistic
conservation of energy a simpler concept than in nonrelativistic
mechanics, because the total energy is conserved without any
qualifications. Kinetic energy converted into heat or internal potential
energy shows up as an increase in mass.
Example: Because of the equivalence of mass and energy, elementary particle masses are customarily stated in energy units, where 1 MeV = 10⁶
electron volts. A charged pion is a particle of mass 139.57 MeV
(approx. 273 times the electron mass). It is unstable, and decays into a
muon of mass 105.66 MeV (approx. 207 times the electron mass) and an
antineutrino, which has an almost negligible mass. The difference
between the pion mass and the muon mass is 33.91 MeV.
Fig. 3-12a illustrates the energy–momentum diagram for this decay
reaction in the rest frame of the pion. Because of its negligible mass, a
neutrino travels at very nearly the speed of light. The relativistic
expression for its energy, like that of the photon, is E_ν = pc,
which is also the value of the space component of its momentum. To
conserve momentum, the muon has the same value of the space component of
the neutrino's momentum, but in the opposite direction.
Algebraic analyses of the energetics of this decay reaction are available online, so Fig. 3-12b presents instead a graphing calculator solution. The
energy of the neutrino is 29.79 MeV, and the kinetic energy of the muon is 33.91 MeV − 29.79 MeV = 4.12 MeV. Most of the released energy is carried off by the near-zero-mass neutrino.
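The quoted energies can also be reproduced directly from energy and momentum conservation. The Python sketch below treats the neutrino as exactly massless, which is an approximation, and sets c = 1 so that masses, momenta and energies are all in MeV; the function name `two_body_decay` is an assumption for illustration.

```python
M_PION = 139.57   # MeV, from the text
M_MUON = 105.66   # MeV, from the text


def two_body_decay(m_parent, m_daughter):
    """Decay at rest into one massive daughter and one massless particle.

    Energy conservation gives E_daughter + p = m_parent, and the daughter
    satisfies E_daughter^2 = p^2 + m_daughter^2. Solving for p gives
    p = (m_parent^2 - m_daughter^2) / (2 m_parent).
    """
    p = (m_parent ** 2 - m_daughter ** 2) / (2.0 * m_parent)
    e_massless = p                      # E = pc for the massless particle
    e_daughter = m_parent - p           # total energy of the massive daughter
    kinetic_daughter = e_daughter - m_daughter
    return p, e_massless, e_daughter, kinetic_daughter


p, e_nu, e_mu, ke_mu = two_body_decay(M_PION, M_MUON)
print(f"neutrino energy:     {e_nu:.2f} MeV")
print(f"muon total energy:   {e_mu:.2f} MeV")
print(f"muon kinetic energy: {ke_mu:.2f} MeV")
```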
Introduction to curved spacetime
Newton's theories assumed that motion takes place against the backdrop of a rigid Euclidean reference frame
that extends throughout all space and all time. Gravity is mediated by a
mysterious force, acting instantaneously across a distance, whose
actions are independent of the intervening space.
In contrast, Einstein denied that there is any background Euclidean
reference frame that extends throughout space. Nor is there any such
thing as a force of gravitation, only the structure of spacetime itself.
Figure 5–1. Tidal effects.
In spacetime terms, the path of a satellite orbiting the Earth is not
dictated by the distant influences of the Earth, Moon and Sun. Instead,
the satellite moves through space only in response to local conditions.
Since spacetime is everywhere locally flat when considered on a
sufficiently small scale, the satellite is always following a straight
line in its local inertial frame. We say that the satellite always
follows along the path of a geodesic. No evidence of gravitation can be discovered following alongside the motions of a single particle.
In any analysis of spacetime, evidence of gravitation requires that one observe the relative accelerations of two
bodies or two separated particles. In Fig. 5-1, two separated
particles, free-falling in the gravitational field of the Earth, exhibit
tidal accelerations due to local inhomogeneities in the gravitational
field such that each particle follows a different path through
spacetime. The tidal accelerations that these particles exhibit with
respect to each other do not require forces for their explanation.
Rather, Einstein described them in terms of the geometry of spacetime,
i.e. the curvature of spacetime. These tidal accelerations are strictly
local. It is the cumulative total effect of many local manifestations of
curvature that results in the appearance of a gravitational force acting at a long range from Earth.
Different observers viewing the scenarios presented in
this figure interpret the scenarios differently depending on their
knowledge of the situation. (i) A first observer, at the center of mass
of particles 2 and 3 but unaware of the large mass 1, concludes that a
force of repulsion exists between the particles in scenario A while a
force of attraction exists between the particles in scenario B. (ii) A
second observer, aware of the large mass 1, smiles at the first
observer's naiveté. This second observer knows that the
apparent forces between particles 2 and 3 really represent tidal effects
resulting from their differential attraction by mass 1. (iii) A third
observer, trained in general relativity, knows that there are, in fact,
no forces at all acting between the three objects. Rather, all three
objects move along geodesics in spacetime.
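The second observer's Newtonian account of these tidal effects can be illustrated numerically. The following is a minimal sketch and not part of the original analysis: it assumes a point mass with Earth's gravitational parameter, an arbitrarily chosen distance and particle separation, and hypothetical function names. It compares the accelerations of two nearby free-falling particles, once separated along the radial direction and once transverse to it.

```python
import math

GM_EARTH = 3.986004418e14  # m^3/s^2, standard gravitational parameter of Earth

def newtonian_acceleration(x, y, gm=GM_EARTH):
    """Acceleration (ax, ay) at position (x, y) toward a point mass at the origin."""
    r = math.hypot(x, y)
    return (-gm * x / r**3, -gm * y / r**3)

def relative_acceleration(p_a, p_b, gm=GM_EARTH):
    """Acceleration of particle b relative to particle a, projected on the line a -> b.
    Positive means the pair drifts apart; negative means it drifts together."""
    ax, ay = newtonian_acceleration(*p_a, gm)
    bx, by = newtonian_acceleration(*p_b, gm)
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    d = math.hypot(dx, dy)
    return ((bx - ax) * dx + (by - ay) * dy) / d

r = 7.0e6  # distance from the centre of the mass, in metres (assumed value)
d = 1.0    # separation between the two particles, in metres (assumed value)

# Pair separated along the radial direction: stretched apart.
radial = relative_acceleration((r - d / 2, 0.0), (r + d / 2, 0.0))
# Pair separated transverse to the radial direction: squeezed together.
transverse = relative_acceleration((r, -d / 2), (r, d / 2))
```

Under these assumptions the radially separated pair has a positive relative acceleration (an apparent repulsion) and the transversely separated pair a negative one (an apparent attraction), with the stretching about twice as strong as the squeezing. No force between the particles themselves enters the calculation.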
Two central propositions underlie general relativity.
The first crucial concept is coordinate independence: The laws
of physics cannot depend on what coordinate system one uses. This is a
major extension of the principle of relativity
from the version used in special relativity, which states that the laws
of physics must be the same for every observer moving in
non-accelerated (inertial) reference frames. In general relativity, to
use Einstein's own (translated) words, "the laws of physics must be of
such a nature that they apply to systems of reference in any kind of
motion."
This leads to an immediate issue: In accelerated frames, one feels
forces that seemingly would enable one to assess one's state of
acceleration in an absolute sense. Einstein resolved this problem
through the principle of equivalence.
Figure 5–2. Equivalence principle
The equivalence principle states that in any sufficiently small region of space, the effects of gravitation are the same as those from acceleration. In Fig. 5-2, person A is in a spaceship, far from any massive objects, that undergoes a uniform acceleration of g.
Person B is in a box resting on Earth. Provided that the spaceship is
sufficiently small so that tidal effects are non-measurable (given the
sensitivity of current gravity measurement instrumentation, A and B
presumably should be Lilliputians), there are no experiments that A and B can perform which will enable them to tell which setting they are in. An alternative expression of the equivalence principle is to note that in Newton's universal law of gravitation, $F = GMm_g/r^2 = m_g g$, and in Newton's second law, $F = m_i a$, there is no a priori reason why the gravitational mass $m_g$ should be equal to the inertial mass $m_i$. The equivalence principle states that these two masses are identical.
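The role of the two masses can be made explicit by equating the two expressions for the force on a freely falling body. The short derivation below is a sketch that uses only the quantities named above, writing $m_i$ for the inertial mass, $m_g$ for the gravitational mass, and $g = GM/r^2$ for the local gravitational field strength:

```latex
m_i\, a = m_g\, g
\quad\Longrightarrow\quad
a = \frac{m_g}{m_i}\, g ,
\qquad g = \frac{GM}{r^2}
```

If the ratio $m_g/m_i$ varied from one body to another, different bodies would fall at different rates, and A and B could use this to distinguish gravitation from acceleration. With $m_g = m_i$, every body falls with the same acceleration $a = g$, just as every loose object in the accelerating spaceship appears to do.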
To go from the elementary description above of curved spacetime to a
complete description of gravitation requires tensor calculus and differential geometry, topics both requiring considerable study. Without these mathematical tools, it is possible to write about general relativity, but it is not possible to demonstrate any non-trivial derivations.
Technical topics
Is spacetime really curved?
In Poincaré's conventionalist
views, the essential criteria according to which one should select a
Euclidean versus non-Euclidean geometry would be economy and simplicity.
A realist would say that Einstein discovered spacetime to be
non-Euclidean. A conventionalist would say that Einstein merely found it
more convenient to use non-Euclidean geometry. The
conventionalist would maintain that Einstein's analysis said nothing
about what the geometry of spacetime really is.
That said, two questions arise:
Is it possible to represent general relativity in terms of flat spacetime?
Are there any situations where a flat spacetime interpretation of general relativity may be more convenient than the usual curved spacetime interpretation?
In response to the first question, a number of authors including
Deser, Grishchuk, Rosen, Weinberg, etc. have provided various
formulations of gravitation as a field in a flat manifold. Those
theories are variously called "bimetric gravity", the "field-theoretical approach to general relativity", and so forth. Kip Thorne has provided a popular review of these theories.
The flat spacetime paradigm posits that matter creates a
gravitational field that causes rulers to shrink when they are turned
from circumferential orientation to radial, and that causes the ticking
rates of clocks to dilate. The flat spacetime paradigm is fully
equivalent to the curved spacetime paradigm in that they both represent
the same physical phenomena. However, their mathematical formulations
are entirely different. Working physicists routinely switch between
using curved and flat spacetime techniques depending on the requirements
of the problem. The flat spacetime paradigm is convenient when
performing approximate calculations in weak fields. Hence, flat
spacetime techniques tend to be used when solving gravitational wave
problems, while curved spacetime techniques tend to be used in the analysis
of black holes.
The spacetime symmetry group for Special Relativity is the Poincaré group,
which is a ten-dimensional group of three Lorentz boosts, three
rotations, and four spacetime translations. It is logical to ask what
symmetries if any might apply in General Relativity.
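The count of ten can be checked by tallying the independent transformations of each kind. The following is a small illustrative sketch with a hypothetical function name. It assumes a spacetime with exactly one time dimension, so that there is one boost per spatial axis, one rotation per pair of spatial axes, and one translation per coordinate.

```python
def poincare_generator_counts(spacetime_dim=4):
    """Count the independent generators of the Poincaré group for a spacetime
    with one time dimension and (spacetime_dim - 1) spatial dimensions."""
    space_dim = spacetime_dim - 1
    boosts = space_dim                             # one boost per spatial axis
    rotations = space_dim * (space_dim - 1) // 2   # one rotation per pair of spatial axes
    translations = spacetime_dim                   # one shift per coordinate
    return {
        "boosts": boosts,
        "rotations": rotations,
        "translations": translations,
        "total": boosts + rotations + translations,
    }

counts = poincare_generator_counts(4)
```

For four spacetime dimensions this gives three boosts, three rotations and four translations, for a total of ten. In general the total is d(d + 1)/2 for a d-dimensional spacetime.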
A tractable case might be to consider the symmetries of spacetime as
seen by observers located far away from all sources of the gravitational
field. The naive expectation for asymptotically flat spacetime
symmetries might be simply to extend and reproduce the symmetries of
flat spacetime of special relativity, viz., the Poincaré group.
In 1962 Hermann Bondi, M. G. van der Burg, A. W. Metzner and Rainer K. Sachs addressed this asymptotic symmetry problem in order to investigate the flow of energy at infinity due to propagating gravitational waves.
Their first step was to decide on some physically sensible boundary
conditions to place on the gravitational field at lightlike infinity to
characterize what it means to say a metric is asymptotically flat,
making no a priori assumptions about the nature of the asymptotic
symmetry group—not even the assumption that such a group exists. Then
after designing what they considered to be the most sensible boundary
conditions, they investigated the nature of the resulting asymptotic
symmetry transformations that leave invariant the form of the boundary
conditions appropriate for asymptotically flat gravitational fields.
What they found was that the asymptotic symmetry transformations
actually do form a group and the structure of this group does not depend
on the particular gravitational field that happens to be present. This
means that, as expected, one can separate the kinematics of spacetime
from the dynamics of the gravitational field at least at spatial
infinity. The puzzling surprise in 1962 was their discovery of a rich
infinite-dimensional group (the so-called BMS group) as the asymptotic
symmetry group, instead of the finite-dimensional Poincaré group, which
is a subgroup of the BMS group. Not only are the Lorentz transformations
asymptotic symmetry transformations, there are also additional
transformations that are not Lorentz transformations but are asymptotic
symmetry transformations. In fact, they found an additional infinity of
transformation generators known as supertranslations. This implies that General Relativity (GR) does not reduce to special relativity in the case of weak fields at long distances.
Riemannian geometry
Riemannian geometry is the branch of differential geometry that studies Riemannian manifolds. An example of a Riemannian manifold is a surface,
on which distances are measured by the length of curves on the surface.
Riemannian geometry is the study of surfaces and their
higher-dimensional analogs (called manifolds), in which distances are calculated along curves belonging to the manifold. Formally, Riemannian geometry is the study of smooth manifolds with a Riemannian metric (an inner product on the tangent space at each point that varies smoothly from point to point). This gives, in particular, local notions of angle, length of curves, surface area and volume. From those, some other global quantities can be derived by integrating local contributions.
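As a concrete illustration of how a metric yields the length of a curve, the sketch below numerically integrates the line element along a curve on a round sphere. It is an example under stated assumptions rather than a general-purpose routine: the function names are hypothetical, the coordinates are the usual polar angle theta and azimuth phi, and the metric is assumed diagonal, $ds^2 = R^2\,d\theta^2 + R^2\sin^2\theta\,d\phi^2$.

```python
import math

def sphere_metric(theta, phi, radius=1.0):
    """Diagonal components (g_theta_theta, g_phi_phi) of the round-sphere metric."""
    return (radius**2, (radius * math.sin(theta))**2)

def curve_length(curve, metric, steps=10_000):
    """Approximate the length of curve(t), t in [0, 1], by summing
    sqrt(g_ii * dx_i**2) over small chords, with the (diagonal) metric
    evaluated at the midpoint of each chord."""
    total = 0.0
    prev = curve(0.0)
    for k in range(1, steps + 1):
        cur = curve(k / steps)
        mid = [(a + b) / 2 for a, b in zip(prev, cur)]
        g = metric(*mid)
        total += math.sqrt(sum(gi * (b - a)**2 for gi, a, b in zip(g, prev, cur)))
        prev = cur
    return total

# A quarter of the equator: theta = pi/2, phi running from 0 to pi/2.
equator_arc = curve_length(lambda t: (math.pi / 2, t * math.pi / 2), sphere_metric)
# A quarter of the latitude circle at theta = pi/3.
latitude_arc = curve_length(lambda t: (math.pi / 3, t * math.pi / 2), sphere_metric)
```

On the unit sphere the first curve has length pi/2, while the second is shorter by a factor of sin(pi/3), even though both span the same range of the coordinate phi. The metric, not the coordinates, determines distance.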
Riemannian geometry originated with the vision of Bernhard Riemann expressed in his inaugural lecture "Über die Hypothesen, welche der Geometrie zu Grunde liegen" ("On the Hypotheses on which Geometry is Based"). It is a very broad and abstract generalization of the differential geometry of surfaces in $\mathbb{R}^3$.
The development of Riemannian geometry resulted in a synthesis of diverse
results concerning the geometry of surfaces and the behavior of geodesics on them, with techniques that can be applied to the study of differentiable manifolds of higher dimensions. It enabled the formulation of Einstein's general theory of relativity, made a profound impact on group theory and representation theory, as well as analysis, and spurred the development of algebraic and differential topology.
For physical reasons, a spacetime continuum is mathematically defined as a four-dimensional, smooth, connected Lorentzian manifold $(M, g)$. This means the smooth Lorentz metric $g$ has signature $(3,1)$. The metric determines the geometry of spacetime, as well as determining the geodesics of particles and light beams. About each point (event) on this manifold, coordinate charts are used to represent observers in reference frames. Usually, Cartesian coordinates $(x, y, z, t)$ are used. Moreover, for simplicity's sake, units of measurement are usually chosen such that the speed of light is equal to 1.
A reference frame (observer) can be identified with one of these coordinate charts; any such observer can describe any event $p$. Another reference frame may be identified by a second coordinate chart about $p$. Two observers (one in each reference frame) may describe the same event $p$ but obtain different descriptions.
Usually, many overlapping coordinate charts are needed to cover a manifold. Given two coordinate charts, one containing $p$ (representing an observer) and another containing $q$
(representing another observer), the intersection of the charts
represents the region of spacetime in which both observers can measure
physical quantities and hence compare results. The relation between the
two sets of measurements is given by a non-singular
coordinate transformation on this intersection. The idea of coordinate
charts as local observers who can perform measurements in their vicinity
also makes good physical sense, as this is how one actually collects
physical data—locally.
For example, two observers, one on Earth and the
other on a fast rocket to Jupiter, may observe a comet
crashing into Jupiter (this is the event $p$). In general, they will disagree about the exact location and timing of this impact, i.e., they will have different 4-tuples
(as they are using different coordinate systems). Although their
kinematic descriptions will differ, dynamical (physical) laws, such as
momentum conservation and the first law of thermodynamics, will still
hold. In fact, relativity theory requires more than this in the sense
that it stipulates these (and all other physical) laws must take the
same form in all coordinate systems. This introduces tensors into relativity, by which all physical quantities are represented.
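The way two observers assign different coordinates to one event, while agreeing on coordinate-independent quantities, can be sketched with the simplest coordinate transformation, a Lorentz boost in flat spacetime. The example below is illustrative only. The event coordinates and relative velocity are hypothetical, the function names are assumptions, units are chosen with c = 1, and the signature convention (-, +, +, +) is assumed.

```python
import math

def boost_x(event, v):
    """Transform an event (t, x, y, z) into a frame moving with velocity v
    along the x axis. Units with c = 1, so |v| < 1."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t), y, z)

def interval_squared(event_a, event_b):
    """Squared spacetime interval between two events, signature (-, +, +, +)."""
    dt, dx, dy, dz = (b - a for a, b in zip(event_a, event_b))
    return -dt**2 + dx**2 + dy**2 + dz**2

origin = (0.0, 0.0, 0.0, 0.0)                  # shared reference event
impact_earth_frame = (5.0, 3.0, 1.0, 0.0)      # hypothetical coordinates of the impact
impact_rocket_frame = boost_x(impact_earth_frame, 0.6)

s2_earth = interval_squared(origin, impact_earth_frame)
s2_rocket = interval_squared(boost_x(origin, 0.6), impact_rocket_frame)
```

The two 4-tuples for the impact differ, but the interval computed from them is the same in both frames. Quantities of this kind, which do not depend on the chart used, are what tensor equations are built from.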
Geodesics are said to be timelike, null, or spacelike if the
tangent vector at any one point of the geodesic is of that type. Paths of
particles and light beams in spacetime are represented by timelike and
null (lightlike) geodesics, respectively.
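The classification depends only on the sign of the tangent vector's squared norm under the metric. A minimal sketch follows, assuming the flat Minkowski metric of a local inertial frame, the signature convention (-, +, +, +), units with c = 1, and a hypothetical function name. On a curved manifold the metric components at the point in question would be used instead.

```python
def classify_vector(v, tol=1e-12):
    """Classify a tangent vector v = (vt, vx, vy, vz) using the Minkowski
    metric with signature (-, +, +, +) and c = 1."""
    vt, vx, vy, vz = v
    norm_sq = -vt**2 + vx**2 + vy**2 + vz**2
    if abs(norm_sq) <= tol:
        return "null"
    return "timelike" if norm_sq < 0 else "spacelike"

massive_particle = classify_vector((1.0, 0.5, 0.0, 0.0))  # moving slower than light
light_ray = classify_vector((1.0, 1.0, 0.0, 0.0))         # moving at the speed of light
spatial_step = classify_vector((0.0, 1.0, 0.0, 0.0))      # purely spatial displacement
```

With the opposite sign convention, (+, -, -, -), the roles of negative and positive squared norm are exchanged, but the classification itself is unchanged.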
Privileged character of 3+1 spacetime
Properties of (n + m)-dimensional spacetimes
There are two kinds of dimensions: spatial (bidirectional) and temporal (unidirectional). Let the number of spatial dimensions be N and the number of temporal dimensions be T. That N = 3 and T = 1, setting aside the compactified dimensions invoked by string theory and undetectable to date, can be explained by appealing to the physical consequences of letting N differ from 3 and T
differ from 1. The argument is often of an anthropic character and
possibly the first of its kind, albeit before the complete concept came
into vogue.
In 1920, Paul Ehrenfest showed that if there is only a single time dimension and more than three spatial dimensions, the orbit of a planet about its Sun cannot remain stable. The same is true of a star's orbit around the center of its galaxy. Ehrenfest also showed that if there are an even number of spatial dimensions, then the different parts of a wave impulse will travel at different speeds. If there are $5 + 2k$ spatial dimensions, where k is a positive whole number, then wave impulses become distorted. In 1922, Hermann Weyl claimed that Maxwell's theory of electromagnetism can be expressed in terms of an action only for a four-dimensional manifold. Finally, Tangherlini showed in 1963 that when there are more than three spatial dimensions, electron orbitals around nuclei cannot be stable; electrons would either fall into the nucleus or disperse.
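The orbital stability result can be illustrated with a simple Newtonian sketch. This is not Ehrenfest's original derivation. It assumes that the attractive force in N spatial dimensions falls off as 1/r^(N-1), as Gauss's law suggests, and considers a body of unit mass on a circular orbit. The function names and parameter values are hypothetical. The orbit is counted as stable if a small radial displacement produces a net restoring force.

```python
def effective_force(r, n_spatial, ang_mom, k=1.0):
    """Net radial force on a unit-mass body: the centrifugal term minus an
    attractive force falling off as 1 / r**(n_spatial - 1)."""
    return ang_mom**2 / r**3 - k / r**(n_spatial - 1)

def circular_orbit_is_stable(n_spatial, r0=1.0, k=1.0, h=1e-5):
    """A circular orbit at radius r0 is stable if nudging the body outward
    produces an inward net force, i.e. d(effective_force)/dr < 0 at r0."""
    # Angular momentum chosen so that the net radial force vanishes at r0.
    ang_mom = (k * r0**(4 - n_spatial)) ** 0.5
    slope = (effective_force(r0 + h, n_spatial, ang_mom, k)
             - effective_force(r0 - h, n_spatial, ang_mom, k)) / (2 * h)
    return slope < 0

stability = {n: circular_orbit_is_stable(n) for n in (2, 3, 4, 5, 6)}
```

Analytically, the slope at the circular orbit is proportional to N - 4, so under these assumptions circular orbits are stable for N = 2 and N = 3, marginal for N = 4, and unstable for every larger N, in line with the result quoted above.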
Max Tegmark expands on the preceding argument in the following anthropic manner. If T differs from 1, the behavior of physical systems could not be predicted reliably from knowledge of the relevant partial differential equations. In such a universe, intelligent life capable of manipulating technology could not emerge. Moreover, if T > 1, Tegmark maintains that protons and electrons
would be unstable and could decay into particles having greater mass
than themselves. (This is not a problem if the particles have a
sufficiently low temperature.) Lastly, if N < 3,
gravitation of any kind becomes problematic, and the universe would
probably be too simple to contain observers. For example, when N < 3, nerves cannot cross without intersecting. Hence anthropic and other arguments rule out all cases except N = 3 and T = 1, which describes the world around us.
On the other hand, in the context of forming black holes from an ideal monatomic gas under its self-gravity, Wei-Xiang Feng showed that (3 + 1)-dimensional spacetime is the marginal dimensionality. Moreover, it is the unique dimensionality that can afford a "stable" gas sphere with a "positive" cosmological constant. However, a self-gravitating gas cannot be stably bound if the mass sphere is larger than $\sim 10^{21}$ solar masses, owing to the small positive value of the observed cosmological constant.
In 2019, James Scargill argued that complex life may be possible
with two spatial dimensions. According to Scargill, a purely scalar
theory of gravity may enable a local gravitational force, and 2D
networks may be sufficient for complex neural networks.