Approximate orbit of the Sun (yellow circle) around the Galactic Center
The galactic year, also known as a cosmic year, is the duration of time required for the Sun to orbit once around the center of the Milky Way Galaxy. One galactic year is approximately 225 million Earth years.
The Solar System is traveling at an average speed of 230 km/s
(828,000 km/h) or 143 mi/s (514,000 mph) within its trajectory around
the Galactic Center,
a speed at which an object could circumnavigate the Earth's equator in 2
minutes and 54 seconds; that speed corresponds to approximately 1/1300
of the speed of light.
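As a quick sanity check on these figures, the short Python sketch below recomputes the circumnavigation time and the fraction of the speed of light from the quoted orbital speed. The equatorial circumference and the speed of light are standard reference values, not figures taken from the text above.

```python
# Rough consistency check of the quoted orbital figures.
ORBITAL_SPEED_KM_S = 230.0         # Sun's speed around the Galactic Center (from the text)
EARTH_EQUATOR_KM = 40_075.0        # equatorial circumference (standard reference value)
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light (standard reference value)

t = EARTH_EQUATOR_KM / ORBITAL_SPEED_KM_S               # seconds to cover one equator length
print(f"{int(t // 60)} min {t % 60:.0f} s")             # -> 2 min 54 s
print(f"1/{SPEED_OF_LIGHT_KM_S / ORBITAL_SPEED_KM_S:.0f} of c")  # -> roughly 1/1300
```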
The galactic year provides a conveniently usable unit for
depicting cosmic and geological time periods together. By contrast, a
"billion-year" scale does not allow for useful discrimination between
geologic events, and a "million-year" scale requires some rather large
numbers.
Timeline of the universe and Earth's history in galactic years (selected future events):
The Universe's expansion causes all galaxies beyond the Milky Way's Local Group to disappear beyond the cosmic light horizon, removing them from the observable universe.
2000 gal: Local Group of 47 galaxies coalesces into a single large galaxy.
Visualization of the
orbit of the Sun (yellow dot and white curve) around the Galactic
Center (GC) in the last galactic year. The red dots correspond to the
positions of the stars studied by the European Southern Observatory in a monitoring program.
Luminiferous aether or ether (luminiferous meaning 'light-bearing') was the postulated medium for the propagation of light. It was invoked to explain the ability of the apparently wave-based light to propagate through empty space (a vacuum),
something that waves should not be able to do. The assumption of a
spatial plenum (space completely filled with matter) of luminiferous
aether, rather than a spatial vacuum, provided the theoretical medium
that was required by wave theories of light.
The aether hypothesis was the topic of considerable debate
throughout its history, as it required the existence of an invisible and
infinite material with no interaction with physical objects. As the
nature of light was explored, especially in the 19th century, the
physical qualities required of an aether became increasingly
contradictory. By the late 19th century, the existence of the aether was
being questioned, although there was no physical theory to replace it.
The negative outcome of the Michelson–Morley experiment
(1887) suggested that the aether did not exist, a finding that was
confirmed in subsequent experiments through the 1920s. This led to
considerable theoretical work to explain the propagation of light
without an aether. A major breakthrough was the special theory of relativity,
which could explain why the experiment failed to see aether, but was
more broadly interpreted to suggest that it was not needed. The
Michelson–Morley experiment, along with the blackbody radiator and photoelectric effect, was a key experiment in the development of modern physics, which includes both relativity and quantum theory, the latter of which explains the particle-like nature of light.
In the 17th century, Robert Boyle
was a proponent of an aether hypothesis. According to Boyle, the aether
consists of subtle particles, one sort of which explains the absence of
vacuum and the mechanical interactions between bodies, and the other
sort of which explains phenomena such as magnetism (and possibly
gravity) that are, otherwise, inexplicable on the basis of purely
mechanical interactions of macroscopic bodies, "though in the ether of
the ancients there was nothing taken notice of but a diffused and very
subtle substance; yet we are at present content to allow that there is
always in the air a swarm of streams moving in a determinate course
between the north pole and the south".
Christiaan Huygens's Treatise on Light (1690) hypothesized that light is a wave propagating through an aether. He and Isaac Newton could only envision light waves as being longitudinal, propagating like sound and other mechanical waves in fluids. However, longitudinal waves necessarily have only one form for a given propagation direction, rather than two polarizations like a transverse wave. Thus, longitudinal waves can not explain birefringence,
in which two polarizations of light are refracted differently by a
crystal. In addition, Newton rejected light as waves in a medium because
such a medium would have to extend everywhere in space, and would
thereby "disturb and retard the Motions of those great Bodies" (the
planets and comets) and thus "as it [light's medium] is of no use, and
hinders the Operation of Nature, and makes her languish, so there is no
evidence for its Existence, and therefore it ought to be rejected".
Isaac Newton contended that light is made up of numerous small
particles. This can explain such features as light's ability to travel
in straight lines and reflect
off surfaces. Newton imagined light particles as non-spherical
"corpuscles", with different "sides" that give rise to birefringence.
But the particle theory of light can not satisfactorily explain refraction and diffraction. To explain refraction, Newton's Third Book of Opticks
(1st ed. 1704, 4th ed. 1730) postulated an "aethereal medium"
transmitting vibrations faster than light, by which light, when
overtaken, is put into "Fits of easy Reflexion and easy Transmission",
which caused refraction and diffraction. Newton believed that these
vibrations were related to heat radiation:
Is not the Heat of the warm Room convey'd through the
vacuum by the Vibrations of a much subtiler Medium than Air, which after
the Air was drawn out remained in the Vacuum? And is not this Medium
the same with that Medium by which Light is refracted and reflected, and
by whose Vibrations Light communicates Heat to Bodies, and is put into
Fits of easy Reflexion and easy Transmission?
In contrast to the modern understanding that heat radiation and light are both electromagnetic radiation,
Newton viewed heat and light as two different phenomena. He believed
heat vibrations to be excited "when a Ray of Light falls upon the
Surface of any pellucid Body". He wrote, "I do not know what this Aether is", but that if it consists of particles then they must be
exceedingly
smaller than those of Air, or even than those of Light: The exceeding
smallness of its Particles may contribute to the greatness of the force
by which those Particles may recede from one another, and thereby make
that Medium exceedingly more rare and elastic than Air, and by
consequence exceedingly less able to resist the motions of Projectiles,
and exceedingly more able to press upon gross Bodies, by endeavoring to
expand itself.
Bradley suggests particles
In 1720, James Bradley carried out a series of experiments attempting to measure stellar parallax
by taking measurements of stars at different times of the year. As the
Earth moves around the Sun, the apparent angle to a given distant spot
changes. By measuring those angles the distance to the star can be
calculated based on the known orbital circumference of the Earth around
the Sun. He failed to detect any parallax, thereby placing a lower limit
on the distance to stars.
During these experiments, Bradley also discovered a related
effect; the apparent positions of the stars did change over the year,
but not as expected. Instead of the apparent angle being maximized when
the Earth was at either end of its orbit with respect to the star, the
angle was maximized when the Earth was at its fastest sideways velocity
with respect to the star. This effect is now known as stellar aberration.
Bradley explained this effect in the context of Newton's
corpuscular theory of light, by showing that the aberration angle was
given by simple vector addition of the Earth's orbital velocity and the
velocity of the corpuscles of light, just as vertically falling
raindrops strike a moving object at an angle. Knowing the Earth's
velocity and the aberration angle enabled him to estimate the speed of
light.
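A back-of-the-envelope version of that estimate, using modern values for the Earth's orbital speed and the aberration constant rather than Bradley's own figures, runs as follows: the aberration angle satisfies tan θ ≈ v/c, so the measured angle gives the speed of light directly.

```latex
% Illustrative sketch with modern values, not Bradley's original numbers.
\tan\theta \approx \frac{v}{c},\qquad
v \approx 29.8~\mathrm{km/s},\quad
\theta \approx 20.5'' \approx 9.9\times 10^{-5}~\mathrm{rad}
\;\Rightarrow\;
c \approx \frac{v}{\theta} \approx 3.0\times 10^{5}~\mathrm{km/s}.
```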
Explaining stellar aberration in the context of an aether-based
theory of light was regarded as more problematic. As the aberration
relied on relative velocities, and the measured velocity was dependent
on the motion of the Earth, the aether had to remain stationary
with respect to the star as the Earth moved through it. This meant that
the Earth could travel through the aether, a physical medium, with no
apparent effect – precisely the problem that led Newton to reject a wave
model in the first place.
Wave-theory triumphs
A century later, Thomas Young and Augustin-Jean Fresnel
revived the wave theory of light when they pointed out that light could
be a transverse wave rather than a longitudinal wave; the polarization
of a transverse wave (like Newton's "sides" of light) could explain
birefringence, and in the wake of a series of experiments on diffraction
the particle model of Newton was finally abandoned. Physicists assumed, moreover, that, like mechanical waves, light waves required a medium for propagation, and thus required Huygens's idea of an aether "gas" permeating all space.
However, a transverse wave apparently required the propagating
medium to behave as a solid, as opposed to a fluid. The idea of a solid
that did not interact with other matter seemed a bit odd, and Augustin-Louis Cauchy
suggested that perhaps there was some sort of "dragging", or
"entrainment", but this made the aberration measurements difficult to
understand. He also suggested that the absence of longitudinal waves suggested that the aether had negative compressibility. George Green pointed out that such a fluid would be unstable. George Gabriel Stokes became a champion of the entrainment interpretation, developing a model in which the aether might, like pine pitch, be dilatant
(fluid at slow speeds and rigid at fast speeds). Thus the Earth could
move through it fairly freely, but it would be rigid enough to support
light.
Electromagnetism
In 1856, Wilhelm Eduard Weber and Rudolf Kohlrausch
measured the numerical value of the ratio of the electrostatic unit of
charge to the electromagnetic unit of charge, and found that this ratio equals the speed of light c. The following year, Gustav Kirchhoff
wrote a paper in which he showed that the speed of a signal along an
electric wire was equal to the speed of light. These are the first
recorded historical links between the speed of light and electromagnetic
phenomena.
James Clerk Maxwell began working on Michael Faraday's lines of force. In his 1861 paper On Physical Lines of Force
he modelled these magnetic lines of force using a sea of molecular
vortices that he considered to be partly made of aether and partly made
of ordinary matter. He derived expressions for the dielectric constant
and the magnetic permeability in terms of the transverse elasticity and
the density of this elastic medium. He then equated the ratio of the
dielectric constant to the magnetic permeability with a suitably adapted
version of Weber and Kohlrausch's result of 1856, and he substituted
this result into Newton's equation for the speed of sound. On obtaining a
value that was close to the speed of light as measured by Hippolyte Fizeau, Maxwell concluded that light consists in undulations of the same medium that is the cause of electric and magnetic phenomena.
Maxwell had, however, expressed some uncertainties surrounding
the precise nature of his molecular vortices and so he began to embark
on a purely dynamical approach to the problem. He wrote another paper in
1864, entitled "A Dynamical Theory of the Electromagnetic Field", in which the details of the luminiferous medium were less explicit. Although Maxwell did not explicitly mention the sea of molecular vortices, his derivation of Ampère's circuital law
was carried over from the 1861 paper and he used a dynamical approach
involving rotational motion within the electromagnetic field which he
likened to the action of flywheels. Using this approach to justify the
electromotive force equation (the precursor of the Lorentz force
equation), he derived a wave equation from a set of eight equations
which appeared in the paper and which included the electromotive force
equation and Ampère's circuital law.
Maxwell once again used the experimental results of Weber and
Kohlrausch to show that this wave equation represented an
electromagnetic wave that propagates at the speed of light, hence
supporting the view that light is a form of electromagnetic radiation.
In 1887–1889, Heinrich Hertz
experimentally demonstrated that electromagnetic waves are identical
to light waves. This unification of electromagnetic waves and optics
indicated that there was a single luminiferous aether instead of many
different kinds of aether media.
The apparent need for a propagation medium for such Hertzian waves (later called radio waves)
can be seen by the fact that they consist of orthogonal electric (E)
and magnetic (B or H) waves. The E waves consist of undulating dipolar
electric fields, and all such dipoles appeared to require separated and
opposite electric charges. Electric charge is an inextricable property
of matter,
so it appeared that some form of matter was required to provide the
alternating current that would seem to have to exist at any point along
the propagation path of the wave. Propagation of waves in a true vacuum
would imply the existence of electric fields without associated electric charge, or of electric charge without associated matter. Albeit compatible with Maxwell's equations, electromagnetic induction
of electric fields could not be demonstrated in vacuum, because all
methods of detecting electric fields required electrically charged
matter.
In addition, Maxwell's equations required that all electromagnetic waves in vacuum propagate at a fixed speed, c. As this can only occur in one reference frame in Newtonian physics (see Galilean relativity),
the aether was hypothesized as the absolute and unique frame of
reference in which Maxwell's equations hold. That is, the aether must be
"still" universally, otherwise c would vary along with any
variations that might occur in its supportive medium. Maxwell himself
proposed several mechanical models of aether based on wheels and gears,
and George Francis FitzGerald
even constructed a working model of one of them. These models had to
agree with the fact that the electromagnetic waves are transverse but
never longitudinal.
Problems
By this point the mechanical qualities of the aether had become more and more magical: it had to be a fluid
in order to fill space, but one that was millions of times more rigid
than steel in order to support the high frequencies of light waves. It
also had to be massless and without viscosity,
otherwise it would visibly affect the orbits of planets. Additionally
it appeared it had to be completely transparent, non-dispersive, incompressible, and continuous at a very small scale. Maxwell wrote in Encyclopædia Britannica:
Aethers were invented for the planets to swim in, to
constitute electric atmospheres and magnetic effluvia, to convey
sensations from one part of our bodies to another, and so on, until all
space had been filled three or four times over with aethers. ... The
only aether which has survived is that which was invented by Huygens to
explain the propagation of light.
By the early 20th century, aether theory was in trouble. A series of increasingly complex experiments
had been carried out in the late 19th century to try to detect the
motion of the Earth through the aether, and had failed to do so. A range
of proposed aether-dragging theories could explain the null result but
these were more complex, and tended to use arbitrary-looking
coefficients and physical assumptions. Lorentz and FitzGerald offered
within the framework of Lorentz ether theory
a more elegant solution to how the motion of an absolute aether could
be undetectable (length contraction), but if their equations were
correct, the new special theory of relativity (1905) could generate the same mathematics without referring to an aether at all. Aether fell to Occam's Razor.
The two most important models, which aimed to describe the relative motion of the Earth and the aether, were Augustin-Jean Fresnel's (1818) model of the (nearly) stationary aether including a partial aether drag determined by Fresnel's dragging coefficient, and George Gabriel Stokes' (1844)
model of complete aether drag. The latter theory was not considered correct, since it was not compatible with the aberration of light, and the auxiliary hypotheses developed to explain this problem were not convincing. Also, subsequent experiments such as the Sagnac effect (1913) showed that this model is untenable. However, the most important experiment supporting Fresnel's theory was Fizeau's 1851 experimental confirmation of Fresnel's 1818 prediction that a medium with refractive index n moving with a velocity v would increase the speed of light travelling through the medium in the same direction as v from c/n to c/n + v(1 − 1/n²).
That is, movement adds only a fraction of the medium's velocity to the light (predicted by Fresnel in order to make Snell's law
work in all frames of reference, consistent with stellar aberration).
This was initially interpreted to mean that the medium drags the aether
along, with a portion of the medium's velocity, but that understanding became very problematic after Wilhelm Veltmann demonstrated that the index n in Fresnel's formula depended upon the wavelength
of light, so that the aether could not be moving at a
wavelength-independent speed. This implied that there must be a separate
aether for each of the infinitely many frequencies.
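To put Fresnel's coefficient in perspective, a worked example using the standard refractive index of water (roughly the conditions of Fizeau's experiment; the numbers are illustrative and not taken from the text above):

```latex
% Illustrative numbers (standard n for water), not figures from the text.
1 - \frac{1}{n^{2}} = 1 - \frac{1}{1.333^{2}} \approx 0.44,
\qquad\text{so water moving at } v \approx 7~\mathrm{m/s}
\text{ adds only about } 0.44\,v \approx 3~\mathrm{m/s}
\text{ to the speed of the light passing through it.}
```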
Negative aether-drift experiments
The
key difficulty with Fresnel's aether hypothesis arose from the
juxtaposition of the two well-established theories of Newtonian dynamics
and Maxwell's electromagnetism. Under a Galilean transformation the equations of Newtonian dynamics are invariant,
whereas those of electromagnetism are not. Basically this means that
while physics should remain the same in non-accelerated experiments,
light would not follow the same rules because it is travelling in the
universal "aether frame". Some effect caused by this difference should
be detectable.
A simple example concerns the model on which aether was
originally built: sound. The speed of propagation for mechanical waves,
the speed of sound,
is defined by the mechanical properties of the medium. Sound travels
4.3 times faster in water than in air. This explains why a person
hearing an explosion underwater and quickly surfacing can hear it again
as the slower travelling sound arrives through the air. Similarly, a
traveller on an airliner
can still carry on a conversation with another traveller because the
sound of words is travelling along with the air inside the aircraft.
This effect is basic to all Newtonian dynamics, which says that
everything from sound to the trajectory of a thrown baseball should all
remain the same in the aircraft flying (at least at a constant speed) as
if still sitting on the ground. This is the basis of the Galilean
transformation, and the concept of frame of reference.
But the same was not supposed to be true for light, since
Maxwell's mathematics demanded a single universal speed for the
propagation of light, based, not on local conditions, but on two
measured properties, the permittivity and permeability
of free space, that were assumed to be the same throughout the
universe. If these numbers did change, there should be noticeable
effects in the sky; stars in different directions would have different
colours, for instance.
Thus at any point there should be one special coordinate system,
"at rest relative to the aether". Maxwell noted in the late 1870s that
detecting motion relative to this aether should be easy enough—light
travelling along with the motion of the Earth would have a different
speed than light travelling backward, as they would both be moving
against the unmoving aether. Even if the aether had an overall universal
flow, changes in position during the day/night cycle, or over the span
of seasons, should allow the drift to be detected.
First-order experiments
Although
the aether is almost stationary according to Fresnel, his theory
predicts a positive outcome of aether drift experiments only to second order in v/c,
because Fresnel's dragging coefficient would cause a negative outcome
of all optical experiments capable of measuring effects to first order in v/c.
This was confirmed by the following first-order experiments, all of
which gave negative results. The following list is based on the
description of Wilhelm Wien (1898), with changes and additional experiments according to the descriptions of Edmund Taylor Whittaker (1910) and Jakob Laub (1910):
The experiment of François Arago
(1810), to confirm whether refraction, and thus the aberration of
light, is influenced by Earth's motion. Similar experiments were
conducted by George Biddell Airy (1871) by means of a telescope filled with water, and Éleuthère Mascart (1872).
The experiment of Fizeau (1860), to find whether the rotation of the
polarization plane through glass columns is changed by Earth's motion.
He obtained a positive result, but Lorentz could show that the results
were contradictory. DeWitt Bristol Brace (1905) and Strasser (1907) repeated the experiment with improved accuracy, and obtained negative results.
The experiment of Martin Hoek (1868). This experiment is a more precise variation of the Fizeau experiment
(1851). Two light rays were sent in opposite directions – one of them
traverses a path filled with resting water, the other one follows a path
through air. In agreement with Fresnel's dragging coefficient, he
obtained a negative result.
The experiment of Wilhelm Klinkerfues
(1870) investigated whether an influence of Earth's motion on the
absorption line of sodium exists. He obtained a positive result, but
this was shown to be an experimental error, because a repetition of the
experiment by Haga (1901) gave a negative result.
The experiment of Ketteler (1872), in which two rays of an
interferometer were sent in opposite directions through two mutually
inclined tubes filled with water. No change of the interference fringes
occurred. Later, Mascart (1872) showed that the interference fringes of
polarized light in calcite remained uninfluenced as well.
The experiment of Éleuthère Mascart
(1872) to find a change of rotation of the polarization plane in
quartz. No change of rotation was found when the light rays had the
direction of Earth's motion and then the opposite direction. Lord Rayleigh conducted similar experiments with improved accuracy, and obtained a negative result as well.
Besides those optical experiments, electrodynamic first-order
experiments were also conducted, which should have led to positive results
according to Fresnel. However, Hendrik Antoon Lorentz (1895) modified Fresnel's theory and showed that those experiments can be explained by a stationary aether as well:
The experiment of Wilhelm Röntgen (1888), to find whether a charged capacitor produces magnetic forces due to Earth's motion.
The experiment of Theodor des Coudres
(1889), to find whether the inductive effect of two wire rolls upon a
third one is influenced by the direction of Earth's motion. Lorentz
showed that this effect is cancelled to first order by the electrostatic
charge (produced by Earth's motion) upon the conductors.
The experiment of Königsberger (1905). The plates of a capacitor are
located in the field of a strong electromagnet. Due to Earth's motion,
the plates should have become charged. No such effect was observed.
The experiment of Frederick Thomas Trouton
(1902). A capacitor was brought parallel to Earth's motion, and it was
assumed that momentum is produced when the capacitor is charged. The
negative result can be explained by Lorentz's theory, according to which
the electromagnetic momentum compensates the momentum due to Earth's
motion. Lorentz could also show that the sensitivity of the apparatus
was much too low to observe such an effect.
Second-order experiments
While the first-order experiments could be explained by a modified stationary aether, more precise second-order experiments were expected to give positive results. However, no such results could be found.
The famous Michelson–Morley experiment
compared the source light with itself after being sent in different
directions and looked for changes in phase in a manner that could be
measured with extremely high accuracy. In this experiment, their goal
was to determine the velocity of the Earth through the aether. The publication of their result in 1887, the null result,
was the first clear demonstration that something was seriously wrong
with the aether hypothesis (Michelson's first experiment in 1881 was not
entirely conclusive). In this case the MM experiment yielded a shift of
the fringe pattern of about 0.01 of a fringe,
corresponding to a small velocity. However, it was incompatible with
the expected aether wind effect due to the Earth's (seasonally varying)
velocity which would have required a shift of 0.4 of a fringe, and the
error was small enough that the value may have indeed been zero.
Therefore, the null hypothesis,
the hypothesis that there was no aether wind, could not be rejected.
More modern experiments have since reduced the possible value to a
number very close to zero, about 10⁻¹⁷.
It is obvious from what has gone
before that it would be hopeless to attempt to solve the question of the
motion of the solar system by observations of optical phenomena at the
surface of the earth.
— A. Michelson and E. Morley. "On the Relative Motion of the Earth and the Luminiferous Æther". Philosophical Magazine S. 5. Vol. 24. No. 151. December 1887.
A series of experiments using similar but increasingly sophisticated
apparatuses all returned the null result as well. Conceptually different
experiments that also attempted to detect the motion of the aether were
the Trouton–Noble experiment (1903), whose objective was to detect torsion effects caused by electrostatic fields, and the experiments of Rayleigh and Brace (1902, 1904), to detect double refraction in various media. However, all of them obtained a null result, like Michelson–Morley (MM) previously did.
These "aether-wind" experiments led to a flurry of efforts to
"save" aether by assigning to it ever more complex properties, and only a
few scientists, like Emil Cohn or Alfred Bucherer,
considered the possibility of the abandonment of the aether hypothesis.
Of particular interest was the possibility of "aether entrainment" or
"aether drag", which would lower the magnitude of the measurement,
perhaps enough to explain the results of the Michelson–Morley
experiment. However, as noted earlier, aether dragging already had
problems of its own, notably aberration. In addition, the interference
experiments of Lodge (1893, 1897) and Ludwig Zehnder (1895), aimed at showing whether the aether is dragged by various rotating masses, showed no aether drag. A more precise measurement was made in the Hammar experiment (1935), which ran a complete MM experiment with one of the "legs" placed between two massive lead blocks.
If the aether was dragged by mass then this experiment would have been
able to detect the drag caused by the lead, but again the null result
was achieved. The theory was again modified, this time to suggest that
the entrainment only worked for very large masses or those masses with
large magnetic fields. This too was shown to be incorrect by the Michelson–Gale–Pearson experiment, which detected the Sagnac effect due to Earth's rotation (see Aether drag hypothesis).
Another completely different attempt to save "absolute" aether was made in the Lorentz–FitzGerald contraction hypothesis, which posited that everything
was affected by travel through the aether. In this theory, the reason
that the Michelson–Morley experiment "failed" was that the apparatus
contracted in length in the direction of travel. That is, the light was
being affected in the "natural" manner by its travel through the aether
as predicted, but so was the apparatus itself, cancelling out any
difference when measured. FitzGerald had inferred this hypothesis from a
paper by Oliver Heaviside. Without reference to an aether, this physical interpretation of relativistic effects was shared by Kennedy and Thorndike
in 1932 as they concluded that the interferometer's arm contracts and
also the frequency of its light source "very nearly" varies in the way
required by relativity.
Similarly, the Sagnac effect, observed by G. Sagnac in 1913, was immediately seen to be fully consistent with special relativity. In fact, the Michelson–Gale–Pearson experiment
in 1925 was proposed specifically as a test to confirm the relativity
theory, although it was also recognized that such tests, which merely
measure absolute rotation, are also consistent with non-relativistic
theories.
During the 1920s, the experiments pioneered by Michelson were repeated by Dayton Miller,
who publicly proclaimed positive results on several occasions, although
they were not large enough to be consistent with any known aether
theory. However, other researchers were unable to duplicate Miller's
claimed results. Over the years the experimental accuracy of such
measurements has been raised by many orders of magnitude, and no trace
of any violations of Lorentz invariance has been seen. (A later
re-analysis of Miller's results concluded that he had underestimated the
variations due to temperature.)
Since the Miller experiment and its unclear results there have
been many more experimental attempts to detect the aether. Many
experimenters have claimed positive results. These results have not
gained much attention from mainstream science, since they contradict a
large quantity of high-precision measurements, all the results of which
were consistent with special relativity.
Between 1892 and 1904, Hendrik Lorentz
developed an electron–aether theory, in which he avoided making
assumptions about the aether. In his model the aether is completely
motionless, and by that he meant that it could not be set in motion in
the neighborhood of ponderable matter. Contrary to earlier electron
models, the electromagnetic field of the aether appears as a mediator
between the electrons, and changes in this field cannot propagate faster
than the speed of light. A fundamental concept of Lorentz's theory in
1895 was the "theorem of corresponding states" for terms of order v/c.
This theorem states that an observer moving relative to the aether
makes the same observations as a resting observer, after a suitable
change of variables. Lorentz noticed that it was necessary to change the
space-time variables when changing frames and introduced concepts like
physical length contraction (1892) to explain the Michelson–Morley experiment, and the mathematical concept of local time (1895) to explain the aberration of light and the Fizeau experiment. This resulted in the formulation of the so-called Lorentz transformation by Joseph Larmor (1897, 1900) and Lorentz (1899, 1904), whereby (it was noted by Larmor) the complete formulation of local time is accompanied by some sort of time dilation
of electrons moving in the aether. As Lorentz later noted (1921, 1928),
he considered the time indicated by clocks resting in the aether as
"true" time, while local time was seen by him as a heuristic working
hypothesis and a mathematical artifice.
Therefore, Lorentz's theorem is seen by modern authors as being a
mathematical transformation from a "real" system resting in the aether
into a "fictitious" system in motion.
The work of Lorentz was mathematically perfected by Henri Poincaré, who formulated on many occasions the Principle of Relativity
and tried to harmonize it with electrodynamics. He declared
simultaneity only a convenient convention which depends on the speed of
light, whereby the constancy of the speed of light would be a useful postulate for making the laws of nature as simple as possible. In 1900 and 1904 he physically interpreted Lorentz's local time as the result of clock synchronization by light signals. In June and July 1905
he declared the relativity principle a general law of nature, including
gravitation. He corrected some mistakes of Lorentz and proved the
Lorentz covariance of the electromagnetic equations. However, he used
the notion of an aether as a perfectly undetectable medium and
distinguished between apparent and real time, so most historians of
science argue that he failed to invent special relativity.
End of aether
Special relativity
Aether theory was dealt another blow when the Galilean transformation and Newtonian dynamics were both modified by Albert Einstein's special theory of relativity, giving the mathematics of Lorentzian electrodynamics a new, "non-aether" context.
Unlike most major shifts in scientific thought, special relativity was
adopted by the scientific community remarkably quickly, consistent with
Einstein's later comment that the laws of physics described by the
Special Theory were "ripe for discovery" in 1905. Max Planck's early advocacy of the special theory, along with the elegant formulation given to it by Hermann Minkowski, contributed much to the rapid acceptance of special relativity among working scientists.
Einstein based his theory on Lorentz's earlier work. Instead of
suggesting that the mechanical properties of objects changed with their
constant-velocity motion through an undetectable aether, Einstein
proposed to deduce the characteristics that any successful theory must
possess in order to be consistent with the most basic and firmly
established principles, independent of the existence of a hypothetical
aether. He found that the Lorentz transformation must transcend its
connection with Maxwell's equations, and must represent the fundamental
relations between the space and time coordinates of inertial frames of reference.
In this way he demonstrated that the laws of physics remained invariant
as they had with the Galilean transformation, but that light was now
invariant as well.
With the development of the special theory of relativity, the need to account for a single universal frame of reference
had disappeared – and acceptance of the 19th-century theory of a
luminiferous aether disappeared with it. For Einstein, the Lorentz
transformation implied a conceptual change: that the concept of position
in space or time was not absolute, but could differ depending on the
observer's location and velocity.
Moreover, in another paper published the same month in 1905, Einstein made several observations on a then-thorny problem, the photoelectric effect.
In this work he demonstrated that light can be considered as particles
that have a "wave-like nature". Particles obviously do not need a medium
to travel, and thus, neither did light. This was the first step that
would lead to the full development of quantum mechanics, in which the wave-like nature and
the particle-like nature of light are both considered as valid
descriptions of light. A summary of Einstein's thinking about the aether
hypothesis, relativity and light quanta may be found in his 1909
(originally German) lecture "The Development of Our Views on the
Composition and Essence of Radiation".
Lorentz on his side continued to use the aether hypothesis. In
his lectures of around 1911, he pointed out that what "the theory of
relativity has to say ... can be carried out independently of what one
thinks of the aether and the time". He commented that "whether there is
an aether or not, electromagnetic fields certainly exist, and so also
does the energy of the electrical oscillations" so that, "if we do not
like the name of 'aether', we must use another word as a peg to hang all
these things upon". He concluded that "one cannot deny the bearer of
these concepts a certain substantiality".
Nevertheless, in 1920, Einstein gave an address at Leiden University
in which he commented "More careful reflection teaches us however, that
the special theory of relativity does not compel us to deny ether. We
may assume the existence of an ether; only we must give up ascribing a
definite state of motion to it, i.e. we must by abstraction take from it
the last mechanical characteristic which Lorentz had still left it. We
shall see later that this point of view, the conceivability of which I
shall at once endeavour to make more intelligible by a somewhat halting
comparison, is justified by the results of the general theory of
relativity". He concluded his address by saying that "according to the
general theory of relativity space is endowed with physical qualities;
in this sense, therefore, there exists an ether. According to the
general theory of relativity space without ether is unthinkable."
In later years there have been a few individuals who advocated a
neo-Lorentzian approach to physics, which is Lorentzian in the sense of
positing an absolute true state of rest that is undetectable and which
plays no role in the predictions of the theory. (No violations of Lorentz covariance
have ever been detected, despite strenuous efforts.) Hence these
theories resemble the 19th century aether theories in name only. For
example, Paul Dirac, one of the founders of quantum field theory, stated in 1951 in an article in Nature, titled "Is there an Aether?", that "we are rather forced to have an aether". However, Dirac never formulated a complete theory, and so his speculations found no acceptance by the scientific community.
Einstein's views on the aether
When
Einstein was still a student in the Zurich Polytechnic in 1900, he was
very interested in the idea of aether. His initial proposal of research
thesis was to do an experiment to measure how fast the Earth was moving
through the aether.
"The velocity of a wave is proportional to the square root of the
elastic forces which cause [its] propagation, and inversely proportional
to the mass of the aether moved by these forces."
In 1916, after Einstein completed his foundational work on general relativity,
Lorentz wrote a letter to him in which he speculated that within
general relativity the aether was re-introduced. In his response
Einstein wrote that one can actually speak about a "new aether", but one
may not speak of motion in relation to that aether. This was further
elaborated by Einstein in some semi-popular articles (1918, 1920, 1924,
1930).
In 1918, Einstein publicly alluded to that new definition for the first time.
Then, in the early 1920s, in a lecture which he was invited to give at
Lorentz's university in Leiden, Einstein sought to reconcile the theory
of relativity with Lorentzian aether.
In this lecture Einstein stressed that special relativity took away the
last mechanical property of the aether: immobility. However, he
continued that special relativity does not necessarily rule out the
aether, because the latter can be used to give physical reality to
acceleration and rotation. This concept was fully elaborated within general relativity,
in which physical properties (which are partially determined by matter)
are attributed to space, but no substance or state of motion can be
attributed to that "aether" (by which he meant curved space-time).
In another paper of 1924, named "Concerning the Aether", Einstein
argued that Newton's absolute space, in which acceleration is absolute,
is the "Aether of Mechanics". And within the electromagnetic theory of
Maxwell and Lorentz one can speak of the "Aether of Electrodynamics", in
which the aether possesses an absolute state of motion. As regards
special relativity, also in this theory acceleration is absolute as in
Newton's mechanics. However, the difference from the electromagnetic
aether of Maxwell and Lorentz lies in the fact that "because it was no
longer possible to speak, in any absolute sense, of simultaneous states
at different locations in the aether, the aether became, as it were,
four-dimensional since there was no objective way of ordering its states
by time alone". Now the "aether of special relativity" is still
"absolute", because matter is affected by the properties of the aether,
but the aether is not affected by the presence of matter. This asymmetry
was solved within general relativity. Einstein explained that the
"aether of general relativity" is not absolute, because matter is
influenced by the aether, just as matter influences the structure of the
aether.
The only similarity of this relativistic aether concept with the classical aether models lies in the presence of physical properties in space, which can be identified through geodesics. As historians such as John Stachel
argue, Einstein's views on the "new aether" are not in conflict with
his abandonment of the aether in 1905. As Einstein himself pointed out,
no "substance" and no state of motion can be attributed to that new
aether. Einstein's use of the word "aether" found little support in the
scientific community, and played no role in the continuing development
of modern physics.
A USB dead drop is a USB mass storage device installed in a public space. For example, a USB flash drive might be mounted in an outdoor brick wall and fixed in place with fast concrete. Members of the public are implicitly invited to find files, or leave files, on a dead drop by directly plugging their laptop into the wall-mounted USB stick in order to transfer data. (It is also possible to use smartphones and tablets for this purpose, by utilizing a USB on-the-go cable.) The dead drops can therefore be regarded as an anonymous, offline, peer-to-peer file sharing network. In practice, USB dead drops are more often used for social or artistic reasons, rather than practical ones.
Background and history
The Dead Drops project was conceived by Berlin-based conceptual artist Aram Bartholl, a member of New York's F.A.T. Lab art and technology collective. The first USB dead drop network of five devices was installed by Bartholl in October 2010 in Brooklyn, New York City. The name comes from the dead drop method of communication used in espionage. An unrelated system called "deadSwap", in which participants use an SMS gateway to coordinate passing USB memory sticks on to one another, was begun in Germany in 2009.
Each dead drop is typically installed without any data except two files: deaddrops-manifesto.txt, and a readme.txt file explaining the project. Although typically found in urban areas embedded in concrete or brick, installation of USB dead drops in trees and other organic structures in natural settings has also been observed. Wireless dead drops such as the 2011 PirateBox, where the user connects to a Wi-Fi hotspot with network attached storage rather than physically connecting to a USB device, have also been created.
Comparison to other types of data transfer
Some reasons to use USB dead drops are practical. They permit P2P
file sharing without needing any internet or cellular connection, they
allow files to be shared with another person secretly and anonymously,
and they do not record any IP address or similar personally identifying information. Other benefits are more social or artistic in nature: USB dead drops are an opportunity to practice what Telecomix describes as datalove and can be seen as a way to promote off-grid data networks. Motivation for using USB dead drops has been likened to what drives people involved in geocaching,
which has existed for longer and is somewhat similar in that often a
set of GPS coordinates is used to locate a particular USB dead drop. Specifically, USB dead drops give the user "the thrill of discovery" in seeking out the location of the dead drop and when examining the data it contains. A QR-Code dead drop including the data in the QR code image or pointing to a decentralized storage repository would be an alternative and less risky option compared to a physical USB dead drop as long as users avoid IP address disclosure.
Potential drawbacks
Dead
drops are USB-based devices, which must be connected to an upstream
computer system, e.g. a laptop, smartphone, or similar device. The act of
making such a connection, to a device which is not necessarily trusted,
inherently poses certain threats:
Malware: anyone can intentionally or unintentionally infect an attached computer with malware such as a trojan horse, keylogger, or unwanted firmware. This risk can be mitigated by using antivirus software, or by using a throwaway device for the act of data transfer.
Booby trap: a fake dead drop or USB Killer
might be rigged to electrically damage any equipment connected to it,
and/or constitute a health and safety hazard for users. This risk can
be mitigated by using a USB galvanic isolation
adapter, which allows data exchange while physically decoupling the two
circuits. Wifi-based dead drops are not vulnerable to this threat.
Mugging:
because a USB dead drop is normally in a public or quasi-public
location, users may be physically attacked when they attempt to use the
system, for a variety of reasons including theft of the user's devices.
Drawbacks to system infrastructure
Publicly
and privately available USB dead drops give anyone (with physical
access) the ability to save and transfer data anonymously and free of
charge. These features are advantages over the internet and the cellular network,
which are at best quasi-anonymous and low-cost (some fee is always
involved, although in certain scenarios, such as government-, employer-,
or public-library-subsidized network access, the end user may experience no direct costs). However, offline networks are vulnerable to various types of threats and disadvantages relative to online ones:
One device at a time: Users cannot plug in to a USB dead drop if someone else is already plugged in.
Removal of stored data: anyone with physical access can erase all of the data held within the USB dead drop (via file deletion or disk formatting), or make it unusable by encrypting the data of the whole drive and hiding the key (see also the related topic of ransomware).
Removal of the entire device: thieves can steal the USB drive itself.
Disclosure: anyone can disclose the location of a (formerly) private
dead drop, by shadowing people that use it, and publishing coordinates
in a public fashion. This impacts the anonymous nature of USB dead
drops, since known drops can be filmed or otherwise observed.
Vandalism of the dead drop by physical destruction: anyone with physical access can destroy the dead drop, e.g. with pliers, a hammer,
high voltage from a static field, high temperature from a blowtorch, or
other physical force. Likelihood of vandalism or extraction is reduced
by sealing the USB dead drop in a hole deeper than its length but this
requires legitimate users to connect with a USB extender cable.
Sometimes the installation of the dead drop can itself be vandalism of
the building; in that case the building owner may destroy a dead drop
placed without permission.
Demolition
or damage during maintenance: certain dead drop locations are limited
to the lifespan of public structures. When a dead drop is embedded in a
brick wall, the drop can be destroyed when the wall is destroyed. When
a drop is embedded in a concrete sidewalk, the drop can be destroyed by
sidewalk-related construction and maintenance. Sometimes dead drops
are damaged when walls are repainted.
Prevalence
As of 2013, there were approximately 1000 USB dead drops (plus six known wifi-based dead drops). Most known USB dead drops are in the United States and Europe. As of 2016, overall dead drop infrastructure was estimated as being more than 10 terabytes
of storage capacity, with the majority still located in the United
States and Europe, but with growing numbers installed in the Asia-Pacific region, South America, and Africa.
In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.
General
Although
the word "optimization" shares the same root as "optimal", it is rare
for the process of optimization to produce a truly optimal system. A
system can generally be made optimal not in absolute terms, but only
with respect to a given quality metric, which may be in contrast with
other possible metrics. As a result, the optimized system will typically
only be optimal in one application or for one audience. One might
reduce the amount of time that a program takes to perform some task at
the price of making it consume more memory. In an application where
memory space is at a premium, one might deliberately choose a slower algorithm in order to use less memory. Often there is no "one size fits all" design which works well in all cases, so engineers make trade-offs
to optimize the attributes of greatest interest. Additionally, the
effort required to make a piece of software completely optimal –
incapable of any further improvement – is almost always more than is
reasonable for the benefits that would be accrued; so the process of
optimization may be halted before a completely optimal solution has been
reached. Fortunately, it is often the case that the greatest
improvements come early in the process.
Even for a given quality metric (such as execution speed), most
methods of optimization only improve the result; they have no pretense
of producing optimal output. Superoptimization is the process of finding truly optimal output.
Levels of optimization
Optimization
can occur at a number of levels. Typically the higher levels have
greater impact, and are harder to change later on in a project,
requiring significant changes or a complete rewrite if they need to be
changed. Thus optimization can typically proceed via refinement from
higher to lower, with initial gains being larger and achieved with less
work, and later gains being smaller and requiring more work. However, in
some cases overall performance depends on performance of very low-level
portions of a program, and small changes at a late stage or early
consideration of low-level details can have outsized impact. Typically
some consideration is given to efficiency throughout a project – though
this varies significantly – but major optimization is often considered
a refinement to be done late, if ever. On longer-running projects there
are typically cycles of optimization, where improving one area reveals
limitations in another, and these are typically curtailed when
performance is acceptable or gains become too small or costly.
As performance is part of the specification of a program – a
program that is unusably slow is not fit for purpose: a video game running at
60 frames per second is acceptable, but one at 6 frames per second is
unacceptably choppy – performance is a consideration from the start, to
ensure that the system is able to deliver sufficient performance, and
early prototypes need to have roughly acceptable performance for there
to be confidence that the final system will (with optimization) achieve
acceptable performance. This is sometimes omitted in the belief that
optimization can always be done later, resulting in prototype systems
that are far too slow – often by an order of magnitude
or more – and systems that ultimately are failures because they
architecturally cannot achieve their performance goals, such as the Intel 432
(1981); or ones that take years of work to achieve acceptable
performance, such as Java (1995), which only achieved acceptable
performance with HotSpot
(1999). The degree to which performance changes between prototype and
production system, and how amenable it is to optimization, can be a
significant source of uncertainty and risk.
Design level
At
the highest level, the design may be optimized to make best use of the
available resources, given goals, constraints, and expected use/load.
The architectural design of a system overwhelmingly affects its
performance. For example, a system that is network latency-bound (where
network latency is the main constraint on overall performance) would be
optimized to minimize network trips, ideally making a single request (or
no requests, as in a push protocol) rather than multiple roundtrips. Choice of design depends on the goals: when designing a compiler, if fast compilation is the key priority, a one-pass compiler is faster than a multi-pass compiler
(assuming same work), but if speed of output code is the goal, a slower
multi-pass compiler fulfills the goal better, even though it takes
longer itself. Choice of platform and programming language occur at this
level, and changing them frequently requires a complete rewrite, though
a modular system may allow rewrite of only some component – for
example, a Python program may rewrite performance-critical sections in
C. In a distributed system, choice of architecture (client-server, peer-to-peer,
etc.) occurs at the design level, and may be difficult to change,
particularly if all components cannot be replaced in sync (e.g., old
clients).
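As an illustration of the latency-bound case, the sketch below contrasts one round trip per item with a single batched request. It is written in Python with a hypothetical fetch_user/fetch_users_batch client API; none of these names come from the text.

```python
# Hypothetical client for a latency-bound service: the names and
# signatures are illustrative only, not a real API.

def load_profiles_naive(client, user_ids):
    # One network round trip per user: cost grows with len(user_ids) * latency.
    return [client.fetch_user(uid) for uid in user_ids]

def load_profiles_batched(client, user_ids):
    # A single round trip for the whole list: cost is roughly one latency
    # plus transfer time, regardless of how many users are requested.
    return client.fetch_users_batch(user_ids)
```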
Algorithms and data structures
Given an overall design, the next step is a good choice of efficient algorithms and data structures, and an efficient implementation of them. After design, the choice of algorithms
and data structures affects efficiency more than any other aspect of
the program. Generally data structures are more difficult to change than
algorithms, as a data structure assumption and its performance
assumptions are used throughout the program, though this can be
minimized by the use of abstract data types in function definitions, and keeping the concrete data structure definitions restricted to a few places.
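A minimal Python sketch of that idea (illustrative names only): the rest of the program talks to a small membership interface, so the concrete structure can later be swapped, say from a list to a set, without touching any callers.

```python
class MembershipIndex:
    """Tiny abstract-data-type wrapper: callers only use add() and contains()."""

    def __init__(self, items=()):
        # The concrete structure lives in exactly one place; switching it
        # (e.g. from a list to a set) changes no calling code.
        self._items = set(items)

    def add(self, item):
        self._items.add(item)

    def contains(self, item):
        return item in self._items   # O(1) on average with a set
```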
For algorithms, a good choice primarily means ensuring that they are constant O(1), logarithmic O(log n), linear O(n), or in some cases log-linear O(n log n) in the input (both in space and time). Algorithms with quadratic complexity O(n²)
fail to scale, and even linear algorithms cause problems if repeatedly
called; they are typically replaced with constant or logarithmic
algorithms if possible.
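For example (a small Python sketch, not taken from the text): duplicate detection with nested loops is quadratic, while the same task with a set of seen values is linear, and that difference dominates everything else once the input grows.

```python
def has_duplicates_quadratic(values):
    # O(n^2): compares every pair; fine for tiny inputs, fails to scale.
    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if a == b:
                return True
    return False

def has_duplicates_linear(values):
    # O(n): one pass with a hash set of the values seen so far.
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```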
Beyond asymptotic order of growth, the constant factors matter:
an asymptotically slower algorithm may be faster or smaller (because
simpler) than an asymptotically faster algorithm when they are both
faced with small input, which may be the case that occurs in reality.
Often a hybrid algorithm will provide the best performance, due to this tradeoff changing with size.
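A common instance of such a hybrid, sketched below in Python, is a merge sort that hands small subarrays to insertion sort, whose lower constant factors win on short inputs. The cutoff of 16 is an arbitrary illustrative value; real libraries tune it empirically.

```python
CUTOFF = 16  # illustrative threshold, not a tuned value

def insertion_sort(a):
    # O(n^2) worst case, but very small constant factors on short inputs.
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def hybrid_merge_sort(a):
    # Asymptotically O(n log n), but delegates small pieces to insertion sort.
    if len(a) <= CUTOFF:
        return insertion_sort(list(a))
    mid = len(a) // 2
    left, right = hybrid_merge_sort(a[:mid]), hybrid_merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```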
A general technique to improve performance is to avoid work. A good example is the use of a fast path
for common cases, improving performance by avoiding unnecessary work.
For example, using a simple text layout algorithm for Latin text, only
switching to a complex layout algorithm for complex scripts, such as Devanagari. Another important technique is caching, particularly memoization,
which avoids redundant computations. Because of the importance of
caching, there are often many levels of caching in a system, which can
cause problems from memory use, and correctness issues from stale
caches.
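The Python sketch below combines both ideas: a cheap fast path for plain ASCII runs, and functools.lru_cache as a memoization layer so identical runs are laid out only once. The layout functions are placeholders for illustration, not a real text-rendering API.

```python
from functools import lru_cache

def _simple_layout(text):
    # Placeholder for a cheap fixed-advance layout of plain ASCII text.
    return [(ch, i * 8) for i, ch in enumerate(text)]

def _complex_layout(text):
    # Placeholder for an expensive shaping pass needed by complex scripts.
    raise NotImplementedError("complex-script shaping is not sketched here")

@lru_cache(maxsize=4096)          # memoization: repeated runs are computed once
def layout_run(text):
    if text.isascii():            # fast path: the common, cheap case
        return _simple_layout(text)
    return _complex_layout(text)  # slow path only when actually required
```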
Source code level
Beyond
general algorithms and their implementation on an abstract machine,
concrete source code level choices can make a significant difference.
For example, on early C compilers, while(1) was slower than for(;;) for an unconditional loop, because while(1) evaluated 1 and then had a conditional jump which tested if it was true, while for (;;) had an unconditional jump. Some optimizations (such as this one) can nowadays be performed by optimizing compilers.
This depends on the source language, the target machine language, and
the compiler, and can be both difficult to understand or predict and
changes over time; this is a key place where understanding of compilers
and machine code can improve performance. Loop-invariant code motion and return value optimization
are examples of optimizations that reduce the need for auxiliary
variables and can even result in faster performance by avoiding
round-about optimizations.
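A Python-flavoured example of the same kind of source-level change (illustrative only, not from the text): hoisting loop-invariant work out of the loop body by hand, something an optimizing compiler would typically do automatically for the C cases above but which an interpreter may not.

```python
import math

def scale_all_naive(values, gain_db):
    out = []
    for v in values:
        # The dB-to-linear conversion is recomputed on every iteration
        # even though it never changes inside the loop.
        out.append(v * math.pow(10.0, gain_db / 20.0))
    return out

def scale_all_hoisted(values, gain_db):
    gain = math.pow(10.0, gain_db / 20.0)  # loop-invariant work moved out of the loop
    return [v * gain for v in values]      # comprehension also avoids repeated method lookups
```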
Build level
Between the source and compile level, directives and build flags can be used to tune performance options in the source code and compiler respectively, such as using preprocessor
defines to disable unneeded software features, optimizing for specific
processor models or hardware capabilities, or predicting branching, for instance. Source-based software distribution systems such as BSD's Ports and Gentoo's Portage can take advantage of this form of optimization.
At the lowest level, writing code using an assembly language
designed for a particular hardware platform can produce the most
efficient and compact code if the programmer takes advantage of the full
repertoire of machine instructions. Many operating systems used on embedded systems
have been traditionally written in assembler code for this reason.
Programs (other than very small programs) are seldom written from start to finish in assembly due to the time and cost involved. Most are compiled down from a high-level language to assembly and hand-optimized from there. When efficiency and size are less important, large parts may be written in a high-level language.
With more modern optimizing compilers and the greater complexity of recent CPUs,
it is harder to write more efficient code than what the compiler
generates, and few projects need this "ultimate" optimization step.
Much of the code written today is intended to run on as many
machines as possible. As a consequence, programmers and compilers don't
always take advantage of the more efficient instructions provided by
newer CPUs or quirks of older models. Additionally, assembly code tuned for a particular processor without using such instructions might still be suboptimal on a different processor that expects a different tuning of the code.
Typically today rather than writing in assembly language, programmers will use a disassembler
to analyze the output of a compiler and change the high-level source
code so that it can be compiled more efficiently, or understand why it
is inefficient.
Run time
Just-in-time
compilers can produce customized machine code based on run-time data,
at the cost of compilation overhead. This technique dates to the
earliest regular expression engines, and has become widespread with Java HotSpot and V8 for JavaScript. In some cases adaptive optimization may be able to perform run time
optimization exceeding the capability of static compilers by
dynamically adjusting parameters according to the actual input or other
factors.
Profile-guided optimization is an ahead-of-time (AOT) compilation optimization technique based on run-time profiles; it is a static "average case" analog of the dynamic technique of adaptive optimization.
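As a sketch of how this is typically driven, assuming a GCC-style toolchain (the file names are illustrative):

/* Profile-guided optimization workflow (GCC's -fprofile-generate and
   -fprofile-use flags):
     cc -O2 -fprofile-generate pgo_demo.c -o pgo_demo
     ./pgo_demo < representative_input.txt
     cc -O2 -fprofile-use pgo_demo.c -o pgo_demo */
#include <stdio.h>

int main(void) {
    int c, letters = 0, others = 0;
    while ((c = getchar()) != EOF) {
        /* With profile data, the compiler learns which branch dominates on
           typical input and can arrange code layout and prediction hints
           accordingly. */
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            ++letters;
        else
            ++others;
    }
    printf("letters=%d others=%d\n", letters, others);
    return 0;
}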
Self-modifying code
can alter itself in response to run time conditions in order to
optimize code; this was more common in assembly language programs.
Code optimization can also be broadly categorized into platform-dependent and platform-independent techniques. While the latter are
effective on most or all platforms, platform-dependent techniques use
specific properties of one platform, or rely on parameters depending on
the single platform or even on the single processor. Writing or
producing different versions of the same code for different processors
might therefore be needed. For instance, in the case of compile-level
optimization, platform-independent techniques are generic techniques
(such as loop unrolling,
reduction in function calls, memory efficient routines, reduction in
conditions, etc.) that impact most CPU architectures in a similar way. A reported example of platform-independent optimization involves the inner for loop, where it was observed that a loop with an inner for loop performed more computations per unit time than the same loop without it or with an inner while loop. Generally, these serve to reduce the total instruction path length
required to complete the program and/or reduce total memory usage
during the process. On the other hand, platform-dependent techniques
involve instruction scheduling, instruction-level parallelism,
data-level parallelism, and cache optimization techniques (i.e., parameters that differ among various platforms); even the optimal instruction scheduling might differ between processors of the same architecture.
Strength reduction
Computational
tasks can be performed in several different ways with varying
efficiency. A more efficient version with equivalent functionality is
known as a strength reduction. For example, consider the following C code snippet whose intention is to obtain the sum of all integers from 1 to N:
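A straightforward way to write this, sketched here for illustration, is a simple loop:

int i, sum = 0;
for (i = 1; i <= N; ++i) {
    sum += i;     /* N additions and N loop tests */
}
printf("sum: %d\n", sum);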
This code can (assuming no arithmetic overflow) be rewritten using a mathematical formula like:
int sum = N * (1 + N) / 2;
printf("sum: %d\n", sum);
The optimization, sometimes performed automatically by an optimizing compiler, is to select a method (algorithm) that is more computationally efficient, while retaining the same functionality. See algorithmic efficiency
for a discussion of some of these techniques. However, a significant
improvement in performance can often be achieved by removing extraneous
functionality.
Optimization is not always an obvious or intuitive process. In
the example above, the "optimized" version might actually be slower than
the original version if N were sufficiently small and the particular hardware happens to be much faster at performing addition and looping operations than multiplication and division.
Trade-offs
In
some cases, however, optimization relies on using more elaborate
algorithms, making use of "special cases" and special "tricks" and
performing complex trade-offs. A "fully optimized" program might be more
difficult to comprehend and hence may contain more faults than unoptimized versions. Beyond eliminating obvious antipatterns, some code-level optimizations decrease maintainability.
Optimization will generally focus on improving just one or two
aspects of performance: execution time, memory usage, disk space,
bandwidth, power consumption or some other resource. This will usually
require a trade-off – where one factor is optimized at the expense of
others. For example, increasing the size of a cache improves run-time performance, but also increases memory consumption. Other common trade-offs include code clarity and conciseness.
There are instances where the programmer performing the
optimization must decide to make the software better for some operations
but at the cost of making other operations less efficient. These
trade-offs may sometimes be of a non-technical nature – such as when a
competitor has published a benchmark
result that must be beaten in order to improve commercial success but
comes perhaps with the burden of making normal usage of the software
less efficient. Such changes are sometimes jokingly referred to as pessimizations.
Bottlenecks
Optimization may include finding a bottleneck in a system – a component that is the limiting factor on performance. In terms of code, this will often be a hot spot –
a critical part of the code that is the primary consumer of the needed
resource – though it can be another factor, such as I/O latency or
network bandwidth.
In computer science, resource consumption often follows a form of power law distribution, and the Pareto principle can be applied to resource optimization by observing that 80% of the resources are typically used by 20% of the operations.
In software engineering, it is often a better approximation that 90% of
the execution time of a computer program is spent executing 10% of the
code (known as the 90/10 law in this context).
More complex algorithms and data structures perform well with
many items, while simple algorithms are more suitable for small amounts
of data — the setup, initialization time, and constant factors of the
more complex algorithm can outweigh the benefit, and thus a hybrid algorithm or adaptive algorithm
may be faster than any single algorithm. A performance profiler can be
used to narrow down decisions about which functionality fits which
conditions.
In some cases, adding more memory
can help to make a program run faster. For example, a filtering program
will commonly read each line and filter and output that line
immediately. This only uses enough memory for one line, but performance is typically poor, due to the latency of each disk read. Performance can be greatly improved by reading the entire file and then writing the filtered result, though this uses much more memory. Caching the result is similarly effective, though it also requires larger memory use.
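As a sketch (the pattern "ERROR", the buffer sizes, and the use of setvbuf to enlarge the standard I/O buffers are illustrative choices):

#include <stdio.h>
#include <string.h>

/* Illustrative filter: copy to stdout only the lines containing "ERROR".
   Giving the standard streams large, fully buffered blocks trades memory
   for fewer, larger reads and writes. */
int main(void) {
    static char inbuf[1 << 20], outbuf[1 << 20];
    setvbuf(stdin,  inbuf,  _IOFBF, sizeof inbuf);
    setvbuf(stdout, outbuf, _IOFBF, sizeof outbuf);

    char line[4096];
    while (fgets(line, sizeof line, stdin))
        if (strstr(line, "ERROR"))
            fputs(line, stdout);
    return 0;
}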
When to optimize
Optimization can reduce readability and add code that is used only to improve the performance.
This may complicate programs or systems, making them harder to maintain
and debug. As a result, optimization or performance tuning is often
performed at the end of the development stage.
Donald Knuth made the following two statements on optimization:
"We should forget about small efficiencies, say about 97%
of the time: premature optimization is the root of all evil. Yet we
should not pass up our opportunities in that critical 3%"
(He also attributed the quote to Tony Hoare several years later, although this might have been an error as Hoare disclaims having coined the phrase).
"In established engineering disciplines a 12%
improvement, easily obtained, is never considered marginal and I believe
the same viewpoint should prevail in software engineering"
"Premature optimization" is a phrase used to describe a situation
where a programmer lets performance considerations affect the design of a
piece of code. This can result in a design that is not as clean as it
could have been or code that is incorrect, because the code is
complicated by the optimization and the programmer is distracted by
optimizing.
When deciding whether to optimize a specific part of the program, Amdahl's Law
should always be considered: the impact on the overall program depends
very much on how much time is actually spent in that specific part,
which is not always clear from looking at the code without a performance analysis.
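As an illustration (the numbers are chosen only for the example): if a routine accounts for a fraction p = 0.1 of total run time and is made s = 10 times faster, Amdahl's Law gives an overall speedup of 1 / ((1 - p) + p/s) = 1 / (0.9 + 0.01) ≈ 1.10, i.e. only about a 10% improvement, however dramatic the local gain.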
A better approach is therefore to design first, code from the design and then profile/benchmark
the resulting code to see which parts should be optimized. A simple and
elegant design is often easier to optimize at this stage, and profiling
may reveal unexpected performance problems that would not have been
addressed by premature optimization.
In practice, it is often necessary to keep performance goals in
mind when first designing software, but the programmer balances the
goals of design and optimization.
Modern compilers and operating systems are so efficient that the
intended performance increases often fail to materialize. As an
example, caching data at the application level that is again cached at
the operating system level does not yield improvements in execution.
Even so, it is a rare case when the programmer will remove failed
optimizations from production code. It is also true that advances in
hardware will more often than not obviate any potential improvements,
yet the obscuring code will persist into the future long after its
purpose has been negated.
Macros
Optimization during code development using macros takes on different forms in different languages.
In some procedural languages, such as C and C++, macros are implemented using token substitution. Nowadays, inline functions can be used as a type safe
alternative in many cases. In both cases, the inlined function body can
then undergo further compile-time optimizations by the compiler,
including constant folding, which may move some computations to compile time.
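As a sketch (the macro and function names are illustrative):

#include <stdio.h>

/* Token-substitution macro: the parentheses guard against precedence
   surprises, but an argument with side effects, e.g. SQUARE(i++),
   is still evaluated twice. */
#define SQUARE(x) ((x) * (x))

/* Type-safe alternative: an inline function. The compiler can still inline
   it, and constant folding may reduce square(4) to 16 at compile time. */
static inline int square(int x) { return x * x; }

int main(void) {
    printf("%d %d\n", SQUARE(4), square(4));   /* both may fold to 16 */
    return 0;
}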
In many functional programming
languages, macros are implemented using parse-time substitution of
parse trees/abstract syntax trees, which it is claimed makes them safer
to use. Since in many cases interpretation is used, that is one way to
ensure that such computations are only performed at parse-time, and
sometimes the only way.
Lisp originated this style of macro, and such macros are often called "Lisp-like macros". A similar effect can be achieved by using template metaprogramming in C++.
In both cases, work is moved to compile time. The difference between C macros on one side, and Lisp-like macros and C++ template metaprogramming on the other, is that the latter tools allow performing arbitrary computations at compile time/parse time, while expansion of C macros does not perform any computation and relies on the optimizer's ability to perform it. Additionally, C macros do not directly support recursion or iteration, so they are not Turing complete.
As with any optimization, however, it is often difficult to
predict where such tools will have the most impact before a project is
complete.
Optimization can be automated by compilers or performed by
programmers. Gains are usually limited for local optimization, and
larger for global optimizations. Usually, the most powerful optimization
is to find a superior algorithm.
Optimizing a whole system is usually undertaken by programmers
because it is too complex for automated optimizers. In this situation,
programmers or system administrators
explicitly change code so that the overall system performs better.
Although it can produce better efficiency, it is far more expensive than
automated optimizations. Since many parameters influence the program
performance, the program optimization space is large. Meta-heuristics
and machine learning are used to address the complexity of program
optimization.
Use a profiler (or performance analyzer) to find the sections of the program that are taking the most resources – the bottleneck. Programmers sometimes believe they have a clear idea of where the bottleneck is, but intuition is frequently wrong. Optimizing an unimportant piece of code will typically do little to help the overall performance.
When the bottleneck is localized, optimization usually starts
with a rethinking of the algorithm used in the program. More often than
not, a particular algorithm can be specifically tailored to a particular
problem, yielding better performance than a generic algorithm. For
example, the task of sorting a huge list of items is usually done with a
quicksort
routine, which is one of the most efficient generic algorithms. But if
some characteristic of the items is exploitable (for example, they are
already arranged in some particular order), a different method can be
used, or even a custom-made sort routine.
After the programmer is reasonably sure that the best algorithm
is selected, code optimization can start. Loops can be unrolled (for
lower loop overhead, although this can often lead to lower speed if it overloads the CPU cache), data types as small as possible can be used, integer arithmetic can be used instead of floating-point, and so on. (See algorithmic efficiency article for these and other techniques.)
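As a sketch of loop unrolling (an unroll factor of four, chosen only for illustration; modern compilers often perform this automatically):

#include <stddef.h>

/* Before: one addition and one loop test per element. */
long sum_array(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* After manual unrolling by 4: fewer loop tests and more work per iteration,
   at the cost of slightly larger and less readable code. */
long sum_array_unrolled(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)            /* handle any leftover elements */
        s += a[i];
    return s;
}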
Performance bottlenecks can be due to language limitations rather
than algorithms or data structures used in the program. Sometimes, a
critical part of the program can be re-written in a different programming language that gives more direct access to the underlying machine. For example, it is common for very high-level languages like Python to have modules written in C for greater speed. Programs already written in C can have modules written in assembly. Programs written in D can use the inline assembler.
Rewriting sections "pays off" in these circumstances because of a general "rule of thumb"
known as the 90/10 law, which states that 90% of the time is spent in
10% of the code, and only 10% of the time in the remaining 90% of the
code. So, putting intellectual effort into optimizing just a small part
of the program can have a huge effect on the overall speed – if the
correct part(s) can be located.
Manual optimization sometimes has the side effect of undermining
readability. Thus code optimizations should be carefully documented
(preferably using in-line comments), and their effect on future
development evaluated.
The program that performs an automated optimization is called an optimizer.
Most optimizers are embedded in compilers and operate during
compilation. Optimizers can often tailor the generated code to specific
processors.
Today, automated optimizations are almost exclusively limited to compiler optimization.
However, because compiler optimizations are usually limited to a fixed
set of rather general optimizations, there is considerable demand for
optimizers which can accept descriptions of problem and
language-specific optimizations, allowing an engineer to specify custom
optimizations. Tools that accept descriptions of optimizations are
called program transformation systems and are beginning to be applied to real software systems such as C++.
Grid computing or distributed computing aims to optimize the whole system, by moving tasks from computers with high usage to computers with idle time.
Time taken for optimization
Sometimes, the time taken to undertake optimization can itself be an issue.
Optimizing existing code usually does not add new features, and worse, it might add new bugs
in previously working code (as any change might). Because manually optimized code might sometimes have less "readability" than unoptimized code, optimization might impact its maintainability as well.
Optimization comes at a price and it is important to be sure that the
investment is worthwhile.
An automatic optimizer (or optimizing compiler,
a program that performs code optimization) may itself have to be
optimized, either to further improve the efficiency of its target
programs or else to speed up its own operation. A compilation performed with optimization "turned on" usually takes longer, although this is typically only a problem when programs are quite large.
In particular, for just-in-time compilers the performance of the run time compile component, executing together with its target code, is the key to improving overall execution speed.