Absolute dating is the process of determining an age on a specified chronology in archaeology and geology. Some scientists prefer the terms chronometric or calendar dating, as the use of the word "absolute" implies an unwarranted certainty of accuracy. Absolute dating provides a numerical age or range, in contrast with relative dating, which places events in order without any measure of the age between events.
In archaeology, absolute dating is usually based on the physical, chemical, and biological properties of the materials of artifacts, buildings, or other items that have been modified by humans, and on historical associations with materials of known date (such as coins and historical records).
For example, coins found in excavations may have their production date
written on them, or there may be written records describing the coin and
when it was used, allowing the site to be associated with a particular
calendar year. Absolute dating techniques include radiocarbon dating of wood or bones, potassium-argon dating, and trapped-charge dating methods such as thermoluminescence dating of glazed ceramics.
In historical geology, the primary methods of absolute dating involve using the radioactive decay of elements trapped in rocks or minerals, ranging from isotope systems suited to younger organic remains (radiocarbon dating with ¹⁴C) to systems such as uranium–lead dating that allow determination of absolute ages for some of the oldest rocks on Earth.
Radiometric dating is based on the known and constant rate of decay of radioactive isotopes into their radiogenic daughter isotopes.
Particular isotopes are suitable for different applications due to the
types of atoms present in the mineral or other material and its
approximate age. For example, techniques based on isotopes with
half-lives in the thousands of years, such as carbon-14, cannot be used
to date materials that have ages on the order of billions of years, as
the detectable amounts of the radioactive atoms and their decayed
daughter isotopes will be too small to measure within the uncertainty of
the instruments.
One of the most widely used and well-known absolute dating techniques is carbon-14 (or radiocarbon) dating, which is used to date organic remains. This is a radiometric technique since it is based on radioactive decay. Cosmic radiation
entering Earth's atmosphere produces carbon-14, and plants take in
carbon-14 as they fix carbon dioxide. Carbon-14 moves up the food chain
as animals eat plants and as predators eat other animals. With death,
the uptake of carbon-14 stops.
It takes 5,730 years for half the carbon-14 to decay to nitrogen;
this is the half-life of carbon-14. After another 5,730 years, only
one-quarter of the original carbon-14 will remain. After yet another
5,730 years, only one-eighth will be left.
By measuring the carbon-14 in organic material, scientists can determine the date of death of the organic matter in an artifact or ecofact.
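As a minimal illustration of this arithmetic (an uncalibrated sketch with made-up measured fractions, not a description of laboratory practice), the remaining carbon-14 fraction converts to an age via the half-life relation:

```python
# Minimal sketch: convert a measured fraction of remaining carbon-14 into an
# uncalibrated radiocarbon age using the half-life relation
#     fraction = 0.5 ** (age / half_life).
# The measured fractions below are made-up example values.
import math

HALF_LIFE_C14 = 5730.0          # years

def radiocarbon_age(remaining_fraction):
    """Years since death for a given fraction of the original carbon-14."""
    return -HALF_LIFE_C14 * math.log2(remaining_fraction)

print(radiocarbon_age(0.5))     # ~5,730 years (one half-life)
print(radiocarbon_age(0.25))    # ~11,460 years (two half-lives)
print(radiocarbon_age(0.125))   # ~17,190 years (three half-lives)
```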
Limitations
The
relatively short half-life of carbon-14, 5,730 years, makes dating
reliable only up to about 60,000 years. The technique often cannot
pinpoint the date of an archeological site better than historic records,
but is highly effective for precise dates when calibrated with other
dating techniques such as tree-ring dating.
An additional problem with carbon-14 dates from archeological
sites is known as the "old wood" problem. In dry, desert climates,
organic materials like dead trees can remain in their natural state for
hundreds of years. When people eventually use these materials as
firewood or building supplies, they become part of the archaeological
record. Thus, dating that particular tree does not necessarily indicate
when the fire burned or the structure was built.
For this reason, many archaeologists prefer to use samples from short-lived plants for radiocarbon dating. The development of accelerator mass spectrometry (AMS) dating, which allows a date to be obtained from a very small sample, has been very useful in this regard.
Other radiometric dating techniques are available for earlier periods. One of the most widely used is potassium–argon dating (K–Ar dating). Potassium-40
is a radioactive isotope of potassium that decays into argon-40. The
half-life of potassium-40 is 1.3 billion years, far longer than that of
carbon-14, allowing much older samples to be dated. Potassium is common
in rocks and minerals, allowing many samples of geochronological or archeological interest to be dated.
Argon, a noble gas, is not commonly incorporated into such samples except when produced in situ through radioactive decay. The date measured reveals the last time that the object was heated past the closure temperature at which the trapped argon can escape the lattice. K–Ar dating was used to calibrate the geomagnetic polarity time scale.
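The underlying age equation for such radiometric systems can be sketched as follows; this is a simplified illustration that assumes a closed system with no initial daughter isotope and omits the branching correction needed in real K–Ar work (the example ratio is hypothetical):

```python
# Simplified radiometric age from a measured daughter/parent ratio, assuming a
# closed system with no initial daughter isotope:
#     D/P = exp(lambda * t) - 1   =>   t = ln(1 + D/P) / lambda.
# (Real K-Ar dating also applies a branching correction, omitted here.)
import math

def radiometric_age(daughter_parent_ratio, half_life_years):
    decay_constant = math.log(2) / half_life_years
    return math.log(1.0 + daughter_parent_ratio) / decay_constant

# Hypothetical example using the potassium-40 half-life of 1.3 billion years.
print(f"{radiometric_age(0.1, 1.3e9):.3e} years")
```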
Thermoluminescence testing also dates items to the last time they were heated. This
technique is based on the principle that all objects absorb radiation
from the environment. This process frees electrons within minerals that
remain caught within the item.
Heating an item to 500 degrees Celsius or higher releases the trapped electrons, producing light. This light can be measured to determine the last time the item was heated.
Radiation levels do not remain constant over time. Fluctuating
levels can skew results – for example, if an item went through several
high radiation eras, thermoluminescence will return an older date for
the item. Many factors can spoil the sample before testing as well; exposing the sample to heat or direct light may cause some of the trapped electrons to dissipate, causing the item to date younger.
Because of these and other factors, thermoluminescence is accurate to within about 15% at best. It cannot be used to accurately date a site on its own. However, it can be used to confirm the antiquity of an item.
Optically stimulated luminescence (OSL)
Optically
stimulated luminescence (OSL) dating constrains the time at which
sediment was last exposed to light. During sediment transport, exposure
to sunlight 'zeros' the luminescence signal. Upon burial, the sediment
accumulates a luminescence signal as natural ambient radiation gradually
ionises the mineral grains.
Careful sampling under dark conditions allows the sediment to be
exposed to artificial light in the laboratory, which releases the OSL
signal. The amount of luminescence released is used to calculate the
equivalent dose (De) that the sediment has acquired since deposition,
which can be used in combination with the dose rate (Dr) to calculate
the age.
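The final age computation in both thermoluminescence and OSL dating is a simple ratio of these two quantities; a minimal sketch with illustrative (not measured) values:

```python
# Luminescence age = equivalent dose (De) / dose rate (Dr).
# De in grays (Gy) and Dr in grays per thousand years (Gy/ka) give an age in ka.
# The numbers below are illustrative placeholders, not measurements.
equivalent_dose_gy = 24.0        # De inferred from the luminescence signal
dose_rate_gy_per_ka = 3.0        # Dr from the radioactivity of the surroundings

age_ka = equivalent_dose_gy / dose_rate_gy_per_ka
print(f"burial age: {age_ka:.1f} thousand years")   # 8.0 ka in this example
```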
The growth rings of a tree at Bristol Zoo, England. Each ring represents one year; the outside rings, near the bark, are the youngest.
Dendrochronology, or tree-ring dating, is the scientific method of dating based on the analysis of patterns of tree rings, also known as growth rings. Dendrochronology can date the time at which tree rings were formed, in many types of wood, to the exact calendar year.
Dendrochronology has three main areas of application: paleoecology, where it is used to determine certain aspects of past ecologies (most prominently climate); archaeology, where it is used to date old buildings, etc.; and radiocarbon dating, where it is used to calibrate radiocarbon ages (see below).
In some areas of the world, it is possible to date wood back a
few thousand years, or even many thousands. Currently, the maximum for
fully anchored chronologies is a little over 11,000 years from present.
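The core of cross-dating a sample against a master chronology is pattern matching of ring widths; the sketch below uses synthetic series and a sliding correlation purely for illustration (real dendrochronology works with detrended, standardized measurements):

```python
# Sketch of cross-dating: slide a sample's ring-width series along a master
# chronology and pick the offset with the highest correlation. Both series
# here are synthetic stand-ins for real measurements.
import numpy as np

rng = np.random.default_rng(42)
master = rng.normal(size=300)                 # master chronology, years 0..299
true_start = 120
sample = master[true_start:true_start + 60] + 0.3 * rng.normal(size=60)

def best_offset(sample, master):
    scores = [np.corrcoef(sample, master[i:i + len(sample)])[0, 1]
              for i in range(len(master) - len(sample) + 1)]
    return int(np.argmax(scores)), max(scores)

offset, r = best_offset(sample, master)
print(f"best match at year {offset} (true start {true_start}), r = {r:.2f}")
```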
Amino acid dating is a dating technique used to estimate the age of a specimen in paleobiology, archaeology, forensic science, taphonomy, sedimentary geology and other fields. This technique relates changes in amino acid molecules to the time elapsed since they were formed. All biological tissues contain amino acids. All amino acids except glycine (the simplest one) are optically active, having an asymmetric carbon atom. This means that the amino acid can have two different configurations, "D" or "L", which are mirror images of each other.
With a few important exceptions, living organisms keep all their
amino acids in the "L" configuration. When an organism dies, control
over the configuration of the amino acids ceases, and the ratio of D to L
moves from a value near 0 towards an equilibrium value near 1, a
process called racemization. Thus, measuring the ratio of D to L in a sample enables one to estimate how long ago the specimen died.
A numerical method is an algorithm that approximates the solution to a mathematical problem (examples below include the solution to a linear system of equations, the value of an integral, the solution of a differential equation, the minimum of a multivariate function). In a probabilistic numerical algorithm, this process of approximation is thought of as a problem of estimation, inference or learning and realised in the framework of probabilistic inference (often, but not always, Bayesian inference).
Formally, this means casting the setup of the computational problem in terms of a prior distribution,
formulating the relationship between numbers computed by the computer
(e.g. matrix-vector multiplications in linear algebra, gradients in
optimization, values of the integrand or the vector field defining a
differential equation) and the quantity in question (the solution of the
linear problem, the minimum, the integral, the solution curve) in a likelihood function, and returning a posterior distribution
as the output. In most cases, numerical algorithms also take internal
adaptive decisions about which numbers to compute, which form an active learning problem.
Many of the most popular classic numerical algorithms can be re-interpreted in the probabilistic framework. This includes the method of conjugate gradients, Nordsieck methods, Gaussian quadrature rules, and quasi-Newton methods. In all these cases, the classic method is based on a regularized least-squares estimate that can be associated with the posterior mean arising from a Gaussian prior and likelihood. In such cases, the variance of the Gaussian posterior is then associated with a worst-case estimate for the squared error.
Probabilistic numerical methods promise several conceptual advantages over classic, point-estimate based approximation techniques:
They return structured error estimates (in particular, the ability to return joint posterior samples, i.e. multiple realistic hypotheses for the true unknown solution of the problem).
Hierarchical Bayesian inference can be used to set and control internal hyperparameters in such methods in a generic fashion, rather than having to re-invent novel methods for each parameter.
Since they use and allow for an explicit likelihood describing the relationship between computed numbers and the target quantity, probabilistic numerical methods can use the results of even highly imprecise, biased and stochastic computations. Conversely, probabilistic numerical methods can also provide a likelihood in computations often considered "likelihood-free" elsewhere.
Because all probabilistic numerical methods use essentially the same data type – probability measures – to quantify uncertainty over both inputs and outputs, they can be chained together to propagate uncertainty across large-scale, composite computations.
Multiple sources of information (e.g. algebraic, mechanistic knowledge about the form of a differential equation, and observations of the trajectory of the system collected in the physical world) can be combined naturally and inside the inner loop of the algorithm, removing otherwise necessary nested loops in computation, e.g. in inverse problems.
These advantages are essentially the advantages that Bayesian methods enjoy over point estimates in machine learning, transferred to the computational domain.
Bayesian quadrature with a Gaussian process conditioned on
evaluations of the integrand (shown in black). Shaded areas in the left
column illustrate the marginal standard deviations. The right figure
shows the prior and posterior Gaussian distributions over the value of the integral, as well as the true solution.
In numerical integration, function evaluations f(x₁), …, f(xₙ) at a number of points x₁, …, xₙ are used to estimate the integral ∫ f dν of a function f against some measure ν. Bayesian quadrature consists of specifying a prior distribution over f and conditioning this prior on the observed function values to obtain a posterior distribution over f, then computing the implied posterior distribution on ∫ f dν. The most common choice of prior is a Gaussian process, as this allows us to obtain a closed-form posterior distribution on the integral, which is a univariate Gaussian distribution. Bayesian quadrature is particularly useful when the function f is expensive to evaluate and the dimension of the data is small to moderate.
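A minimal sketch of Bayesian quadrature on the unit interval, assuming a zero-mean Gaussian process prior with an RBF kernel and a uniform integration measure; the integrand, nodes, and length-scale are illustrative choices rather than part of any particular implementation:

```python
# Sketch of Bayesian quadrature for Z = \int_0^1 f(x) dx under a zero-mean
# Gaussian process prior on f with an RBF kernel (kernel, length-scale,
# integrand and nodes are all illustrative assumptions).
import numpy as np
from scipy.special import erf

def rbf(a, b, ell):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def kernel_mean(x, ell):
    # z_i = \int_0^1 k(t, x_i) dt for the RBF kernel, in closed form via erf.
    return ell * np.sqrt(np.pi / 2) * (erf((1 - x) / (ell * np.sqrt(2)))
                                       - erf((0 - x) / (ell * np.sqrt(2))))

f = lambda x: np.sin(3 * x) + x              # integrand (illustrative)
x = np.linspace(0.05, 0.95, 8)               # evaluation nodes
y = f(x)
ell = 0.2

K = rbf(x, x, ell) + 1e-10 * np.eye(len(x))  # jitter for numerical stability
z = kernel_mean(x, ell)
weights = np.linalg.solve(K, z)              # quadrature weights K^{-1} z

post_mean = weights @ y                      # posterior mean of the integral
grid = np.linspace(0, 1, 400)
prior_var = rbf(grid, grid, ell).mean()      # \int\int k(t, t') dt dt' on a grid
post_var = prior_var - z @ weights           # posterior variance of the integral

print(f"Bayesian quadrature: {post_mean:.4f} +/- {np.sqrt(max(post_var, 0)):.4f}")
print(f"true integral:       {(1 - np.cos(3)) / 3 + 0.5:.4f}")
```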
Bayesian
optimization of a function (black) with Gaussian processes (purple).
Three acquisition functions (blue) are shown at the bottom.
Probabilistic numerics has also been studied for mathematical optimization, which consists of finding the minimum or maximum of some objective function f given (possibly noisy or indirect) evaluations of that function at a set of points.
Perhaps the most notable effort in this direction is Bayesian optimization, a general approach to optimization grounded in Bayesian inference.
Bayesian optimization algorithms operate by maintaining a probabilistic belief about the objective function f throughout the optimization procedure; this often takes the form of a Gaussian process
prior conditioned on observations. This belief then guides the
algorithm in obtaining observations that are likely to advance the
optimization process. Bayesian optimization policies are usually
realized by transforming the objective function posterior into an
inexpensive, differentiable acquisition function that is maximized to select each successive observation location. One prominent approach is to model optimization via Bayesian sequential experimental design, seeking to obtain a sequence of observations yielding the most optimization progress as evaluated by an appropriate utility function.
A welcome side effect from this approach is that uncertainty in the
objective function, as measured by the underlying probabilistic belief,
can guide an optimization policy in addressing the classic exploration vs. exploitation tradeoff.
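A compact sketch of such a loop, assuming a Gaussian process surrogate with a fixed RBF kernel and an expected-improvement acquisition maximized on a grid (all of these choices are illustrative simplifications):

```python
# Sketch of Bayesian optimization of a 1-D toy objective: a Gaussian process
# surrogate (RBF kernel) plus an expected-improvement acquisition, maximized
# on a dense grid for simplicity. All modelling choices are illustrative.
import numpy as np
from scipy.stats import norm

def rbf(a, b, ell=0.15):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def gp_posterior(xq, x, y, ell=0.15, noise=1e-6):
    K = rbf(x, x, ell) + noise * np.eye(len(x))
    Ks = rbf(xq, x, ell)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Expected improvement over the incumbent when minimizing the objective.
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

objective = lambda x: np.sin(10 * x) + x ** 2        # toy objective to minimize
grid = np.linspace(0.0, 1.0, 500)
x_obs = np.array([0.1, 0.9])
y_obs = objective(x_obs)

for _ in range(10):
    mu, sigma = gp_posterior(grid, x_obs, y_obs)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

print("best x found:", x_obs[np.argmin(y_obs)], "with value", y_obs.min())
```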
In this setting, the optimization objective is often an empirical risk of the form L(θ) = (1/N) Σ_n ℓ(y_n, f_θ(x_n)), defined by a dataset D = {(x_n, y_n)}_{n=1..N} and a loss ℓ that quantifies how well a predictive model f_θ parameterized by θ performs at predicting the target y_n from its corresponding input x_n.
Epistemic uncertainty arises when the dataset size N is large and the data cannot be processed at once, meaning that local quantities (given some θ) such as the loss function L(θ) itself or its gradient ∇L(θ) cannot be computed in reasonable time.
Hence, mini-batching is generally used to construct estimators of these quantities on a random subset of the data. Probabilistic numerical
methods model this uncertainty explicitly and allow for automated
decisions and parameter tuning.
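The batch itself can be used to estimate how uncertain such a gradient estimator is, which is the quantity a probabilistic numerical method would model explicitly; a sketch for a synthetic least-squares risk:

```python
# Sketch: a mini-batch gradient of a least-squares empirical risk together with
# a per-coordinate standard error, estimated from the spread of per-example
# gradients within the batch. Data and model are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
N, d = 10_000, 3
X = rng.normal(size=(N, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=N)

theta = np.zeros(d)                            # current parameters
batch = rng.choice(N, size=64, replace=False)  # random mini-batch
residuals = X[batch] @ theta - y[batch]
per_example_grads = 2.0 * residuals[:, None] * X[batch]   # shape (64, d)

grad_estimate = per_example_grads.mean(axis=0)
grad_std_err = per_example_grads.std(axis=0, ddof=1) / np.sqrt(len(batch))
print("gradient estimate:", grad_estimate)
print("standard error:   ", grad_std_err)
```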
Linear algebra
Probabilistic numerical methods for linear algebra have primarily focused on solving systems of linear equations of the form Ax = b and the computation of determinants.
Illustration of a matrix-based probabilistic linear solver.
A large class of methods are iterative in nature and collect information about the linear system to be solved via repeated matrix-vector multiplication with the system matrix A with different vectors v.
Such methods can be roughly split into a solution-based and a matrix-based perspective, depending on whether belief is expressed over the solution x of the linear system or over the (pseudo-)inverse H = A⁻¹ of the matrix A.
The belief update exploits the fact that the inferred object is linked to the observed matrix-vector products: the solution satisfies Ax = b, while the inverse satisfies H(Av) = v for any vector v.
Methods typically assume a Gaussian distribution, due to its closedness under linear observations of the problem. While conceptually different, these two views are computationally equivalent and inherently connected via the right-hand side through x = A⁻¹b.
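A minimal solution-based sketch, assuming a Gaussian prior over the solution and exact observations of a few matrix–vector products along probing directions; the prior, directions, and test problem are illustrative, and this is not the code of any specific published solver:

```python
# Minimal sketch of a solution-based probabilistic linear solver: a Gaussian
# prior x ~ N(x0, S0) is conditioned on noiseless linear observations
# s_i^T A x = s_i^T b obtained from matrix-vector products. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # SPD system matrix
b = rng.standard_normal(n)

x0, S0 = np.zeros(n), np.eye(n)            # prior belief over the solution
S = rng.standard_normal((n, 3))            # three probing directions s_i

# Condition the Gaussian prior on the linear observations S^T A x = S^T b.
V = A @ S                                  # products A s_i seen by the belief
G = V.T @ S0 @ V                           # Gram matrix of the observations
gain = S0 @ V @ np.linalg.inv(G)
x_mean = x0 + gain @ (S.T @ (b - A @ x0))  # posterior mean over x
x_cov = S0 - gain @ V.T @ S0               # remaining uncertainty over x

print("posterior mean error:", np.linalg.norm(x_mean - np.linalg.solve(A, b)))
print("trace of posterior covariance:", np.trace(x_cov))
```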
Probabilistic numerical linear algebra routines have been successfully applied to scale Gaussian processes to large datasets. In particular, they enable exact
propagation of the approximation error to a combined Gaussian process
posterior, which quantifies the uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
Ordinary differential equations
Samples from the first component of the numerical solution of the Lorenz system obtained with a probabilistic numerical integrator.
Probabilistic numerical methods for ordinary differential equations ẏ(t) = f(y(t), t) have been developed for initial and boundary value problems. Many different probabilistic numerical methods designed for ordinary differential equations have been proposed, and these can broadly be grouped into the following two categories:
Randomisation-based methods are defined through random perturbations of standard deterministic numerical methods for ordinary differential equations. For example, this has been achieved by adding Gaussian perturbations to the solution of one-step integrators or by randomly perturbing their time-steps. This defines a probability measure on the solution of the differential equation that can be sampled; a minimal sketch of this idea appears after the two categories.
Gaussian process regression methods are based on posing the problem of solving the differential equation at hand as a Gaussian process regression problem, interpreting evaluations of the right-hand side as data on the derivative. These techniques resemble Bayesian cubature, but employ different and often non-linear observation models. In its infancy, this class of methods was based on naive Gaussian process regression. This was later improved (in terms of efficient computation) in favor of Gauss–Markov priors modeled by a linear stochastic differential equation of the form dX(t) = A X(t) dt + B dW(t), where the state X(t) collects the solution together with its first few derivatives and W(t) is a Brownian motion. Inference can thus be implemented efficiently with Kalman filtering based methods.
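For the randomisation-based category, a minimal sketch is an explicit Euler method whose steps receive small additive Gaussian perturbations; repeated runs then sample the induced measure over solutions (the test equation and perturbation scale are illustrative):

```python
# Sketch of a randomisation-based probabilistic ODE solver: an explicit Euler
# step with an additive Gaussian perturbation of size O(h**1.5). Running it
# repeatedly produces samples from a measure over solutions. Illustrative only.
import numpy as np

def f(y):
    return y * (1.0 - y)        # logistic growth, dy/dt = y (1 - y)

def perturbed_euler(y0, h, n_steps, rng, scale=0.1):
    y = np.empty(n_steps + 1)
    y[0] = y0
    for k in range(n_steps):
        noise = scale * h ** 1.5 * rng.standard_normal()
        y[k + 1] = y[k] + h * f(y[k]) + noise
    return y

rng = np.random.default_rng(1)
samples = np.stack([perturbed_euler(0.1, 0.1, 50, rng) for _ in range(20)])
print("mean final value:", samples[:, -1].mean(),
      "spread (std):", samples[:, -1].std())
```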
The boundary between these two categories is not sharp; indeed, a Gaussian process regression approach based on randomised data was
developed as well. These methods have been applied to problems in computational Riemannian geometry, inverse problems, latent force models, and to differential equations with a geometric structure such as symplecticity.
Partial differential equations
A number of probabilistic numerical methods have also been proposed for partial differential equations.
As with ordinary differential equations, the approaches can broadly be divided into those based on randomisation, generally of some underlying finite-element mesh, and those based on Gaussian process regression.
Learning to solve a partial differential equation. A problem-specific Gaussian process prior
is conditioned on partially-known physics, given by uncertain boundary
conditions (BC) and a linear PDE, as well as on noisy physical
measurements from experiment. The boundary conditions and the right-hand
side of the PDE are not known but inferred from a small set of
noise-corrupted measurements. The plots juxtapose the belief with the true solution of the latent boundary value problem.
The origins of probabilistic numerics can be traced to a discussion of probabilistic approaches to polynomial interpolation by Henri Poincaré in his Calcul des Probabilités. In modern terminology, Poincaré considered a Gaussian prior distribution on a function f, expressed as a formal power series with random coefficients, and asked for "probable values" of f(x) given this prior and observations of f at finitely many points a₁, …, aₙ.
A later seminal contribution to the interplay of numerical
analysis and probability was provided by Albert Suldin in the context of
univariate quadrature. The statistical problem considered by Suldin was the approximation of the definite integral ∫ u(t) dt of a function u, under a Brownian motion prior on u, given access to pointwise evaluation of u at nodes t₁, …, tₙ. Suldin showed that, for given quadrature nodes, the quadrature rule with minimal mean squared error is the trapezoidal rule;
furthermore, this minimal error is proportional to the sum of cubes of
the inter-node spacings. As a result, one can see the trapezoidal rule
with equally-spaced nodes as statistically optimal in some sense — an
early example of the average-case analysis of a numerical method.
Suldin's point of view was later extended by Mike Larkin. Note that Suldin's Brownian motion prior on the integrand u is a Gaussian measure and that the operations of integration and of pointwise evaluation of u are both linear maps.
Thus, the definite integral is a real-valued Gaussian random variable. In particular, after conditioning on the observed pointwise values of u, it follows a normal distribution with mean equal to the trapezoidal rule and variance proportional to the sum of cubes of the inter-node spacings.
This viewpoint is very close to that of Bayesian quadrature, seeing the output of a quadrature method not just as a point estimate but as a probability distribution in its own right.
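This Brownian-motion calculation can be checked numerically: between nodes, the conditioned prior is a Brownian bridge, so sampled conditioned paths should integrate to the trapezoidal rule on average (the node values below are illustrative):

```python
# Numerical check of the Brownian-motion viewpoint: condition a Brownian motion
# on its values at a few nodes, integrate the conditioned paths, and compare
# the sample mean of the integral with the trapezoidal rule. Between nodes the
# conditioned process is a Brownian bridge, whose integral over an interval of
# length h has mean equal to the trapezoid of the endpoint values and variance
# h**3 / 12. Node values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
nodes = np.array([0.0, 0.3, 0.7, 1.0])
values = np.array([0.0, 0.4, 0.1, 0.5])          # observed u(t_i), with u(0) = 0

def trapezoid(y, x):
    return float(np.sum((y[:-1] + y[1:]) / 2.0 * np.diff(x)))

def integral_of_conditioned_path(rng, nodes, values, m=200):
    total = 0.0
    for (t0, u0), (t1, u1) in zip(zip(nodes, values), zip(nodes[1:], values[1:])):
        h = t1 - t0
        s = np.linspace(0.0, h, m)
        # Brownian bridge pinned to zero at both ends of [0, h] ...
        w = np.concatenate(([0.0],
                            np.cumsum(rng.normal(0.0, np.sqrt(h / (m - 1)), m - 1))))
        bridge = w - (s / h) * w[-1]
        # ... added to the straight line joining the observed endpoint values.
        total += trapezoid(u0 + (s / h) * (u1 - u0) + bridge, s)
    return total

samples = np.array([integral_of_conditioned_path(rng, nodes, values)
                    for _ in range(4000)])
print("Monte Carlo mean:", samples.mean(), " trapezoidal rule:", trapezoid(values, nodes))
print("Monte Carlo var: ", samples.var(), " theory (sum h^3/12):",
      np.sum(np.diff(nodes) ** 3) / 12)
```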
As noted by Houman Owhadi and collaborators, interplays between numerical approximation and statistical inference can also be traced back to Pálásti and Rényi, Sard, Kimeldorf and Wahba (on the correspondence between Bayesian estimation and spline smoothing/interpolation) and Larkin (on the correspondence between Gaussian process
regression and numerical approximation). Although the approach of
modelling a perfectly known function as a sample from a random process
may seem counterintuitive, a natural framework for understanding it can
be found in information-based complexity (IBC), the branch of computational complexity founded on the observation that
numerical implementation requires computation with partial information
and limited resources. In IBC, the performance of an algorithm operating
on incomplete information can be analyzed in the worst-case or the
average-case (randomized) setting with respect to the missing
information. Moreover, as Packel observed, the average case setting could be interpreted as a mixed strategy
in an adversarial game obtained by lifting a (worst-case) minmax
problem to a minmax problem over mixed (randomized) strategies. This
observation leads to a natural connection between numerical approximation and Wald's decision theory, evidently influenced by von Neumann's theory of games. To describe this connection, consider the optimal recovery setting of Micchelli and Rivlin, in which one tries to approximate an unknown function from a finite
number of linear measurements on that function. Interpreting this
optimal recovery problem as a zero-sum game where Player I selects the
unknown function and Player II selects its approximation, and using
relative errors in a quadratic norm to define losses, Gaussian priors
emerge as optimal mixed strategies for such games, and the covariance
operator of the optimal Gaussian prior is determined by the quadratic
norm used to define the relative error of the recovery.