Search This Blog

Saturday, December 30, 2023

Inverse problem

From Wikipedia, the free encyclopedia

An inverse problem in science is the process of calculating from a set of observations the causal factors that produced them: for example, calculating an image in X-ray computed tomography, source reconstruction in acoustics, or calculating the density of the Earth from measurements of its gravity field. It is called an inverse problem because it starts with the effects and then calculates the causes. It is the inverse of a forward problem, which starts with the causes and then calculates the effects.

Inverse problems are some of the most important mathematical problems in science and mathematics because they tell us about parameters that we cannot directly observe. They have wide application in system identification, optics, radar, acoustics, communication theory, signal processing, medical imaging, computer vision, geophysics, oceanography, astronomy, remote sensing, natural language processing, machine learning, nondestructive testing, slope stability analysis and many other fields.

History

Starting with the effects to discover the causes has concerned physicists for centuries. A historical example is the calculations of Adams and Le Verrier which led to the discovery of Neptune from the perturbed trajectory of Uranus. However, a formal study of inverse problems was not initiated until the 20th century.

One of the earliest examples of a solution to an inverse problem was discovered by Hermann Weyl and published in 1911, describing the asymptotic behavior of eigenvalues of the Laplace–Beltrami operator. Today known as Weyl's law, it is perhaps most easily understood as an answer to the question of whether it is possible to hear the shape of a drum. Weyl conjectured that the eigenfrequencies of a drum would be related to the area and perimeter of the drum by a particular equation, a result improved upon by later mathematicians.

The field of inverse problems was later touched on by Soviet-Armenian physicist, Viktor Ambartsumian.

While still a student, Ambartsumian thoroughly studied the theory of atomic structure, the formation of energy levels, and the Schrödinger equation and its properties, and when he mastered the theory of eigenvalues of differential equations, he pointed out the apparent analogy between discrete energy levels and the eigenvalues of differential equations. He then asked: given a family of eigenvalues, is it possible to find the form of the equations whose eigenvalues they are? Essentially Ambartsumian was examining the inverse Sturm–Liouville problem, which dealt with determining the equations of a vibrating string. This paper was published in 1929 in the German physics journal Zeitschrift für Physik and remained in obscurity for a rather long time. Describing this situation after many decades, Ambartsumian said, "If an astronomer publishes an article with a mathematical content in a physics journal, then the most likely thing that will happen to it is oblivion."

Nonetheless, toward the end of the Second World War, this article, written by the 20-year-old Ambartsumian, was found by Swedish mathematicians and formed the starting point for a whole area of research on inverse problems, becoming the foundation of an entire discipline.

Then important efforts have been devoted to a "direct solution" of the inverse scattering problem especially by Gelfand and Levitan in the Soviet Union. They proposed an analytic constructive method for determining the solution. When computers became available, some authors have investigated the possibility of applying their approach to similar problems such as the inverse problem in the 1D wave equation. But it rapidly turned out that the inversion is an unstable process: noise and errors can be tremendously amplified making a direct solution hardly practicable. Then, around the seventies, the least-squares and probabilistic approaches came in and turned out to be very helpful for the determination of parameters involved in various physical systems. This approach met a lot of success. Nowadays inverse problems are also investigated in fields outside physics, such as chemistry, economics, and computer science. Eventually, as numerical models become prevalent in many parts of society, we may expect an inverse problem associated with each of these numerical models.

Conceptual understanding

Since Newton, scientists have extensively attempted to model the world. In particular, when a mathematical model is available (for instance, Newton's gravitational law or Coulomb's equation for electrostatics), we can foresee, given some parameters that describe a physical system (such as a distribution of mass or a distribution of electric charges), the behavior of the system. This approach is known as mathematical modeling and the above-mentioned physical parameters are called the model parameters or simply the model. To be precise, we introduce the notion of state of the physical system: it is the solution of the mathematical model's equation. In optimal control theory, these equations are referred to as the state equations. In many situations we are not truly interested in knowing the physical state but just its effects on some objects (for instance, the effects the gravitational field has on a specific planet). Hence we have to introduce another operator, called the observation operator, which converts the state of the physical system (here the predicted gravitational field) into what we want to observe (here the movements of the considered planet). We can now introduce the so-called forward problem, which consists of two steps:

  • determination of the state of the system from the physical parameters that describe it
  • application of the observation operator to the estimated state of the system so as to predict the behavior of what we want to observe.

This leads to introduce another operator (F stands for "forward") which maps model parameters into , the data that model predicts that is the result of this two-step procedure. Operator is called forward operator or forward map. In this approach we basically attempt at predicting the effects knowing the causes.

The table below shows, the Earth being considered as the physical system and for different physical phenomena, the model parameters that describe the system, the physical quantity that describes the state of the physical system and observations commonly made on the state of the system.

Governing equations Model parameters (input of the model) State of the physical system Common observations on the system
Newton's law of gravity Distribution of mass Gravitational field Measurement made by gravimeters at different surface locations
Maxwell's equations Distribution of magnetic susceptibility Magnetic field Magnetic field measured at different surface locations by magnetometers (case of a steady state)
Wave equation Distribution of wave-speeds and densities Wave-field caused by artificial or natural seismic sources Particle velocity measured by seismometers placed at different surface locations
Diffusion equation Distribution of Diffusion coefficient Diffusing material concentration as a function of space and time Monitoring of this concentration measured at different locations

In the inverse problem approach we, roughly speaking, try to know the causes given the effects.

General statement of the inverse problem

The inverse problem is the "inverse" of the forward problem: instead of determining the data produced by particular model parameters, we want to determine the model parameters that produce the data that is the observation we have recorded (the subscript obs stands for observed). Our goal, in other words, is to determine the model parameters such that (at least approximately)

where is the forward map. We denote by the (possibly infinite) number of model parameters, and by the number of recorded data. We introduce some useful concepts and the associated notations that will be used below:

  • The space of models denoted by : the vector space spanned by model parameters; it has dimensions;
  • The space of data denoted by : if we organize the measured samples in a vector with components (if our measurements consist of functions, is a vector space with infinite dimensions);
  • : the response of model ; it consists of the data predicted by model ;
  • : the image of by the forward map, it is a subset of (but not a subspace unless is linear) made of responses of all models;
  • : the data misfits (or residuals) associated with model : they can be arranged as a vector, an element of .

The concept of residuals is very important: in the scope of finding a model that matches the data, their analysis reveals if the considered model can be considered as realistic or not. Systematic unrealistic discrepancies between the data and the model responses also reveals that the forward map is inadequate and may give insights about an improved forward map.

When operator is linear, the inverse problem is linear. Otherwise, that is most often, the inverse problem is nonlinear. Also, models cannot always be described by a finite number of parameters. It is the case when we look for distributed parameters (a distribution of wave-speeds for instance): in such cases the goal of the inverse problem is to retrieve one or several functions. Such inverse problems are inverse problems with infinite dimension.

Linear inverse problems

In the case of a linear forward map and when we deal with a finite number of model parameters, the forward map can be written as a linear system

where is the matrix that characterizes the forward map.

An elementary example: Earth's gravitational field

Only a few physical systems are actually linear with respect to the model parameters. One such system from geophysics is that of the Earth's gravitational field. The Earth's gravitational field is determined by the density distribution of the Earth in the subsurface. Because the lithology of the Earth changes quite significantly, we are able to observe minute differences in the Earth's gravitational field on the surface of the Earth. From our understanding of gravity (Newton's Law of Gravitation), we know that the mathematical expression for gravity is:

here is a measure of the local gravitational acceleration, is the universal gravitational constant, is the local mass (which is related to density) of the rock in the subsurface and is the distance from the mass to the observation point.

By discretizing the above expression, we are able to relate the discrete data observations on the surface of the Earth to the discrete model parameters (density) in the subsurface that we wish to know more about. For example, consider the case where we have measurements carried out at 5 locations on the surface of the Earth. In this case, our data vector, is a column vector of dimension (5×1): its -th component is associated with the -th observation location. We also know that we only have five unknown masses in the subsurface (unrealistic but used to demonstrate the concept) with known location: we denote by the distance between the -th observation location and the -th mass. Thus, we can construct the linear system relating the five unknown masses to the five data points as follows:

To solve for the model parameters that fit our data, we might be able to invert the matrix to directly convert the measurements into our model parameters. For example:

A system with five equations and five unknowns is a very specific situation: our example was designed to end up with this specificity. In general, the numbers of data and unknowns are different so that matrix is not square.

However, even a square matrix can have no inverse: matrix can be rank deficient (i.e. has zero eigenvalues) and the solution of the system is not unique. Then the solution of the inverse problem will be undetermined. This is a first difficulty. Over-determined systems (more equations than unknowns) have other issues. Also noise may corrupt our observations making possibly outside the space of possible responses to model parameters so that solution of the system may not exist. This is another difficulty.

Tools to overcome the first difficulty

The first difficulty reflects a crucial problem: Our observations do not contain enough information and additional data are required. Additional data can come from physical prior information on the parameter values, on their spatial distribution or, more generally, on their mutual dependence. It can also come from other experiments: For instance, we may think of integrating data recorded by gravimeters and seismographs for a better estimation of densities. The integration of this additional information is basically a problem of statistics. This discipline is the one that can answer the question: How to mix quantities of different nature? We will be more precise in the section "Bayesian approach" below.

Concerning distributed parameters, prior information about their spatial distribution often consists of information about some derivatives of these distributed parameters. Also, it is common practice, although somewhat artificial, to look for the "simplest" model that reasonably matches the data. This is usually achieved by penalizing the norm of the gradient (or the total variation) of the parameters (this approach is also referred to as the maximization of the entropy). One can also make the model simple through a parametrization that introduces freedom degrees only when necessary.

Additional information may also be integrated through inequality constraints on the model parameters or some functions of them. Such constraints are important to avoid unrealistic values for the parameters (negative values for instance). In this case, the space spanned by model parameters will no longer be a vector space but a subset of admissible models denoted by in the sequel.

Tools to overcome the second difficulty

As mentioned above, noise may be such that our measurements are not the image of any model, so that we cannot look for a model that produces the data but rather look for the best (or optimal) model: that is, the one that best matches the data. This leads us to minimize an objective function, namely a functional that quantifies how big the residuals are or how far the predicted data are from the observed data. Of course, when we have perfect data (i.e. no noise) then the recovered model should fit the observed data perfectly. A standard objective function, , is of the form: where is the Euclidean norm (it will be the norm when the measurements are functions instead of samples) of the residuals. This approach amounts to making use of ordinary least squares, an approach widely used in statistics. However, the Euclidean norm is known to be very sensitive to outliers: to avoid this difficulty we may think of using other distances, for instance the norm, in replacement of the norm.

Bayesian approach

Very similar to the least-squares approach is the probabilistic approach: If we know the statistics of the noise that contaminates the data, we can think of seeking the most likely model m, which is the model that matches the maximum likelihood criterion. If the noise is Gaussian, the maximum likelihood criterion appears as a least-squares criterion, the Euclidean scalar product in data space being replaced by a scalar product involving the co-variance of the noise. Also, should prior information on model parameters be available, we could think of using Bayesian inference to formulate the solution of the inverse problem. This approach is described in detail in Tarantola's book.

Numerical solution of our elementary example

Here we make use of the Euclidean norm to quantify the data misfits. As we deal with a linear inverse problem, the objective function is quadratic. For its minimization, it is classical to compute its gradient using the same rationale (as we would to minimize a function of only one variable). At the optimal model , this gradient vanishes which can be written as:

where FT denotes the matrix transpose of F. This equation simplifies to:

This expression is known as the normal equation and gives us a possible solution to the inverse problem. In our example matrix turns out to be generally full rank so that the equation above makes sense and determines uniquely the model parameters: we do not need integrating additional information for ending up with a unique solution.

Mathematical and computational aspects

Inverse problems are typically ill-posed, as opposed to the well-posed problems usually met in mathematical modeling. Of the three conditions for a well-posed problem suggested by Jacques Hadamard (existence, uniqueness, and stability of the solution or solutions) the condition of stability is most often violated. In the sense of functional analysis, the inverse problem is represented by a mapping between metric spaces. While inverse problems are often formulated in infinite dimensional spaces, limitations to a finite number of measurements, and the practical consideration of recovering only a finite number of unknown parameters, may lead to the problems being recast in discrete form. In this case the inverse problem will typically be ill-conditioned. In these cases, regularization may be used to introduce mild assumptions on the solution and prevent overfitting. Many instances of regularized inverse problems can be interpreted as special cases of Bayesian inference.

Numerical solution of the optimization problem

Some inverse problems have a very simple solution, for instance, when one has a set of unisolvent functions, meaning a set of functions such that evaluating them at distinct points yields a set of linearly independent vectors. This means that given a linear combination of these functions, the coefficients can be computed by arranging the vectors as the columns of a matrix and then inverting this matrix. The simplest example of unisolvent functions is polynomials constructed, using the unisolvence theorem, so as to be unisolvent. Concretely, this is done by inverting the Vandermonde matrix. But this a very specific situation.

In general, the solution of an inverse problem requires sophisticated optimization algorithms. When the model is described by a large number of parameters (the number of unknowns involved in some diffraction tomography applications can reach one billion), solving the linear system associated with the normal equations can be cumbersome. The numerical method to be used for solving the optimization problem depends in particular on the cost required for computing the solution of the forward problem. Once chosen the appropriate algorithm for solving the forward problem (a straightforward matrix-vector multiplication may be not adequate when matrix is huge), the appropriate algorithm for carrying out the minimization can be found in textbooks dealing with numerical methods for the solution of linear systems and for the minimization of quadratic functions (see for instance Ciarlet or Nocedal).

Also, the user may wish to add physical constraints to the models: In this case, they have to be familiar with constrained optimization methods, a subject in itself. In all cases, computing the gradient of the objective function often is a key element for the solution of the optimization problem. As mentioned above, information about the spatial distribution of a distributed parameter can be introduced through the parametrization. One can also think of adapting this parametrization during the optimization.

Should the objective function be based on a norm other than the Euclidean norm, we have to leave the area of quadratic optimization. As a result, the optimization problem becomes more difficult. In particular, when the norm is used for quantifying the data misfit the objective function is no longer differentiable: its gradient does not make sense any longer. Dedicated methods (see for instance Lemaréchal) from non differentiable optimization come in.

Once the optimal model is computed we have to address the question: "Can we trust this model?" The question can be formulated as follows: How large is the set of models that match the data "nearly as well" as this model? In the case of quadratic objective functions, this set is contained in a hyper-ellipsoid, a subset of ( is the number of unknowns), whose size depends on what we mean with "nearly as well", that is on the noise level. The direction of the largest axis of this ellipsoid (eigenvector associated with the smallest eigenvalue of matrix ) is the direction of poorly determined components: if we follow this direction, we can bring a strong perturbation to the model without changing significantly the value of the objective function and thus end up with a significantly different quasi-optimal model. We clearly see that the answer to the question "can we trust this model" is governed by the noise level and by the eigenvalues of the Hessian of the objective function or equivalently, in the case where no regularization has been integrated, by the singular values of matrix . Of course, the use of regularization (or other kinds of prior information) reduces the size of the set of almost optimal solutions and, in turn, increases the confidence we can put in the computed solution.

Stability, regularization and model discretization in infinite dimension

We focus here on the recovery of a distributed parameter. When looking for distributed parameters we have to discretize these unknown functions. Doing so, we reduce the dimension of the problem to something finite. But now, the question is: is there any link between the solution we compute and the one of the initial problem? Then another question: what do we mean with the solution of the initial problem? Since a finite number of data does not allow the determination of an infinity of unknowns, the original data misfit functional has to be regularized to ensure the uniqueness of the solution. Many times, reducing the unknowns to a finite-dimensional space will provide an adequate regularization: the computed solution will look like a discrete version of the solution we were looking for. For example, a naive discretization will often work for solving the deconvolution problem: it will work as long as we do not allow missing frequencies to show up in the numerical solution. But many times, regularization has to be integrated explicitly in the objective function.

In order to understand what may happen, we have to keep in mind that solving such a linear inverse problem amount to solving a Fredholm integral equation of the first kind:

where is the kernel, and are vectors of , and is a domain in . This holds for a 2D application. For a 3D application, we consider . Note that here the model parameters consist of a function and that the response of a model also consists of a function denoted by . This equation is an extension to infinite dimension of the matrix equation given in the case of discrete problems.

For sufficiently smooth the operator defined above is compact on reasonable Banach spaces such as the . F. Riesz theory states that the set of singular values of such an operator contains zero (hence the existence of a null-space), is finite or at most countable, and, in the latter case, they constitute a sequence that goes to zero. In the case of a symmetric kernel, we have an infinity of eigenvalues and the associated eigenvectors constitute a hilbertian basis of . Thus any solution of this equation is determined up to an additive function in the null-space and, in the case of infinity of singular values, the solution (which involves the reciprocal of arbitrary small eigenvalues) is unstable: two ingredients that make the solution of this integral equation a typical ill-posed problem! However, we can define a solution through the pseudo-inverse of the forward map (again up to an arbitrary additive function). When the forward map is compact, the classical Tikhonov regularization will work if we use it for integrating prior information stating that the norm of the solution should be as small as possible: this will make the inverse problem well-posed. Yet, as in the finite dimension case, we have to question the confidence we can put in the computed solution. Again, basically, the information lies in the eigenvalues of the Hessian operator. Should subspaces containing eigenvectors associated with small eigenvalues be explored for computing the solution, then the solution can hardly be trusted: some of its components will be poorly determined. The smallest eigenvalue is equal to the weight introduced in Tikhonov regularization.

Irregular kernels may yield a forward map which is not compact and even unbounded if we naively equip the space of models with the norm. In such cases, the Hessian is not a bounded operator and the notion of eigenvalue does not make sense any longer. A mathematical analysis is required to make it a bounded operator and design a well-posed problem: an illustration can be found in. Again, we have to question the confidence we can put in the computed solution and we have to generalize the notion of eigenvalue to get the answer.

Analysis of the spectrum of the Hessian operator is thus a key element to determine how reliable the computed solution is. However, such an analysis is usually a very heavy task. This has led several authors to investigate alternative approaches in the case where we are not interested in all the components of the unknown function but only in sub-unknowns that are the images of the unknown function by a linear operator. These approaches are referred to as the " Backus and Gilbert method", Lions's sentinels approach, and the SOLA method: these approaches turned out to be strongly related with one another as explained in Chavent. Finally, the concept of limited resolution, often invoked by physicists, is nothing but a specific view of the fact that some poorly determined components may corrupt the solution. But, generally speaking, these poorly determined components of the model are not necessarily associated with high frequencies.

Some classical linear inverse problems for the recovery of distributed parameters

The problems mentioned below correspond to different versions of the Fredholm integral: each of these is associated with a specific kernel .

Deconvolution

The goal of deconvolution is to reconstruct the original image or signal which appears as noisy and blurred on the data . From a mathematical point of view, the kernel here only depends on the difference between and .

Tomographic methods

In these methods we attempt at recovering a distributed parameter, the observation consisting in the measurement of the integrals of this parameter carried out along a family of lines. We denote by the line in this family associated with measurement point . The observation at can thus be written as:

where is the arc-length along and a known weighting function. Comparing this equation with the Fredholm integral above, we notice that the kernel is kind of a delta function that peaks on line . With such a kernel, the forward map is not compact.

Computed tomography

In X-ray computed tomography the lines on which the parameter is integrated are straight lines: the tomographic reconstruction of the parameter distribution is based on the inversion of the Radon transform. Although from a theoretical point of view many linear inverse problems are well understood, problems involving the Radon transform and its generalisations still present many theoretical challenges with questions of sufficiency of data still unresolved. Such problems include incomplete data for the x-ray transform in three dimensions and problems involving the generalisation of the x-ray transform to tensor fields. Solutions explored include Algebraic Reconstruction Technique, filtered backprojection, and as computing power has increased, iterative reconstruction methods such as iterative Sparse Asymptotic Minimum Variance.

Diffraction tomography

Diffraction tomography is a classical linear inverse problem in exploration seismology: the amplitude recorded at one time for a given source-receiver pair is the sum of contributions arising from points such that the sum of the distances, measured in traveltimes, from the source and the receiver, respectively, is equal to the corresponding recording time. In 3D the parameter is not integrated along lines but over surfaces. Should the propagation velocity be constant, such points are distributed on an ellipsoid. The inverse problems consists in retrieving the distribution of diffracting points from the seismograms recorded along the survey, the velocity distribution being known. A direct solution has been originally proposed by Beylkin and Lambaré et al.: these works were the starting points of approaches known as amplitude preserved migration (see Beylkin and Bleistein). Should geometrical optics techniques (i.e. rays) be used for the solving the wave equation, these methods turn out to be closely related to the so-called least-squares migration methods derived from the least-squares approach (see Lailly, Tarantola).

Doppler tomography (astrophysics)

If we consider a rotating stellar object, the spectral lines we can observe on a spectral profile will be shifted due to Doppler effect. Doppler tomography aims at converting the information contained in spectral monitoring of the object into a 2D image of the emission (as a function of the radial velocity and of the phase in the periodic rotation movement) of the stellar atmosphere. As explained by Tom Marsh this linear inverse problem is tomography like: we have to recover a distributed parameter which has been integrated along lines to produce its effects in the recordings.

Inverse heat conduction

Early publications on inverse heat conduction arose from determining surface heat flux during atmospheric re-entry from buried temperature sensors. Other applications where surface heat flux is needed but surface sensors are not practical include: inside reciprocating engines, inside rocket engines; and, testing of nuclear reactor components. A variety of numerical techniques have been developed to address the ill-posedness and sensitivity to measurement error caused by damping and lagging in the temperature signal.

Non-linear inverse problems

Non-linear inverse problems constitute an inherently more difficult family of inverse problems. Here the forward map is a non-linear operator. Modeling of physical phenomena often relies on the solution of a partial differential equation (see table above except for gravity law): although these partial differential equations are often linear, the physical parameters that appear in these equations depend in a non-linear way of the state of the system and therefore on the observations we make on it.

Some classical non-linear inverse problems

Inverse scattering problems

Whereas linear inverse problems were completely solved from the theoretical point of view at the end of the nineteenth century, only one class of nonlinear inverse problems was so before 1970, that of inverse spectral and (one space dimension) inverse scattering problems, after the seminal work of the Russian mathematical school (Krein, Gelfand, Levitan, Marchenko). A large review of the results has been given by Chadan and Sabatier in their book "Inverse Problems of Quantum Scattering Theory" (two editions in English, one in Russian).

In this kind of problem, data are properties of the spectrum of a linear operator which describe the scattering. The spectrum is made of eigenvalues and eigenfunctions, forming together the "discrete spectrum", and generalizations, called the continuous spectrum. The very remarkable physical point is that scattering experiments give information only on the continuous spectrum, and that knowing its full spectrum is both necessary and sufficient in recovering the scattering operator. Hence we have invisible parameters, much more interesting than the null space which has a similar property in linear inverse problems. In addition, there are physical motions in which the spectrum of such an operator is conserved as a consequence of such motion. This phenomenon is governed by special nonlinear partial differential evolution equations, for example the Korteweg–de Vries equation. If the spectrum of the operator is reduced to one single eigenvalue, its corresponding motion is that of a single bump that propagates at constant velocity and without deformation, a solitary wave called a "soliton".

A perfect signal and its generalizations for the Korteweg–de Vries equation or other integrable nonlinear partial differential equations are of great interest, with many possible applications. This area has been studied as a branch of mathematical physics since the 1970s. Nonlinear inverse problems are also currently studied in many fields of applied science (acoustics, mechanics, quantum mechanics, electromagnetic scattering - in particular radar soundings, seismic soundings, and nearly all imaging modalities).

A final example related to the Riemann hypothesis was given by Wu and Sprung, the idea is that in the semiclassical old quantum theory the inverse of the potential inside the Hamiltonian is proportional to the half-derivative of the eigenvalues (energies) counting function n(x).

Permeability matching in oil and gas reservoirs

The goal is to recover the diffusion coefficient in the parabolic partial differential equation that models single phase fluid flows in porous media. This problem has been the object of many studies since a pioneering work carried out in the early seventies. Concerning two-phase flows an important problem is to estimate the relative permeabilities and the capillary pressures.

Inverse problems in the wave equations

The goal is to recover the wave-speeds (P and S waves) and the density distributions from seismograms. Such inverse problems are of prime interest in seismology and exploration geophysics. We can basically consider two mathematical models:

These basic hyperbolic equations can be upgraded by incorporating attenuation, anisotropy, ...

The solution of the inverse problem in the 1D wave equation has been the object of many studies. It is one of the very few non-linear inverse problems for which we can prove the uniqueness of the solution. The analysis of the stability of the solution was another challenge. Practical applications, using the least-squares approach, were developed. Extension to 2D or 3D problems and to the elastodynamics equations was attempted since the 80's but turned out to be very difficult ! This problem often referred to as Full Waveform Inversion (FWI), is not yet completely solved: among the main difficulties are the existence of non-Gaussian noise into the seismograms, cycle-skipping issues (also known as phase ambiguity), and the chaotic behavior of the data misfit function. Some authors have investigated the possibility of reformulating the inverse problem so as to make the objective function less chaotic than the data misfit function.

Travel-time tomography

Realizing how difficult is the inverse problem in the wave equation, seismologists investigated a simplified approach making use of geometrical optics. In particular they aimed at inverting for the propagation velocity distribution, knowing the arrival times of wave-fronts observed on seismograms. These wave-fronts can be associated with direct arrivals or with reflections associated with reflectors whose geometry is to be determined, jointly with the velocity distribution.

The arrival time distribution ( is a point in physical space) of a wave-front issued from a point source, satisfies the Eikonal equation:

where denotes the slowness (reciprocal of the velocity) distribution. The presence of makes this equation nonlinear. It is classically solved by shooting rays (trajectories about which the arrival time is stationary) from the point source.

This problem is tomography like: the measured arrival times are the integral along the ray-path of the slowness. But this tomography like problem is nonlinear, mainly because the unknown ray-path geometry depends upon the velocity (or slowness) distribution. In spite of its nonlinear character, travel-time tomography turned out to be very effective for determining the propagation velocity in the Earth or in the subsurface, the latter aspect being a key element for seismic imaging, in particular using methods mentioned in Section "Diffraction tomography".

Mathematical aspects: Hadamard's questions

The questions concern well-posedness: Does the least-squares problem have a unique solution which depends continuously on the data (stability problem)? It is the first question, but it is also a difficult one because of the non-linearity of . In order to see where the difficulties arise from, Chavent proposed to conceptually split the minimization of the data misfit function into two consecutive steps ( is the subset of admissible models):

  • projection step: given find a projection on (nearest point on according to the distance involved in the definition of the objective function)
  • given this projection find one pre-image that is a model whose image by operator is this projection.

Difficulties can - and usually will - arise in both steps:

  1. operator is not likely to be one-to-one, therefore there can be more than one pre-image,
  2. even when is one-to-one, its inverse may not be continuous over ,
  3. the projection on may not exist, should this set be not closed,
  4. the projection on can be non-unique and not continuous as this can be non-convex due to the non-linearity of .

We refer to Chavent for a mathematical analysis of these points.

Computational aspects

A non-convex data misfit function

The forward map being nonlinear, the data misfit function is likely to be non-convex, making local minimization techniques inefficient. Several approaches have been investigated to overcome this difficulty:

  • use of global optimization techniques such as sampling of the posterior density function and Metropolis algorithm in the inverse problem probabilistic framework, genetic algorithms (alone or in combination with Metropolis algorithm: see for an application to the determination of permeabilities that match the existing permeability data), neural networks, regularization techniques including multi scale analysis;
  • reformulation of the least-squares objective function so as to make it smoother.

Computation of the gradient of the objective function

Inverse problems, especially in infinite dimension, may be large size, thus requiring important computing time. When the forward map is nonlinear, the computational difficulties increase and minimizing the objective function can be difficult. Contrary to the linear situation, an explicit use of the Hessian matrix for solving the normal equations does not make sense here: the Hessian matrix varies with models. Much more effective is the evaluation of the gradient of the objective function for some models. Important computational effort can be saved when we can avoid the very heavy computation of the Jacobian (often called "Fréchet derivatives"): the adjoint state method, proposed by Chavent and Lions, is aimed to avoid this very heavy computation. It is now very widely used.

Applications

Inverse problem theory is used extensively in weather predictions, oceanography, hydrology, and petroleum engineering.

Inverse problems are also found in the field of heat transfer, where a surface heat flux is estimated outgoing from temperature data measured inside a rigid body; and, in understanding the controls on plant-matter decay. The linear inverse problem is also the fundamental of spectral estimation and direction-of-arrival (DOA) estimation in signal processing.

Inverse lithography is used in photomask design for semiconductor device fabrication.

Inner core super-rotation

From Wikipedia, the free encyclopedia
Cutaway of the Earth showing the inner core (white) and outer core (yellow)

Inner core super-rotation is the eastward rotation of the inner core of Earth relative to its mantle, for a net rotation rate that is usually faster than Earth as a whole. A 1995 model of Earth's dynamo predicted super-rotations of up to 3 degrees per year; the following year, this prediction was supported by observed discrepancies in the time that p-waves take to travel through the inner and outer core.

Seismic observations have made use of a direction dependence (anisotropy) of the speed of seismic waves in the inner core, as well as spatial variations in the speed. Other estimates come from free oscillations of Earth. The results are inconsistent and the existence of a super-rotation is still controversial, but it is probably less than 0.1 degrees per year.

When geodynamo models take into account gravitational coupling between the inner core and mantle, it lowers the predicted super-rotation to as little as 1 degree per million years. For the inner core to rotate despite gravitational coupling, it must be able to change shape, which places constraints on its viscosity.

A 2023 study reported that the spin of the Earth's inner core has stopped spinning faster than the planet's surface around 2009 and likely is now rotating slower than it.

Background

At the center of Earth is the core, a ball with a mean radius of 3480 kilometres that is composed mostly of iron. The outer core is liquid while the inner core, with a radius of 1220 km, is solid. Because the outer core has a low viscosity, it could be rotating at a different rate from the mantle and crust. This possibility was first proposed in 1975 to explain a phenomenon of Earth's magnetic field called westward drift: some parts of the field rotate about 0.2 degrees per year westward relative to Earth's surface. In 1981, David Gubbins of Leeds University predicted that a differential rotation of the inner and outer core could generate a large toroidal magnetic field near the shared boundary, accelerating the inner core to the rate of westward drift. This would be in opposition to the Earth's rotation, which is eastwards, so the overall rotation would be slower.

In 1995, Gary Glatzmeier at Los Alamos and Paul Roberts at UCLA published the first "self-consistent" three-dimensional model of the dynamo in the core. The model predicted that the inner core rotates 3 degrees per year faster than the mantle, a phenomenon that became known as super-rotation. 1996, Xiaodong Song and Paul G. Richards, scientists at the Lamont–Doherty Earth Observatory, presented seismic evidence for a super-rotation of 0.4 to 1.8 degrees per year, while another study estimated the super-rotation to be 3 degrees per year.

Seismic observations

Schematic of PKP(BC) and PKP(DF) waves
Location of the South Sandwich Islands, which are nearly antipodal to Alaska.

The main observational constraints on inner core rotation come from seismology. When an earthquake occurs, two kinds of seismic wave travel down through the Earth: those with ground motion in the direction the wave propagates (p-waves) and those with transverse motion (s-waves). S-waves do not travel through the outer core because they involve shear stress, a type of deformation that cannot occur in a liquid. In seismic notation, a p-wave is represented by the letter P when traveling through the crust and mantle and by the letter K when traveling through the outer core. A wave that travels through the mantle, core and mantle again before reaching the surface is represented by PKP. For geometric reasons, two branches of PKP are distinguished: PKP(AB) through the upper part of the outer core, and PKP(BC) through the lower part. A wave passing through the inner core is referred to as PKP(DF). (Alternate names for these phases are PKP1, PKP2 and PKIKP.) Seismic waves can travel multiple paths from an earthquake to a given sensor.

PKP(BC) and PKP(DF) waves have similar paths in the mantle, so any difference in the overall travel time is mainly due to the difference in wave speeds between the outer and inner core. Song and Richards looked at how this difference changed over time. Waves traveling from south to north (emitted by earthquakes in the South Sandwich Islands and received at Fairbanks, Alaska) had a differential that changed by 0.4 seconds between 1967 and 1995. By contrast, waves traveling near the equatorial plane (e.g., between Tonga and Germany) showed no change.

One of the criticisms of the early estimates of super-rotation was that uncertainties about the hypocenters of the earthquakes, particularly those in the earlier records, caused errors in the measurement of travel times. This error can be reduced by using data for doublet earthquakes. These are earthquakes that have very similar waveforms, indicating that the earthquakes were very close to each other (within about a kilometer). Using doublet data from the South Sandwich Islands, a study in 2015 arrived at a new estimate of 0.41° per year.

Seismic observations – in particular "temporal changes between repeated seismic waves that should traverse the same path through the inner core" – were used to reveal a core rotation slow-down around 2009. This is not thought to have major effects and one cycle of the oscillation in rotation is thought to be about seven decades, coinciding with several other geophysical periodicities, "especially the length of day and magnetic field".

Inner core anisotropy

Song and Richards explained their observations in terms of the prevailing model of inner core anisotropy at the time. Waves were observed to travel faster between north and south than along the equatorial plane. A model for the inner core with uniform anisotropy had a direction of fastest travel tilted at an angle 10° from the spin axis of the Earth. Since then, the model for the anisotropy has become more complex. The top 100 kilometers are isotropic. Below that, there is stronger anisotropy in a "western" hemisphere (roughly centered on the Americas) than in an "eastern" hemisphere (the other half of the globe), and the anisotropy may increase with depth. There may also be a different orientation of anisotropy in an "innermost inner core" (IMIC) with a radius of about 550 kilometers.

A group at the University of Cambridge used travel time differentials to estimate the longitudes of the hemisphere boundaries with depth up to 90 kilometers below the inner core boundary. Combining this information with an estimate for the rate of growth for the inner core, they obtained a rate of 0.1–1° per million years.

Estimates of the rotation rate based on travel time differentials have been inconsistent. Those based on the Sandwich Island earthquakes have the fastest rates, although they also have a weaker signal, with PKP(DF) barely emerging above the noise. Estimates based on other paths have been lower or even in the opposite direction. By one analysis, the rotation rate is constrained to be less than 0.1° per year.

Heterogeneity

A study in 1997 revisited the Sandwich Islands data and came to a different conclusion about the origin of changes in travel times, attributing them to local heterogeneities in wave speeds. The new estimate for super-rotation was reduced to 0.2–0.3° per year.

Inner core rotation has also been estimated using PKiKP waves, which scatter off the surface of the inner core, rather than PKP(DF) waves. Estimates using this method have ranged from 0.05 to 0.15° per year.

Normal modes

Another way of constraining the inner core rotation is using normal modes (standing waves in Earth), giving a global picture. Heterogeneities in the core split the modes, and changes in the "splitting functions" over time can be used to estimate the rotation rate. However, their accuracy is limited by the shortage of seismic stations in the 1970s and 1980s, and the inferred rotation can be positive or negative depending on the mode. Overall, normal modes are unable to distinguish the rotation rate from zero.

Theory

In the 1995 model of Glatzmeier and Roberts, the inner core is rotated by a mechanism similar to an induction motor. A thermal wind in the outer core gives rise to a circulation pattern with flow from east to west near the inner core boundary. Magnetic fields passing through the inner and outer cores provide a magnetic torque, while viscous torque on the boundary keeps the inner core and the fluid near it rotating at the same rate on average.

The 1995 model did not include the effect of gravitational coupling between density variations in the mantle and topography on the inner core boundary. A 1996 study predicted that it would force the inner core and mantle to rotate at the same rate, but a 1997 paper showed that relative rotation could occur if the inner core was able to change its shape. This would require the viscosity to be less than 1.5 x 1020 pascal-seconds (Pa·s). It also predicted that, if the viscosity were too low (less than 3 x 1016 Pa·s), the inner core would not be able to maintain its seismic anisotropy. However, the source of the anisotropy is still not well understood. A model of the viscosity of the inner core based on Earth's nutations constrains the viscosity to 2–7 × 1014 Pa·s.

Geodynamo models that take into account gravitational locking and changes in the length of day predict a super-rotation rate of only 1° per million years. Some of the inconsistencies between measurements of the rotation may be accommodated if the rotation rate oscillates.

Proterozoic

From Wikipedia, the free encyclopedia
 
Proterozoic
2500 – 538.8 ± 0.2 Ma
From left to right: Four main Proterozoic events: Great Oxidation Event and subsequent Huronian glaciation; First eukaryotes, like red algae; Snowball Earth in Cryogenian period; Ediacaran biota
Chronology
Etymology
Name formalityFormal
Usage information
Celestial bodyEarth
Regional usageGlobal (ICS)
Time scale(s) usedICS Time Scale
Definition
Chronological unitEon
Stratigraphic unitEonothem
Time span formalityFormal
Lower boundary definitionDefined Chronometrically
Lower GSSA ratified1991
Upper boundary definitionAppearance of the Ichnofossil Treptichnus pedum
Upper boundary GSSPFortune Head section, Newfoundland, Canada
47.0762°N 55.8310°W
Upper GSSP ratified1992

The Proterozoic (IPA: /ˌprtərəˈzɪk, ˌprɒt-, -ər-, -trə-, -tr-/ PROH-tər-ə-ZOH-ik, PROT-, -⁠ər-oh-, -⁠trə-, -⁠troh-) is the third of the four geologic eons of Earth's history, spanning the time interval from 2500 to 538.8 Mya, the longest eon of the Earth's geologic time scale. It is preceded by the Archean and followed by the Phanerozoic, and is the most recent part of the Precambrian "supereon".

The Proterozoic is subdivided into three geologic eras (from oldest to youngest): the Paleoproterozoic, Mesoproterozoic and Neoproterozoic. It covers the time from the appearance of free oxygen in Earth's atmosphere to just before the proliferation of complex life on the Earth during the Cambrian Explosion. The name Proterozoic combines two words of Greek origin: protero- meaning "former, earlier", and -zoic, meaning "of life".

Well-identified events of this eon were the transition to an oxygenated atmosphere during the Paleoproterozoic; the evolution of eukaryotes via symbiogenesis; several global glaciations, which produced the 300 million years-long Huronian glaciation (during the Siderian and Rhyacian periods of the Paleoproterozoic) and the hypothesized Snowball Earth (during the Cryogenian period in the late Neoproterozoic); and the Ediacaran period (635 to 538.8 Ma), which is characterized by the evolution of abundant soft-bodied multicellular organisms such as sponges, algae, cnidarians, bilaterians and the sessile Ediacaran biota (some of which had evolved sexual reproduction) and provides the first obvious fossil evidence of life on Earth.

The Proterozoic record

The geologic record of the Proterozoic Eon is more complete than that for the preceding Archean Eon. In contrast to the deep-water deposits of the Archean, the Proterozoic features many strata that were laid down in extensive shallow epicontinental seas; furthermore, many of those rocks are less metamorphosed than Archean rocks, and many are unaltered. Studies of these rocks have shown that the eon continued the massive continental accretion that had begun late in the Archean Eon. The Proterozoic Eon also featured the first definitive supercontinent cycles and wholly modern mountain building activity (orogeny).

There is evidence that the first known glaciations occurred during the Proterozoic. The first began shortly after the beginning of the Proterozoic Eon, and evidence of at least four during the Neoproterozoic Era at the end of the Proterozoic Eon, possibly climaxing with the hypothesized Snowball Earth of the Sturtian and Marinoan glaciations.

The accumulation of oxygen

One of the most important events of the Proterozoic was the accumulation of oxygen in the Earth's atmosphere. Though oxygen is believed to have been released by photosynthesis as far back as the Archean Eon, it could not build up to any significant degree until mineral sinks of unoxidized sulfur and iron had been exhausted. Until roughly 2.3 billion years ago, oxygen was probably only 1% to 2% of its current level. The banded iron formations, which provide most of the world's iron ore, are one mark of that mineral sink process. Their accumulation ceased after 1.9 billion years ago, after the iron in the oceans had all been oxidized.

Red beds, which are colored by hematite, indicate an increase in atmospheric oxygen 2 billion years ago. Such massive iron oxide formations are not found in older rocks. The oxygen buildup was probably due to two factors: exhaustion of the chemical sinks, and an increase in carbon sequestration, which sequestered organic compounds that would have otherwise been oxidized by the atmosphere.

A second surge in oxygen concentrations, known as the Neoproterozoic Oxygenation Event, occurred during the Middle and Late Neoproterozoic and drove the rapid evolution of multicellular life towards the end of the era.

Subduction processes

The Proterozoic Eon was a very tectonically active period in the Earth's history.

The late Archean Eon to Early Proterozoic Eon corresponds to a period of increasing crustal recycling, suggesting subduction. Evidence for this increased subduction activity comes from the abundance of old granites originating mostly after 2.6 Ga.

The occurrence of eclogite (a type of metamorphic rock created by high pressure, > 1 GPa), is explained using a model that incorporates subduction. The lack of eclogites that date to the Archean Eon suggests that conditions at that time did not favor the formation of high grade metamorphism and therefore did not achieve the same levels of subduction as was occurring in the Proterozoic Eon.

As a result of remelting of basaltic oceanic crust due to subduction, the cores of the first continents grew large enough to withstand the crustal recycling processes.

The long-term tectonic stability of those cratons is why we find continental crust ranging up to a few billion years in age. It is believed that 43% of modern continental crust was formed in the Proterozoic, 39% formed in the Archean, and only 18% in the Phanerozoic. Studies by Condie (2000) and Rino et al. (2004) suggest that crust production happened episodically. By isotopically calculating the ages of Proterozoic granitoids it was determined that there were several episodes of rapid increase in continental crust production. The reason for these pulses is unknown, but they seemed to have decreased in magnitude after every period.

Tectonic history (supercontinents)

Evidence of collision and rifting between continents raises the question as to what exactly were the movements of the Archean cratons composing Proterozoic continents. Paleomagnetic and geochronological dating mechanisms have allowed the deciphering of Precambrian Supereon tectonics. It is known that tectonic processes of the Proterozoic Eon resemble greatly the evidence of tectonic activity, such as orogenic belts or ophiolite complexes, we see today. Hence, most geologists would conclude that the Earth was active at that time. It is also commonly accepted that during the Precambrian, the Earth went through several supercontinent breakup and rebuilding cycles (Wilson cycle).

In the late Proterozoic (most recent), the dominant supercontinent was Rodinia (~1000–750 Ma). It consisted of a series of continents attached to a central craton that forms the core of the North American Continent called Laurentia. An example of an orogeny (mountain building processes) associated with the construction of Rodinia is the Grenville orogeny located in Eastern North America. Rodinia formed after the breakup of the supercontinent Columbia and prior to the assemblage of the supercontinent Gondwana (~500 Ma). The defining orogenic event associated with the formation of Gondwana was the collision of Africa, South America, Antarctica and Australia forming the Pan-African orogeny.

Columbia was dominant in the early-mid Proterozoic and not much is known about continental assemblages before then. There are a few plausible models that explain tectonics of the early Earth prior to the formation of Columbia, but the current most plausible hypothesis is that prior to Columbia, there were only a few independent cratons scattered around the Earth (not necessarily a supercontinent, like Rodinia or Columbia).

Life

The emergence of advanced single-celled eukaryotes began after the Great Oxidation Event. This may have been due to an increase in the oxidized nitrates that eukaryotes use, as opposed to cyanobacteria. It was also during the Proterozoic that the first symbiotic relationships between mitochondria (found in nearly all eukaryotes) and chloroplasts (found in plants and some protists only) and their hosts evolved.

By the late Palaeoproterozoic, eukaryotic organisms had become moderately biodiverse. The blossoming of eukaryotes such as acritarchs did not preclude the expansion of cyanobacteria; in fact, stromatolites reached their greatest abundance and diversity during the Proterozoic, peaking roughly 1200 million years ago.

The earliest fossils possessing features typical of fungi date to the Paleoproterozoic Era, some 2,400 million years ago; these multicellular benthic organisms had filamentous structures capable of anastomosis.

Classically, the boundary between the Proterozoic and the Phanerozoic eons was set at the base of the Cambrian Period when the first fossils of animals, including trilobites and archeocyathids, as well as the animal-like Caveasphaera, appeared. In the second half of the 20th century, a number of fossil forms have been found in Proterozoic rocks, particularly in ones from the Ediacaran, proving that multicellular life had already become widespread tens of millions of years before the Cambrian Explosion in what is known as the Avalon Explosion. Nonetheless, the upper boundary of the Proterozoic has remained fixed at the base of the Cambrian, which is currently placed at 538.8 Ma.

Equality (mathematics)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Equality_...