Fields
play an important role in science, technology, and economics. They
describe the spatial variations of a quantity, like the air temperature,
as a function of position. Knowing the configuration of a field can be
of great value. Measurements of fields, however, can never provide the
precise field configuration with certainty. Physical fields have an
infinite number of degrees of freedom, but the data generated by any
measurement device is always finite, providing only a finite number of
constraints on the field. Thus, an unambiguous deduction of such a field
from measurement data alone is impossible and only probabilistic inference
remains as a means to make statements about the field. Fortunately,
physical fields exhibit correlations and often follow known physical
laws. Such information is best fused into the field inference in order
to overcome the mismatch of field degrees of freedom to measurement
points. To handle this, an information theory for fields is needed, and
that is what information field theory is.
Concepts
Bayesian inference
A signal $s_x$ is a field value at a location $x$ in a space $\Omega$. The prior knowledge about the unknown signal field $s$ is encoded in the probability distribution $\mathcal{P}(s)$. The data $d$ provides additional information on $s$ via the likelihood $\mathcal{P}(d|s)$ that gets incorporated into the posterior probability

$\mathcal{P}(s|d) = \frac{\mathcal{P}(d|s)\,\mathcal{P}(s)}{\mathcal{P}(d)}$

according to Bayes' theorem.
As fields
have an infinite number of degrees of freedom, the definition of
probabilities over spaces of field configurations has subtleties.
Identifying physical fields as elements of function spaces poses the
problem that no Lebesgue measure
is defined over the latter, and therefore probability densities cannot
be defined there. However, physical fields have much more regularity
than most elements of function spaces, as they are continuous and smooth
at most of their locations. Therefore, less general but sufficiently
flexible constructions can be used to handle the infinite number of
degrees of freedom of a field.
A pragmatic approach is to regard the field to be discretized in
terms of pixels. Each pixel carries a single field value that is assumed
to be constant within the pixel volume. All statements about the
continuous field then have to be cast into its pixel representation.
This way, one deals with finite-dimensional field spaces, over which
probability densities are well defined.
In order for this description to be a proper field theory, it is further required that the pixel resolution $\Delta x$ can always be refined, while expectation values of the discretized field $s_{\Delta x}$ converge to finite values:

$\langle f[s] \rangle_{(s)} = \lim_{\Delta x \rightarrow 0} \langle f[s_{\Delta x}] \rangle_{(s_{\Delta x})}$
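A minimal numerical sketch can illustrate this convergence; the squared-exponential correlation function and the functional $f[s] = (\int_\Omega dx\, s_x)^2$ below are illustrative assumptions chosen for this sketch:

import numpy as np

# Assumed zero-mean Gaussian prior on Omega = [0, 1] with a
# squared-exponential two-point correlation function S_xy.
def corr(x, y):
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2 * 0.1 ** 2))

# For f[s] = (integral of s over Omega)^2 the prior expectation is the
# double integral of S_xy; in the pixel representation this becomes a
# double sum over pixel centers, which converges as the resolution grows.
for n_pix in [8, 32, 128, 512]:
    dx = 1.0 / n_pix
    x = (np.arange(n_pix) + 0.5) * dx          # pixel centers
    print(n_pix, corr(x, x).sum() * dx ** 2)   # <f[s_dx]> -> finite limit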
Path integrals
If this limit exists, one can talk about the field configuration space integral or path integral

$\langle f[s] \rangle_{(s)} = \int \mathcal{D}s\ f[s]\, \mathcal{P}(s),$

irrespective of the resolution at which it might be evaluated numerically.
Gaussian prior
The simplest field prior is that of a zero-mean Gaussian probability distribution,

$\mathcal{P}(s) = \mathcal{G}(s, S) \equiv \frac{1}{|2\pi S|^{1/2}}\, \exp\!\left( -\frac{1}{2}\, s^\dagger S^{-1} s \right).$

The determinant in the denominator might be ill-defined in the continuum limit $\Delta x \rightarrow 0$; however, all that is necessary for IFT to be consistent is that this determinant can be estimated for any finite-resolution field representation with $\Delta x > 0$ and that this permits the calculation of convergent expectation values.
A Gaussian probability distribution requires the specification of the field two-point correlation function $S$ with coefficients

$S_{xy} = \langle s_x\, s_y \rangle_{(s)}$

and a scalar product for continuous fields

$a^\dagger b = \int_\Omega dx\ \overline{a_x}\, b_x,$

with respect to which the inverse signal field covariance $S^{-1}$ is constructed, i.e.

$\left( S^{-1} S \right)_{xy} = \delta(x - y).$

The corresponding prior information Hamiltonian reads

$\mathcal{H}(s) = -\log \mathcal{P}(s) = \frac{1}{2}\, s^\dagger S^{-1} s + \frac{1}{2}\, \log |2\pi S|.$
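As a concrete illustration, the prior Hamiltonian of a pixelized field can be evaluated directly; the grid size, correlation function, and random seed are arbitrary choices for this sketch:

import numpy as np

n_pix = 64
dx = 1.0 / n_pix
x = (np.arange(n_pix) + 0.5) * dx                 # pixelized Omega = [0, 1]

# Assumed prior covariance S_xy (squared exponential), slightly
# regularized so that S^{-1} exists at this finite resolution.
S = np.exp(-np.subtract.outer(x, x) ** 2 / (2 * 0.1 ** 2)) + 1e-8 * np.eye(n_pix)

rng = np.random.default_rng(42)
s = rng.multivariate_normal(np.zeros(n_pix), S)   # a sample field from the prior

# H(s) = 1/2 s^dagger S^{-1} s + 1/2 log|2 pi S| in the pixel representation.
_, logdet = np.linalg.slogdet(2 * np.pi * S)
H = 0.5 * s @ np.linalg.solve(S, s) + 0.5 * logdet
print(H)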
Measurement equation
The measurement data $d$ was generated with the likelihood $\mathcal{P}(d|s)$. In case the instrument was linear, a measurement equation of the form

$d = R\, s + n$

can be given, in which $R$ is the instrument response, which describes how the data on average reacts to the signal, and $n$ is the noise, simply the difference between data $d$ and linear signal response $R s$.
It is essential to note that the response $R$ translates the infinite-dimensional signal vector into the finite-dimensional data space. In components this reads

$d_i = (R\, s)_i + n_i = \int_\Omega dx\ R_{ix}\, s_x + n_i,$

where a vector component notation was also introduced for signal and data vectors.
If the noise follows signal-independent, zero-mean Gaussian statistics with covariance $N$, i.e. $\mathcal{P}(n|s) = \mathcal{G}(n, N)$, then the likelihood is Gaussian as well,

$\mathcal{P}(d|s) = \mathcal{G}(d - Rs,\, N),$

and the likelihood information Hamiltonian is

$\mathcal{H}(d|s) = -\log \mathcal{P}(d|s) = \frac{1}{2}\, (d - Rs)^\dagger N^{-1} (d - Rs) + \frac{1}{2}\, \log |2\pi N|.$
A linear measurement of a Gaussian signal, subject to Gaussian and signal-independent noise, leads to a free IFT.
Free theory
Free Hamiltonian
The joint information Hamiltonian of the Gaussian scenario described above is

$\mathcal{H}(d, s) = -\log \mathcal{P}(d, s) = \mathcal{H}(d|s) + \mathcal{H}(s) \;\widehat{=}\; \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s,$

where $\widehat{=}$ denotes equality up to irrelevant constants, which, in this case, means expressions that are independent of $s$, and where

$D = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1} \quad \text{and} \quad j = R^\dagger N^{-1} d.$

From this it is clear that the posterior must be a Gaussian with mean $m = D j$ and variance $D$,

$\mathcal{P}(s|d) = \mathcal{G}(s - m,\, D),$

where equality between the right- and left-hand sides holds as both distributions are normalized, $\int \mathcal{D}s\ \mathcal{P}(s|d) = \int \mathcal{D}s\ \mathcal{G}(s - m, D) = 1$.
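Completing the square in $s$ makes the identification of $D$ and $j$ explicit; this standard intermediate step, spelled out here for convenience, reads

\begin{align}
\mathcal{H}(d, s) &\;\widehat{=}\; \frac{1}{2}\, (d - Rs)^\dagger N^{-1} (d - Rs) + \frac{1}{2}\, s^\dagger S^{-1} s \\
&\;\widehat{=}\; \frac{1}{2}\, s^\dagger \left( S^{-1} + R^\dagger N^{-1} R \right) s - \left( R^\dagger N^{-1} d \right)^\dagger s \\
&=\; \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s \;\widehat{=}\; \frac{1}{2}\, (s - m)^\dagger D^{-1} (s - m),
\end{align}

with $m = D j$, which exhibits the Gaussian form of the posterior.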
Generalized Wiener filter
The posterior mean

$m = D\, j = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1} R^\dagger N^{-1}\, d$

is also known as the generalized Wiener filter solution, and the uncertainty covariance

$D = \left( S^{-1} + R^\dagger N^{-1} R \right)^{-1}$

as the Wiener variance.
In IFT, $j = R^\dagger N^{-1} d$ is called the information source, as it acts as a source term to excite the field (knowledge), and $D$ the information propagator, as it propagates information from one location to another in

$m_x = \int dy\ D_{xy}\, j_y.$
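A self-contained numerical sketch of the generalized Wiener filter $m = D j$ follows; the grid, the masked response, the correlation length, and the noise level are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
n_pix = 128
x = (np.arange(n_pix) + 0.5) / n_pix

# Assumed signal prior covariance S and a signal sample s drawn from it.
S = np.exp(-np.subtract.outer(x, x) ** 2 / (2 * 0.05 ** 2)) + 1e-8 * np.eye(n_pix)
s = rng.multivariate_normal(np.zeros(n_pix), S)

# Response R: a simple masking instrument observing half of the pixels.
idx = rng.choice(n_pix, size=n_pix // 2, replace=False)
R = np.eye(n_pix)[idx]

# Gaussian noise with covariance N and the data d = R s + n.
N = 0.1 * np.eye(len(idx))
d = R @ s + rng.multivariate_normal(np.zeros(len(idx)), N)

# Information source j = R^T N^{-1} d, propagator D, and Wiener mean m = D j.
j = R.T @ np.linalg.solve(N, d)
D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.solve(N, R))
m = D @ j
print(np.sqrt(np.mean((m - s) ** 2)))             # reconstruction error

The diagonal of $D$ gives the pointwise Wiener variance, i.e. the uncertainty of the reconstruction at each pixel.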
Interacting theory
Interacting Hamiltonian
If
any of the assumptions that lead to the free theory is violated, IFT
becomes an interacting theory, with terms that are of higher than
quadratic order in the signal field. This happens when the signal or the
noise do not follow Gaussian statistics, when the response is
non-linear, when the noise depends on the signal, or when the response or
covariances are uncertain.
In this case, the information Hamiltonian might be expandable in a Taylor-Fréchet series,

$\mathcal{H}(d, s) \;\widehat{=}\; \mathcal{H}_{\mathrm{free}}(d, s) + \mathcal{H}_{\mathrm{int}}(d, s),$

where

$\mathcal{H}_{\mathrm{free}}(d, s) \;\widehat{=}\; \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s$

is the free Hamiltonian, which alone would lead to a Gaussian posterior, and

$\mathcal{H}_{\mathrm{int}}(d, s) = \sum_{n=3}^{\infty} \frac{1}{n!} \int dx_1 \cdots dx_n\ \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}$

is the interacting Hamiltonian, which encodes non-Gaussian corrections.
The first- and second-order Taylor coefficients are often identified
with the (negative) information source $-j$ and the (inverse) information propagator $D^{-1}$, respectively. The higher coefficients $\Lambda^{(n)}_{x_1 \ldots x_n}$ are associated with non-linear self-interactions.
Classical field
The classical field $s_{\mathrm{cl}}$ minimizes the information Hamiltonian,

$\frac{\delta \mathcal{H}(d, s)}{\delta s}\bigg|_{s = s_{\mathrm{cl}}} = 0,$

and therefore maximizes the posterior. The classical field is thus the maximum a posteriori (MAP) estimator of the field inference problem.
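For the free theory, the classical field coincides with the Wiener filter mean, which the following sketch verifies numerically; the toy covariances and the trivial response are assumptions for illustration:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 32
i = np.arange(n)
S = np.exp(-np.subtract.outer(i, i) ** 2 / 50.0) + 1e-6 * np.eye(n)
R = np.eye(n)                                     # trivial response
N = 0.5 * np.eye(n)
d = rng.multivariate_normal(np.zeros(n), S + N)   # data drawn from the toy model

# Free information Hamiltonian H(d, s), up to s-independent constants.
Si, Ni = np.linalg.inv(S), np.linalg.inv(N)
H = lambda s: 0.5 * (d - R @ s) @ Ni @ (d - R @ s) + 0.5 * s @ Si @ s

s_cl = minimize(H, np.zeros(n)).x                      # classical (MAP) field
m = np.linalg.solve(Si + R.T @ Ni @ R, R.T @ Ni @ d)   # Wiener filter mean
print(np.max(np.abs(s_cl - m)))                        # small, up to tolerance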
Critical filter
The Wiener filter problem requires the two-point correlation $S$ of a field to be known. If it is unknown, it has to be inferred along with the field itself. This requires the specification of a hyperprior $\mathcal{P}(S)$. Often, statistical homogeneity (translation invariance) can be assumed, implying that $S$ is diagonal in Fourier space (for $\Omega = \mathbb{R}^u$ being a $u$-dimensional Cartesian space). In this case, only the Fourier space power spectrum $P_s(k)$ needs to be inferred. Given a further assumption of statistical isotropy, this spectrum depends only on the length $k = |\vec{k}|$ of the Fourier vector $\vec{k}$, and only a one-dimensional spectrum $P_s(k)$ has to be determined. The prior field covariance then reads in Fourier space coordinates

$S_{k k'} = (2\pi)^u\, \delta(k - k')\, P_s(k).$
If the prior on $P_s(k)$ is flat, the joint probability of data and spectrum is

$\mathcal{P}(d, P_s) = \int \mathcal{D}s\ \mathcal{G}(d - Rs,\, N)\, \mathcal{G}(s, S) = \frac{|2\pi D|^{1/2}}{|2\pi S|^{1/2}\, |2\pi N|^{1/2}}\, \exp\!\left( \frac{1}{2}\, j^\dagger D\, j - \frac{1}{2}\, d^\dagger N^{-1} d \right),$

where the notation of the information propagator $D = (S^{-1} + R^\dagger N^{-1} R)^{-1}$ and source $j = R^\dagger N^{-1} d$ of the Wiener filter problem was used again. The corresponding information Hamiltonian is

$\mathcal{H}(d, P_s) \;\widehat{=}\; \frac{1}{2}\, \log |2\pi S| - \frac{1}{2}\, \log |2\pi D| - \frac{1}{2}\, j^\dagger D\, j,$
where $\widehat{=}$ denotes equality up to irrelevant constants (here: constant with respect to $P_s$). Minimizing this with respect to $P_s(k)$, in order to get its maximum a posteriori power spectrum estimator, yields

$0 = \frac{\partial \mathcal{H}(d, P_s)}{\partial P_s(k)} = \frac{1}{2}\, \mathrm{tr}\!\left[ \left( S^{-1} - S^{-1} \left( m\, m^\dagger + D \right) S^{-1} \right) \mathbb{P}_k \right],$

where the Wiener filter mean $m = D\, j$ and the spectral band projector $\mathbb{P}_k = \partial S / \partial P_s(k)$ were introduced. The latter commutes with $S^{-1}$, since $S$ is diagonal in Fourier space. The maximum a posteriori estimator for the power spectrum is therefore

$P_s(k) = \frac{1}{\rho_k}\, \mathrm{tr}\!\left[ \left( m\, m^\dagger + D \right) \mathbb{P}_k \right],$

with $\rho_k = \mathrm{tr}\, \mathbb{P}_k$ the number of Fourier modes in the band $k$. It has to be calculated iteratively, as $m$ and $D$ themselves depend on $P_s(k)$. In an empirical Bayes approach, the estimated $P_s(k)$ would be taken as given. As a consequence, the posterior mean estimate for the signal field is the corresponding $m$ and its uncertainty the corresponding $D$ in the empirical Bayes approximation.
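The following sketch shows the resulting fixed-point iteration on a periodic toy grid, treating every Fourier mode as its own band ($\rho_k = 1$); the spectrum, noise level, and iteration count are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(2)
n = 64
F = np.fft.fft(np.eye(n)) / np.sqrt(n)            # unitary DFT matrix
k = np.fft.fftfreq(n, d=1.0 / n)                  # integer Fourier modes
P_true = 1.0 / (1.0 + k ** 2)                     # assumed true power spectrum

# Simulate a signal with covariance S = F^dagger diag(P_true) F and
# data d = s + noise (trivial response R = 1).
S_true = ((F.conj().T * P_true) @ F).real + 1e-9 * np.eye(n)
s = rng.multivariate_normal(np.zeros(n), S_true)
N = 0.1 * np.eye(n)
d = s + rng.multivariate_normal(np.zeros(n), N)

# Alternate the Wiener filter with the spectrum update P(k) = |m_k|^2 + D_kk.
P = np.ones(n)
for _ in range(30):
    S = ((F.conj().T * P) @ F).real               # covariance of current spectrum
    D = np.linalg.inv(np.linalg.inv(S) + np.linalg.inv(N))  # propagator
    m = D @ np.linalg.solve(N, d)                 # Wiener filter mean m = D j
    m_k = F @ m                                   # mean in Fourier space
    D_k = np.einsum('ij,jk,ik->i', F, D, F.conj()).real     # per-mode variance
    P = np.abs(m_k) ** 2 + D_k
print(np.round(P[:4], 2), np.round(P_true[:4], 2))

With only one mode per band, the recovered spectrum scatters around the true one; in practice, neighboring modes are binned into bands or a smoothness prior on $P_s(k)$ is added.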
The resulting non-linear filter is called the critical filter. The generalization of the power spectrum estimation formula as

$P_s(k) = \frac{\mathrm{tr}\!\left[ \left( m\, m^\dagger + \delta\, D \right) \mathbb{P}_k \right]}{\rho_k + 2(\alpha - 1)}$

(with hyperparameters $\alpha$ and $\delta$; the choice $\alpha = 1$, $\delta = 1$ recovers the estimator above) exhibits a perception threshold for $\delta < 1$, meaning that the data variance in a Fourier band has to exceed the expected noise level by a certain threshold before the signal reconstruction $m$ becomes non-zero for this band. Whenever the data variance exceeds this threshold slightly, the signal reconstruction jumps to a finite excitation level, similar to a first-order phase transition in thermodynamic systems. For filters with $\delta \geq 1$, perception of the signal starts continuously as soon as the data variance exceeds the noise level. The disappearance of the discontinuous perception at $\delta = 1$ is similar to a thermodynamic system going through a critical point. Hence the name critical filter.
The critical filter, extensions thereof to non-linear
measurements, and the inclusion of non-flat spectrum priors, permitted
the application of IFT to real-world signal inference problems, for
which the signal covariance is usually unknown a priori.
IFT application examples
The generalized Wiener filter, which emerges in free IFT, is in broad
use in signal processing. Algorithms explicitly based on IFT were
derived for a number of applications. Many of them are implemented using
the Numerical Information Field Theory (NIFTy) library.
D³PO is a code for Denoising, Deconvolving, and Decomposing Photon Observations.
It reconstructs images from individual photon count events, taking into
account the Poisson statistics of the counts and an instrument response
function. It splits the sky emission into an image of diffuse emission
and one of point sources, exploiting the different correlation structure
and statistics of the two components for their separation. D³PO has
been applied to data of the Fermi and the RXTE satellites.
RESOLVE
is a Bayesian algorithm for aperture synthesis imaging in radio
astronomy. RESOLVE is similar to D³PO, but it assumes a Gaussian
likelihood and a Fourier space response function. It has been applied to
data of the Very Large Array.
PySESA is a Python framework for spatially explicit spectral analysis of point clouds and geospatial data.
Advanced theory
Many techniques from quantum field theory can be used to tackle IFT
problems, such as Feynman diagrams, effective actions, and the field
operator formalism.
Feynman diagrams
In case the interaction coefficients $\Lambda^{(n)}$ in a Taylor-Fréchet expansion of the information Hamiltonian

$\mathcal{H}(d, s) \;\widehat{=}\; \frac{1}{2}\, s^\dagger D^{-1} s - j^\dagger s + \sum_{n=3}^{\infty} \frac{1}{n!} \int dx_1 \cdots dx_n\ \Lambda^{(n)}_{x_1 \ldots x_n}\, s_{x_1} \cdots s_{x_n}$

are small, the log partition function, or Helmholtz free energy,

$\log \mathcal{Z}(d) = \log \int \mathcal{D}s\ e^{-\mathcal{H}(d, s)},$

can be expanded asymptotically in terms of these coefficients. The free Hamiltonian specifies the mean $m = D j$ and variance $D$ of the Gaussian distribution $\mathcal{G}(s - m, D)$ over which the expansion is integrated. This leads to a sum over the set of all connected Feynman diagrams. From the Helmholtz free energy, any connected moment of the field can be calculated via

$\langle s_{x_1} \cdots s_{x_n} \rangle^{\mathrm{c}}_{(s|d)} = \frac{\delta^n \log \mathcal{Z}(d)}{\delta j_{x_1} \cdots \delta j_{x_n}}.$
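For the free theory, this generating relation can be checked directly, since there the Helmholtz free energy is known in closed form; a short worked example:

\begin{align}
\log \mathcal{Z}_{\mathrm{free}}(d) &\;\widehat{=}\; \frac{1}{2}\, j^\dagger D\, j + \frac{1}{2}\, \log |2\pi D|, \\
\langle s_x \rangle^{\mathrm{c}}_{(s|d)} &= \frac{\delta \log \mathcal{Z}_{\mathrm{free}}(d)}{\delta j_x} = (D j)_x = m_x, \\
\langle s_x s_y \rangle^{\mathrm{c}}_{(s|d)} &= \frac{\delta^2 \log \mathcal{Z}_{\mathrm{free}}(d)}{\delta j_x\, \delta j_y} = D_{xy}.
\end{align}

The first two connected moments reproduce the Wiener filter mean and variance, and all higher connected moments vanish, as expected for a Gaussian posterior.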
Situations in which small expansion parameters exist, as needed for such a diagrammatic expansion to converge, are given by nearly Gaussian signal fields, where the non-Gaussianity of the field statistics leads to small interaction coefficients $\Lambda^{(n)}$. For example, the statistics of the cosmic microwave background are nearly Gaussian, with small amounts of non-Gaussianity believed to be seeded during the inflationary epoch in the early Universe.
Effective action
In order to have stable numerics for IFT problems, a field functional
is needed that, if minimized, provides the posterior mean field. Such is
given by the effective action or Gibbs free energy of a field. The Gibbs free energy can be constructed from the Helmholtz free energy via a Legendre transformation.
In IFT, it is given by the difference of the internal information energy

$U(m, D) \equiv \left\langle \mathcal{H}(d, s) \right\rangle_{\mathcal{G}(s - m, D)}$

and the Shannon entropy

$\mathcal{S}(m, D) = -\int \mathcal{D}s\ \mathcal{G}(s - m, D)\, \log \mathcal{G}(s - m, D)$

for temperature $T = 1$, where a Gaussian posterior approximation $\mathcal{P}'(s|d) = \mathcal{G}(s - m, D)$ is used, with the approximate data $d' = (m, D)$ containing the mean and the dispersion of the field.
The Gibbs free energy is then

$G(m, D) = U(m, D) - T\, \mathcal{S}(m, D),$
the Kullback-Leibler divergence

$\mathrm{KL}\!\left( \mathcal{P}', \mathcal{P} \right) = \int \mathcal{D}s\ \mathcal{P}'(s|d)\, \log \frac{\mathcal{P}'(s|d)}{\mathcal{P}(s|d)}$

between approximate and exact posterior plus the Helmholtz free energy,

$G(m, D) = \mathrm{KL}\!\left( \mathcal{P}', \mathcal{P} \right) - \log \mathcal{Z}(d).$

As the latter does not depend on the approximate data $d' = (m, D)$,
minimizing the Gibbs free energy is equivalent to minimizing the
Kullback-Leibler divergence between approximate and exact posterior.
Thus, the effective action approach of IFT is equivalent to the variational Bayesian methods, which also minimize the Kullback-Leibler divergence between approximate and exact posteriors.
Minimizing the Gibbs free energy provides approximately the posterior mean field

$m = \langle s \rangle_{(s|d)} = \int \mathcal{D}s\ s\, \mathcal{P}(s|d),$

whereas minimizing the information Hamiltonian provides the maximum a
posteriori field. As the latter is known to over-fit noise, the former
is usually the better field estimator.
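A one-pixel toy problem makes the comparison explicit; the quartic Hamiltonian and the Gauss-Hermite quadrature below are assumptions of this sketch:

import numpy as np
from scipy.optimize import minimize
from scipy.integrate import quad

# Toy single-pixel posterior with a non-Gaussian (quartic) term:
# H(s) = s^2/2 - s + s^4/10, an arbitrary illustrative choice.
H = lambda s: 0.5 * s ** 2 - s + 0.1 * s ** 4

# Gibbs free energy of a Gaussian approximation with mean m and variance D,
# G(m, D) = <H>_G - entropy, with <H>_G via Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite_e.hermegauss(40)
def gibbs(p):
    m, log_D = p
    sd = np.exp(0.5 * log_D)
    U = np.sum(weights * H(m + sd * nodes)) / np.sqrt(2 * np.pi)
    entropy = 0.5 * (1 + np.log(2 * np.pi)) + 0.5 * log_D
    return U - entropy

m_gibbs = minimize(gibbs, [0.0, 0.0]).x[0]          # variational posterior mean
m_map = minimize(lambda s: H(s[0]), [0.0]).x[0]     # maximum a posteriori field

# Exact posterior mean by direct numerical integration, for comparison.
Z = quad(lambda s: np.exp(-H(s)), -10, 10)[0]
m_true = quad(lambda s: s * np.exp(-H(s)) / Z, -10, 10)[0]
print(m_gibbs, m_map, m_true)                        # compare the three estimates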
Operator formalism
The calculation of the Gibbs free energy requires the calculation of
Gaussian integrals over an information Hamiltonian, since the internal
information energy is

$U(m, D) = \int \mathcal{D}s\ \mathcal{G}(s - m, D)\, \mathcal{H}(d, s).$
Such integrals can be calculated via a field operator formalism, in which

$O_x = m_x + \int dy\ D_{xy}\, \frac{\delta}{\delta m_y}$

is the field operator. This generates the field expression within the integral if applied to the Gaussian distribution function,

$O_x\, \mathcal{G}(s - m, D) = s_x\, \mathcal{G}(s - m, D),$

and any higher power of the field if applied several times,

$O_{x_1} \cdots O_{x_n}\, \mathcal{G}(s - m, D) = s_{x_1} \cdots s_{x_n}\, \mathcal{G}(s - m, D).$

If the information Hamiltonian is analytical, all its terms can be generated via the field operator,

$\mathcal{H}(d, O)\, \mathcal{G}(s - m, D) = \mathcal{H}(d, s)\, \mathcal{G}(s - m, D).$

As the field operator does not depend on the field $s$ itself, it can be pulled out of the path integral of the internal information energy construction,

$U(m, D) = \mathcal{H}(d, O) \int \mathcal{D}s\ \mathcal{G}(s - m, D) = \mathcal{H}(d, O)\, 1,$

where $1$ should be regarded as a functional that always returns the value $1$ irrespective of the value of its input $m$. The resulting expression can be calculated by commuting the mean field annihilators $D\, \delta / \delta m$ to the right of the expression, where they vanish, since $\left( D\, \frac{\delta}{\delta m} \right) 1 = 0$. The mean field annihilator commutes with the mean field as

$\left[ \left( D\, \frac{\delta}{\delta m} \right)_x,\ m_y \right] = D_{xy}.$
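As a brief worked example (the cubic coefficient $\lambda$ is an assumed toy interaction), consider a term $\frac{\lambda}{3!} s_x^3$ in the Hamiltonian. Writing $O_x = m_x + a_x$ with the annihilator $a_x = \left( D\, \frac{\delta}{\delta m} \right)_x$ and using $a_x 1 = 0$ together with $[a_x, m_y] = D_{xy}$ repeatedly gives

$O_x^3\, 1 = m_x^3 + 3\, D_{xx}\, m_x,$

so that this term contributes $\frac{\lambda}{3!} \left( m_x^3 + 3\, D_{xx}\, m_x \right)$ to the internal information energy $U(m, D)$, reproducing the well-known third moment of a Gaussian with mean $m$ and covariance $D$.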
By use of the field operator formalism, the Gibbs free
energy can be calculated, which permits the (approximate) inference of
the posterior mean field via a numerically robust functional minimization.
History
The book of Norbert Wiener
might be regarded as one of the first works on field inference. The
use of path integrals for field inference was proposed by a number of
authors, e.g. Edmund Bertschinger or William Bialek and A. Zee. The connection of field theory and Bayesian reasoning was made explicit by Jörg Lemm. The term information field theory was coined by Torsten Enßlin. See the latter reference for more information on the history of IFT.