
Thursday, May 21, 2015

Dirac delta function


From Wikipedia, the free encyclopedia

Schematic representation of the Dirac delta function by a line surmounted by an arrow. The height of the arrow is usually used to specify the value of any multiplicative constant, which will give the area under the function. The other convention is to write the area next to the arrowhead.

The Dirac delta function as the limit (in the sense of distributions) of the sequence of zero-centered normal distributions \delta_a(x) = \frac{1}{a \sqrt{\pi}} \mathrm{e}^{-x^2/a^2} as a \rightarrow 0.

In mathematics, the Dirac delta function, or δ function, is a generalized function, or distribution, on the real number line that is zero everywhere except at zero, with an integral of one over the entire real line.[1][2][3] The delta function is sometimes thought of as an infinitely high, infinitely thin spike at the origin, with total area one under the spike, and physically represents the density of an idealized point mass or point charge.[4] It was introduced by theoretical physicist Paul Dirac. In the context of signal processing it is often referred to as the unit impulse symbol (or function).[5] Its discrete analog is the Kronecker delta function, which is usually defined on a discrete domain and takes values 0 and 1.

From a purely mathematical viewpoint, the Dirac delta is not strictly a function, because any extended-real function that is equal to zero everywhere but a single point must have total integral zero.[6] The delta function only makes sense as a mathematical object when it appears inside an integral. While from this perspective the Dirac delta can usually be manipulated as though it were a function, formally it must be defined as a distribution that is also a measure. In many applications, the Dirac delta is regarded as a kind of limit (a weak limit) of a sequence of functions having a tall spike at the origin. The approximating functions of the sequence are thus "approximate" or "nascent" delta functions.

Overview

The graph of the delta function is usually thought of as following the whole x-axis and the positive y-axis. Despite its name, the delta function is not truly a function, at least not a usual one with range in real numbers. For example, the objects f(x) = δ(x) and g(x) = 0 are equal everywhere except at x = 0 yet have integrals that are different.
According to Lebesgue integration theory, if f and g are functions such that f = g almost everywhere, then f is integrable if and only if g is integrable and the integrals of f and g are identical. Rigorous treatment of the Dirac delta requires measure theory or the theory of distributions.

The Dirac delta is used to model a tall narrow spike function (an impulse), and other similar abstractions such as a point charge, point mass or electron point. For example, to calculate the dynamics of a baseball being hit by a bat, one can approximate the force of the bat hitting the baseball by a delta function. In doing so, one not only simplifies the equations, but one also is able to calculate the motion of the baseball by only considering the total impulse of the bat against the ball rather than requiring knowledge of the details of how the bat transferred energy to the ball.
In applied mathematics, the delta function is often manipulated as a kind of limit (a weak limit) of a sequence of functions, each member of which has a tall spike at the origin: for example, a sequence of Gaussian distributions centered at the origin with variance tending to zero.
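To make this weak limit concrete, the following Python sketch (the grid, widths, and test function are illustrative choices of our own, assuming NumPy is available) integrates a test function against ever-narrower Gaussians and watches the result approach f(0):

import numpy as np

def nascent_delta(x, a):
    # zero-centered Gaussian of width a, normalized to unit integral
    return np.exp(-x**2 / a**2) / (a * np.sqrt(np.pi))

f = np.cos                          # any continuous test function; f(0) = 1
x, dx = np.linspace(-10, 10, 2_000_001, retstep=True)

for a in (1.0, 0.1, 0.01):
    integral = np.sum(nascent_delta(x, a) * f(x)) * dx
    print(f"a = {a:5.2f}: integral = {integral:.6f}")   # tends to f(0) = 1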

History

Joseph Fourier presented what is now called the Fourier integral theorem in his treatise Théorie analytique de la chaleur in the form:[7]
f(x)=\frac{1}{2\pi}\int_{-\infty}^\infty d\alpha\, f(\alpha) \int_{-\infty}^\infty dp\, \cos(px-p\alpha),
which is tantamount to the introduction of the δ-function in the form:[8]
\delta(x-\alpha)=\frac{1}{2\pi} \int_{-\infty}^\infty dp\, \cos(px-p\alpha).
Later, Augustin Cauchy expressed the theorem using exponentials:[9][10]
f(x)=\frac{1}{2\pi} \int_{-\infty} ^ \infty \ e^{ipx}\left(\int_{-\infty}^\infty e^{-ip\alpha }f(\alpha)\ d \alpha \right) \ dp.
Cauchy pointed out that in some circumstances the order of integration in this result was significant.[11][12]

As justified using the theory of distributions, the Cauchy equation can be rearranged to resemble Fourier's original formulation and expose the δ-function as:
\begin{align}
f(x)&=\frac{1}{2\pi} \int_{-\infty}^\infty e^{ipx}\left(\int_{-\infty}^\infty e^{-ip\alpha }f(\alpha)\ d \alpha \right) \ dp \\
&=\frac{1}{2\pi} \int_{-\infty}^\infty \left(\int_{-\infty}^\infty e^{ipx} e^{-ip\alpha } \ dp \right)f(\alpha)\ d \alpha =\int_{-\infty}^\infty \delta (x-\alpha) f(\alpha) \ d \alpha,
\end{align}
where the δ-function is expressed as:
\delta(x-\alpha)=\frac{1}{2\pi} \int_{-\infty}^\infty e^{ip(x-\alpha)}\ dp \ .
A rigorous interpretation of the exponential form and the various limitations upon the function f necessary for its application extended over more than a century. The problems with a classical interpretation are explained as follows:[13]
The greatest drawback of the classical Fourier transformation is a rather narrow class of functions (originals) for which it can be effectively computed. Namely, it is necessary that these functions decrease sufficiently rapidly to zero (in the neighborhood of infinity) in order to insure the existence of the Fourier integral. For example, the Fourier transform of such simple functions as polynomials does not exist in the classical sense. The extension of the classical Fourier transformation to distributions considerably enlarged the class of functions that could be transformed and this removed many obstacles.
Further developments included generalization of the Fourier integral, "beginning with Plancherel's pathbreaking L2-theory (1910), continuing with Wiener's and Bochner's works (around 1930) and culminating with the amalgamation into L. Schwartz's theory of distributions (1945) ...",[14] and leading to the formal development of the Dirac delta function.

An infinitesimal formula for an infinitely tall, unit impulse delta function (infinitesimal version of Cauchy distribution) explicitly appears in an 1827 text of Augustin Louis Cauchy.[15] Siméon Denis Poisson considered the issue in connection with the study of wave propagation as did Gustav Kirchhoff somewhat later. Kirchhoff and Hermann von Helmholtz also introduced the unit impulse as a limit of Gaussians, which also corresponded to Lord Kelvin's notion of a point heat source. At the end of the 19th century, Oliver Heaviside used formal Fourier series to manipulate the unit impulse.[16] The Dirac delta function as such was introduced as a "convenient notation" by Paul Dirac in his influential 1930 book The Principles of Quantum Mechanics.[17] He called it the "delta function" since he used it as a continuous analogue of the discrete Kronecker delta.

Definitions

The Dirac delta can be loosely thought of as a function on the real line which is zero everywhere except at the origin, where it is infinite,
\delta(x) = \begin{cases} +\infty, & x = 0 \\ 0, & x \ne 0 \end{cases}
and which is also constrained to satisfy the identity
\int_{-\infty}^\infty \delta(x) \, dx = 1.[18]
This is merely a heuristic characterization. The Dirac delta is not a function in the traditional sense as no function defined on the real numbers has these properties.[17] The Dirac delta function can be rigorously defined either as a distribution or as a measure.

As a measure

One way to rigorously define the delta function is as a measure, which accepts as an argument a subset A of the real line R, and returns δ(A) = 1 if 0 ∈ A, and δ(A) = 0 otherwise.[19] If the delta function is conceptualized as modeling an idealized point mass at 0, then δ(A) represents the mass contained in the set A. One may then define the integral against δ as the integral of a function against this mass distribution. Formally, the Lebesgue integral provides the necessary analytic device. The Lebesgue integral with respect to the measure δ satisfies
\int_{-\infty}^\infty f(x) \, \delta\{dx\} =  f(0)
for all continuous compactly supported functions f. The measure δ is not absolutely continuous with respect to the Lebesgue measure — in fact, it is a singular measure. Consequently, the delta measure has no Radon–Nikodym derivative — no true function for which the property
\int_{-\infty}^\infty f(x)\delta(x)\, dx = f(0)
holds.[20] As a result, the latter notation is a convenient abuse of notation, and not a standard (Riemann or Lebesgue) integral.
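As a minimal sketch of this measure-theoretic view, one can represent a set A by a membership predicate and return the mass δ(A); this toy encoding is our own illustration of the definition, not a standard library API:

def delta(A):
    # Dirac measure at 0: A is a set of reals given as a membership predicate
    return 1 if A(0.0) else 0

print(delta(lambda x: -1 < x < 1))   # 1: the interval (-1, 1) contains 0
print(delta(lambda x: x > 0))        # 0: the open half-line (0, inf) does not
print(delta(lambda x: x == 0))       # 1: the singleton {0}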

As a probability measure on R, the delta measure is characterized by its cumulative distribution function, which is the unit step function[21]
H(x) = 
\begin{cases}
1 & \text{if } x\ge 0\\
0 & \text{if } x < 0.
\end{cases}
This means that H(x) is the integral of the cumulative indicator function 1(−∞, x] with respect to the measure δ; to wit,
H(x) = \int_{\mathbf{R}}\mathbf{1}_{(-\infty,x]}(t)\,\delta\{dt\} = \delta(-\infty,x].
Thus in particular the integral of the delta function against a continuous function can be properly understood as a Stieltjes integral:[22]
\int_{-\infty}^\infty f(x)\delta\{dx\} = \int_{-\infty}^\infty f(x) \, dH(x).
All higher moments of δ are zero. In particular, the characteristic function and the moment generating function are both equal to one.

As a distribution

In the theory of distributions a generalized function is thought of not as a function itself, but only in relation to how it affects other functions when it is "integrated" against them. In keeping with this philosophy, to define the delta function properly, it is enough to say what the "integral" of the delta function against a sufficiently "good" test function is. If the delta function is already understood as a measure, then the Lebesgue integral of a test function against that measure supplies the necessary integral.

A typical space of test functions consists of all smooth functions on R with compact support. As a distribution, the Dirac delta is a linear functional on the space of test functions and is defined by[23]
\delta[\varphi] = \varphi(0) \qquad (1)
for every test function φ.

For δ to be properly a distribution, it must be "continuous" in a suitable sense. In general, for a linear functional S on the space of test functions to define a distribution, it is necessary and sufficient that, for every positive integer N there is an integer MN and a constant CN such that for every test function φ, one has the inequality[24]
|S[\phi]| \le C_N \sum_{k=0}^{M_N}\sup_{x\in [-N,N]}|\phi^{(k)}(x)|.
With the δ distribution, one has such an inequality (with CN = 1) with MN = 0 for all N. Thus δ is a distribution of order zero. It is, furthermore, a distribution with compact support (the support being {0}).

The delta distribution can also be defined in a number of equivalent ways. For instance, it is the distributional derivative of the Heaviside step function. This means that, for every test function φ, one has
\delta[\phi] = -\int_{-\infty}^\infty \phi'(x)H(x)\, dx.
Intuitively, if integration by parts were permitted, then the latter integral should simplify to
\int_{-\infty}^\infty \phi(x)H'(x)\, dx = \int_{-\infty}^\infty \phi(x)\delta(x)\, dx,
and indeed, a form of integration by parts is permitted for the Stieltjes integral, and in that case one does have
-\int_{-\infty}^\infty \phi'(x)H(x)\, dx = \int_{-\infty}^\infty \phi(x)\,dH(x).
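The identity δ[φ] = −∫ φ′(x)H(x) dx can also be checked numerically; in the rough discretization below (grid and test function are our own illustrative choices) φ is a smooth bump supported in (−1, 1), for which φ(0) = e^{−1} ≈ 0.3679:

import numpy as np

x, dx = np.linspace(-2, 2, 400_001, retstep=True)

phi = np.zeros_like(x)               # smooth bump supported in (-1, 1)
inside = np.abs(x) < 1
phi[inside] = np.exp(-1.0 / (1.0 - x[inside]**2))

H = (x >= 0).astype(float)           # Heaviside step function
dphi = np.gradient(phi, dx)          # phi'

print(-np.sum(dphi * H) * dx)        # ≈ phi(0) = exp(-1) ≈ 0.3679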
In the context of measure theory, the Dirac measure gives rise to a distribution by integration. Conversely, equation (1) defines a Daniell integral on the space of all compactly supported continuous functions φ which, by the Riesz representation theorem, can be represented as the Lebesgue integral of φ with respect to some Radon measure.

Generalizations

The delta function can be defined in n-dimensional Euclidean space Rn as the measure such that
\int_{\mathbf{R}^n} f(\mathbf{x})\delta\{d\mathbf{x}\} = f(\mathbf{0})
for every compactly supported continuous function f. As a measure, the n-dimensional delta function is the product measure of the 1-dimensional delta functions in each variable separately. Thus, formally, with x = (x1, x2, ..., xn), one has[5]
\delta(\mathbf{x}) = \delta(x_1)\delta(x_2)\dots\delta(x_n). \qquad (2)
The delta function can also be defined in the sense of distributions exactly as above in the one-dimensional case.[25]
However, despite widespread use in engineering contexts, (2) should be manipulated with care, since the product of distributions can only be defined under quite narrow circumstances.[26]

The notion of a Dirac measure makes sense on any set.[19] Thus if X is a set, x0 ∈ X is a marked point, and Σ is any sigma algebra of subsets of X, then the measure defined on sets A ∈ Σ by
\delta_{x_0}(A)=\begin{cases}
1 &\text{if }x_0\in A\\
0 &\text{if }x_0\notin A
\end{cases}
is the delta measure or unit mass concentrated at x0.

Another common generalization of the delta function is to a differentiable manifold where most of its properties as a distribution can also be exploited because of the differentiable structure. The delta function on a manifold M centered at the point x0 ∈ M is defined as the following distribution:
\delta_{x_0}[\phi] = \phi(x_0) \qquad (3)
for all compactly supported smooth real-valued functions φ on M.[27] A common special case of this construction is when M is an open set in the Euclidean space Rn.

On a locally compact Hausdorff space X, the Dirac delta measure concentrated at a point x is the Radon measure associated with the Daniell integral (3) on compactly supported continuous functions φ. At this level of generality, calculus as such is no longer possible, however a variety of techniques from abstract analysis are available. For instance, the mapping x_0\mapsto \delta_{x_0} is a continuous embedding of X into the space of finite Radon measures on X, equipped with its vague topology. Moreover, the convex hull of the image of X under this embedding is dense in the space of probability measures on X.[28]

Properties

Scaling and symmetry

The delta function satisfies the following scaling property for a non-zero scalar α:[29]
\int_{-\infty}^\infty \delta(\alpha x)\,dx
=\int_{-\infty}^\infty \delta(u)\,\frac{du}{|\alpha|}
=\frac{1}{|\alpha|}
and so
\delta(\alpha x) = \frac{\delta(x)}{|\alpha|}. \qquad (4)
In particular, the delta function is an even distribution, in the sense that
\delta(-x) = \delta(x),
and it is homogeneous of degree −1.
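Replacing δ by a narrow nascent delta gives a quick numerical check of (4); the width, grid, and test function below are arbitrary illustrative choices:

import numpy as np

def eta(x, eps=1e-3):
    # narrow Gaussian standing in for delta(x)
    return np.exp(-x**2 / eps**2) / (eps * np.sqrt(np.pi))

f = lambda t: np.cos(t) + 2.0                    # test function with f(0) = 3
x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)

alpha = -2.5
lhs = np.sum(eta(alpha * x) * f(x)) * dx         # ∫ δ(αx) f(x) dx
print(lhs, f(0.0) / abs(alpha))                  # both ≈ 1.2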

Algebraic properties

The distributional product of δ with x is equal to zero:
x\delta(x) = 0.
Conversely, if xf(x) = xg(x), where f and g are distributions, then
f(x) = g(x) +c \delta(x)
for some constant c.[30]

Translation

The integral of the time-delayed Dirac delta is given by:
\int_{-\infty}^\infty f(t) \delta(t-T)\,dt = f(T).
This is sometimes referred to as the sifting property[31] or the sampling property. The delta function is said to "sift out" the value at t = T.

It follows that the effect of convolving a function f(t) with the time-delayed Dirac delta is to time-delay f(t) by the same amount:
(f(t) * \delta(t-T)) \ \stackrel{\mathrm{def}}{=}\ \int_{-\infty}^\infty f(\tau) \delta(t-T-\tau) \, d\tau
= \int_{-\infty}^\infty f(\tau) \delta(\tau-(t-T)) \, d\tau \qquad \text{(using (4): } \delta(-x)=\delta(x))
= f(t-T).

This holds under the precise condition that f be a tempered distribution (see the discussion of the Fourier transform below). As a special case, for instance, we have the identity (understood in the distribution sense)
\int_{-\infty}^\infty \delta (\xi-x) \delta(x-\eta) \, dx = \delta(\xi-\eta).
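In discrete time the same delay property is easy to observe: convolving samples of f with a unit-area spike at t = T shifts the signal by T. The sample rate and signal below are arbitrary choices (a sketch, assuming NumPy):

import numpy as np

fs = 1000                                   # sample rate (Hz)
t = np.arange(0, 1, 1 / fs)
f = np.sin(2 * np.pi * 5 * t)               # a signal f(t)

T = 0.2                                     # delay in seconds
impulse = np.zeros_like(t)
impulse[int(T * fs)] = fs                   # unit-area spike approximating δ(t - T)

delayed = np.convolve(f, impulse)[:len(t)] / fs
n = int(T * fs)
print(np.allclose(delayed[n:], f[:len(t) - n]))   # True: f delayed by T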

Composition with a function

More generally, the delta distribution may be composed with a smooth function g(x) in such a way that the familiar change of variables formula holds, that
\int_{\mathbf{R}} \delta\bigl(g(x)\bigr) f\bigl(g(x)\bigr) |g'(x)|\,dx = \int_{g(\mathbf{R})} \delta(u)f(u)\, du
provided that g is a continuously differentiable function with g′ nowhere zero.[32] That is, there is a unique way to assign meaning to the distribution \delta\circ g so that this identity holds for all compactly supported test functions f. Therefore, the domain must be broken up to exclude the g′ = 0 point. This distribution satisfies δ(g(x)) = 0 if g is nowhere zero, and otherwise if g has a real root at x0, then
\delta(g(x)) = \frac{\delta(x-x_0)}{|g'(x_0)|}.
It is natural therefore to define the composition δ(g(x)) for continuously differentiable functions g by
\delta(g(x)) = \sum_i \frac{\delta(x-x_i)}{|g'(x_i)|}
where the sum extends over all roots of g(x), which are assumed to be simple.[32] Thus, for example
\delta\left(x^2-\alpha^2\right) = \frac{1}{2|\alpha|}\Big[\delta\left(x+\alpha\right)+\delta\left(x-\alpha\right)\Big].
In the integral form the generalized scaling property may be written as
 \int_{-\infty}^\infty f(x) \, \delta(g(x)) \, dx = \sum_{i}\frac{f(x_i)}{|g'(x_i)|}.
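The worked example δ(x² − α²) can be checked numerically by substituting a narrow Gaussian for δ; the width, grid, and test function below are illustrative choices:

import numpy as np

def eta(x, eps=1e-3):
    # narrow Gaussian approximating delta
    return np.exp(-x**2 / eps**2) / (eps * np.sqrt(np.pi))

f = lambda t: t**2 + 1.0
alpha = 2.0
x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)

lhs = np.sum(eta(x**2 - alpha**2) * f(x)) * dx     # ∫ f(x) δ(g(x)) dx
rhs = (f(-alpha) + f(alpha)) / (2 * abs(alpha))    # Σ_i f(x_i)/|g'(x_i)|
print(lhs, rhs)                                    # both ≈ 2.5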

Properties in n dimensions

The delta distribution in an n-dimensional space satisfies the following scaling property instead:
\delta(\alpha\mathbf{x}) = |\alpha|^{-n}\delta(\mathbf{x})
so that δ is a homogeneous distribution of degree −n. Under any reflection or rotation ρ, the delta function is invariant:
\delta(\rho \mathbf{x}) = \delta(\mathbf{x}).
As in the one-variable case, it is possible to define the composition of δ with a bi-Lipschitz function[33] g: Rn → Rn uniquely so that the identity
\int_{\mathbf{R}^n} \delta(g(\mathbf{x}))\, f(g(\mathbf{x}))\, |\det g'(\mathbf{x})|\, d\mathbf{x} = \int_{g(\mathbf{R}^n)} \delta(\mathbf{u}) f(\mathbf{u})\,d\mathbf{u}
holds for all compactly supported functions f.

Using the coarea formula from geometric measure theory, one can also define the composition of the delta function with a submersion from one Euclidean space to another one of different dimension; the result is a type of current. In the special case of a continuously differentiable function g: RnR such that the gradient of g is nowhere zero, the following identity holds[34]
\int_{\mathbf{R}^n} f(\mathbf{x}) \, \delta(g(\mathbf{x})) \, d\mathbf{x} = \int_{g^{-1}(0)}\frac{f(\mathbf{x})}{|\mathbf{\nabla}g|}\,d\sigma(\mathbf{x})
where the integral on the right is over g−1(0), the (n − 1)-dimensional surface defined by g(x) = 0 with respect to the Minkowski content measure. This is known as a simple layer integral.

More generally, if S is a smooth hypersurface of Rn, then we can associate to S the distribution that integrates any compactly supported smooth function g over S:
\delta_S[g] = \int_S g(\mathbf{s})\,d\sigma(\mathbf{s})
where σ is the hypersurface measure associated to S. This generalization is associated with the potential theory of simple layer potentials on S. If D is a domain in Rn with smooth boundary S, then δS is equal to the normal derivative of the indicator function of D in the distribution sense:
-\int_{\mathbf{R}^n}g(\mathbf{x})\,\frac{\partial 1_D(\mathbf{x})}{\partial n}\;d\mathbf{x}=\int_S\,g(\mathbf{s})\;d\sigma(\mathbf{s}),
where n is the outward normal.[35][36] For a proof, see e.g. the article on the surface delta function.

Fourier transform

The delta function is a tempered distribution, and therefore it has a well-defined Fourier transform. Formally, one finds[37]
\hat{\delta}(\xi)=\int_{-\infty}^\infty e^{-2\pi i x \xi}\delta(x)\,dx = 1.
Properly speaking, the Fourier transform of a distribution is defined by imposing self-adjointness of the Fourier transform under the duality pairing \langle\cdot,\cdot\rangle of tempered distributions with Schwartz functions. Thus \hat{\delta} is defined as the unique tempered distribution satisfying
\langle\hat{\delta},\phi\rangle = \langle\delta,\hat{\phi}\rangle
for all Schwartz functions φ. And indeed it follows from this that \hat{\delta}=1.

As a result of this identity, the convolution of the delta function with any other tempered distribution S is simply S:
S*\delta = S.\,
That is to say that δ is an identity element for the convolution on tempered distributions, and in fact the space of compactly supported distributions under convolution is an associative algebra with identity the delta function. This property is fundamental in signal processing, as convolution with a tempered distribution is a linear time-invariant system, and applying the linear time-invariant system measures its impulse response. The impulse response can be computed to any desired degree of accuracy by choosing a suitable approximation for δ, and once it is known, it characterizes the system completely. See LTI system theory:Impulse response and convolution.
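A discrete-time illustration of this: feeding a Kronecker impulse through an LTI filter reads off its impulse response h, and convolution with h then reproduces the system's action on any input. The filter coefficients below are arbitrary (a sketch, assuming SciPy):

import numpy as np
from scipy.signal import lfilter

b, a = [0.2, 0.3], [1.0, -0.5]          # an arbitrary small IIR filter

impulse = np.zeros(8)
impulse[0] = 1.0                        # discrete unit impulse δ[n]
h = lfilter(b, a, impulse)              # first 8 samples of the impulse response

x = np.random.default_rng(0).standard_normal(8)
print(np.allclose(lfilter(b, a, x), np.convolve(x, h)[:8]))   # True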

The inverse Fourier transform of the tempered distribution f(ξ) = 1 is the delta function. Formally, this is expressed
\int_{-\infty}^\infty 1 \cdot e^{2\pi i x\xi}\,d\xi = \delta(x)
and more rigorously, it follows since
\langle 1, f^\vee\rangle = f(0) = \langle\delta,f\rangle
for all Schwartz functions f.

In these terms, the delta function provides a suggestive statement of the orthogonality property of the Fourier kernel on R. Formally, one has
\int_{-\infty}^\infty e^{i 2\pi \xi_1 t}  \left[e^{i 2\pi \xi_2 t}\right]^*\,dt = \int_{-\infty}^\infty e^{-i 2\pi (\xi_2 - \xi_1) t} \,dt = \delta(\xi_2 - \xi_1).
This is, of course, shorthand for the assertion that the Fourier transform of the tempered distribution
f(t) = e^{i2\pi\xi_1 t}
is
\hat{f}(\xi_2) = \delta(\xi_1-\xi_2)
which again follows by imposing self-adjointness of the Fourier transform.

By analytic continuation of the Fourier transform, the Laplace transform of the delta function is found to be[38]
 \int_{0}^{\infty}\delta (t-a)e^{-st} \, dt=e^{-sa}.

Distributional derivatives

The distributional derivative of the Dirac delta distribution is the distribution δ′ defined on compactly supported smooth test functions φ by[39]
\delta'[\varphi] = -\delta[\varphi']=-\varphi'(0).
The first equality here is a kind of integration by parts, for if δ were a true function then
\int_{-\infty}^\infty \delta'(x)\varphi(x)\,dx = -\int_{-\infty}^\infty \delta(x)\varphi'(x)\,dx.
The k-th derivative of δ is defined similarly as the distribution given on test functions by
\delta^{(k)}[\varphi] = (-1)^k \varphi^{(k)}(0).
In particular, δ is an infinitely differentiable distribution.

The first derivative of the delta function is the distributional limit of the difference quotients:[40]
\delta'(x) = \lim_{h\to 0} \frac{\delta(x+h)-\delta(x)}{h}.
More properly, one has
\delta' = \lim_{h\to 0} \frac{1}{h}(\tau_h\delta - \delta)
where τh is the translation operator, defined on functions by τhφ(x) = φ(x + h), and on a distribution S by
(\tau_h S)[\varphi] = S[\tau_{-h}\varphi].
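Numerically, replacing δ by a narrow Gaussian and forming the difference quotient recovers δ′[φ] = −φ′(0); the width, step, and test function below are illustrative choices:

import numpy as np

def eta(x, eps=1e-2):
    return np.exp(-x**2 / eps**2) / (eps * np.sqrt(np.pi))

phi = np.sin                                   # test function with phi'(0) = 1
x, dx = np.linspace(-5, 5, 2_000_001, retstep=True)

h = 1e-4
quotient = (eta(x + h) - eta(x)) / h           # (δ(x+h) − δ(x))/h, regularized
print(np.sum(quotient * phi(x)) * dx)          # ≈ -phi'(0) = -1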
In the theory of electromagnetism, the first derivative of the delta function represents a point magnetic dipole situated at the origin. Accordingly, it is referred to as a dipole or the doublet function.[41]

The derivative of the delta function satisfies a number of basic properties, including:
\delta'(-x) = -\delta'(x)
x\delta'(x) = -\delta(x).
Furthermore, the convolution of δ′ with a compactly supported smooth function f is
\delta'*f = \delta*f' = f',
which follows from the properties of the distributional derivative of a convolution.

Higher dimensions

More generally, on an open set U in the n-dimensional Euclidean space Rn, the Dirac delta distribution centered at a point a ∈ U is defined by[43]
\delta_a[\phi]=\phi(a)
for all φ ∈ S(U), the space of all smooth compactly supported functions on U. If α = (α1, ..., αn) is any multi-index and ∂α denotes the associated mixed partial derivative operator, then the αth derivative ∂αδa of δa is given by[43]
\left\langle \partial^{\alpha} \delta_{a}, \varphi \right\rangle = (-1)^{| \alpha |} \left\langle \delta_{a}, \partial^{\alpha} \varphi \right\rangle = \left. (-1)^{| \alpha |} \partial^{\alpha} \varphi (x) \right|_{x = a} \mbox{ for all } \varphi \in S(U).
That is, the αth derivative of δa is the distribution whose value on any test function φ is the αth derivative of φ at a (with the appropriate positive or negative sign).

The first partial derivatives of the delta function are thought of as double layers along the coordinate planes. More generally, the normal derivative of a simple layer supported on a surface is a double layer supported on that surface, and represents a laminar magnetic monopole. Higher derivatives of the delta function are known in physics as multipoles.

Higher derivatives enter into mathematics naturally as the building blocks for the complete structure of distributions with point support. If S is any distribution on U supported on the set {a} consisting of a single point, then there is an integer m and coefficients cα such that[44]
S = \sum_{|\alpha|\le m} c_\alpha \partial^\alpha\delta_a.

Representations of the delta function

The delta function can be viewed as the limit of a sequence of functions
\delta (x) = \lim_{\varepsilon\to 0^+} \eta_\varepsilon(x), \,
where ηε(x) is sometimes called a nascent delta function. This limit is meant in a weak sense: either that
\lim_{\varepsilon\to 0^+} \int_{-\infty}^{\infty}\eta_\varepsilon(x)f(x) \, dx = f(0) \qquad (5)
for all continuous functions f having compact support, or that this limit holds for all smooth functions f with compact support. The difference between these two slightly different modes of weak convergence is often subtle: the former is convergence in the vague topology of measures, and the latter is convergence in the sense of distributions.

Approximations to the identity

Typically a nascent delta function ηε can be constructed in the following manner. Let η be an absolutely integrable function on R of total integral 1, and define
\eta_\varepsilon(x) = \varepsilon^{-1} \eta \left (\frac{x}{\varepsilon} \right).
In n dimensions, one uses instead the scaling
\eta_\varepsilon(x) = \varepsilon^{-n} \eta \left (\frac{x}{\varepsilon} \right).
Then a simple change of variables shows that ηε also has integral 1.[45] One shows easily that (5) holds for all continuous compactly supported functions f, and so ηε converges weakly to δ in the sense of measures.

The ηε constructed in this way are known as an approximation to the identity.[46] The terminology reflects the fact that the space L1(R) of absolutely integrable functions is closed under the operation of convolution of functions: f ∗ g ∈ L1(R) whenever f and g are in L1(R). However, there is no identity in L1(R) for the convolution product: no element h such that f ∗ h = f for all f. Nevertheless, the sequence ηε does approximate such an identity in the sense that
f*\eta_\varepsilon \to f\quad\rm{as\ }\varepsilon\to 0.
This limit holds in the sense of mean convergence (convergence in L1). Further conditions on the ηε, for instance that it be a mollifier associated to a compactly supported function,[47] are needed to ensure pointwise convergence almost everywhere.

If the initial η = η1 is itself smooth and compactly supported then the sequence is called a mollifier. The standard mollifier is obtained by choosing η to be a suitably normalized bump function, for instance
\eta(x) = \begin{cases} e^{-\frac{1}{1-|x|^2}}& \text{ if } |x| < 1\\
                 0& \text{ if } |x|\geq 1.
                 \end{cases}
In some situations such as numerical analysis, a piecewise linear approximation to the identity is desirable. This can be obtained by taking η1 to be a hat function. With this choice of η1, one has
 \eta_\varepsilon(x) = \varepsilon^{-1}\max \left (1-|\frac{x}{\varepsilon}|,0 \right)
which are all continuous and compactly supported, although not smooth and so not a mollifier.
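A short sketch of such a piecewise linear approximation to the identity in use (grid and widths are our own illustrative choices): convolving a step-like signal with the hat-function ηε smooths it, and the result matches the original away from the jump as ε shrinks.

import numpy as np

def hat(x, eps):
    # piecewise-linear nascent delta of width eps and unit integral
    return np.maximum(1.0 - np.abs(x / eps), 0.0) / eps

x, dx = np.linspace(-2, 2, 8001, retstep=True)
f = np.sign(x)                                # a step to be smoothed

for eps in (0.5, 0.1, 0.02):
    kernel = hat(np.arange(-eps, eps + dx, dx), eps)
    smoothed = np.convolve(f, kernel, mode="same") * dx
    interior = (np.abs(x) > 2 * eps) & (np.abs(x) < 1.5)
    print(eps, np.max(np.abs(smoothed - f)[interior]))   # error ≈ 0 off the jump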

Probabilistic considerations

In the context of probability theory, it is natural to impose the additional condition that the initial η1 in an approximation to the identity should be positive, as such a function then represents a probability distribution. Convolution with a probability distribution is sometimes favorable because it does not result in overshoot or undershoot, as the output is a convex combination of the input values, and thus falls between the maximum and minimum of the input function. Taking η1 to be any probability distribution at all, and letting ηε(x) = η1(x/ε)/ε as above will give rise to an approximation to the identity. In general this converges more rapidly to a delta function if, in addition, η has mean 0 and has small higher moments. For instance, if η1 is the uniform distribution on [−1/2, 1/2], also known as the rectangular function, then:[48]
\eta_\varepsilon(x) = \frac{1}{\varepsilon}\ \textrm{rect}\left(\frac{x}{\varepsilon}\right)=
\begin{cases}
\frac{1}{\varepsilon},&-\frac{\varepsilon}{2}<x<\frac{\varepsilon}{2}\\
0,&\text{otherwise}.
\end{cases}
Another example is with the Wigner semicircle distribution
\eta_\varepsilon(x)= \begin{cases}
\frac{2}{\pi \varepsilon^2}\sqrt{\varepsilon^2 - x^2}, & -\varepsilon < x < \varepsilon \\
0, & \text{otherwise}
\end{cases}
This is continuous and compactly supported, but not a mollifier because it is not smooth.

Semigroups

Nascent delta functions often arise as convolution semigroups. This amounts to the further constraint that the convolution of ηε with ηδ must satisfy
\eta_\varepsilon * \eta_\delta = \eta_{\varepsilon+\delta}
for all ε, δ > 0. Convolution semigroups in L1 that form a nascent delta function are always an approximation to the identity in the above sense, however the semigroup condition is quite a strong restriction.

In practice, semigroups approximating the delta function arise as fundamental solutions or Green's functions to physically motivated elliptic or parabolic partial differential equations. In the context of applied mathematics, semigroups arise as the output of a linear time-invariant system. Abstractly, if A is a linear operator acting on functions of x, then a convolution semigroup arises by solving the initial value problem
\begin{cases}
\frac{\partial}{\partial t}\eta(t,x) = A\eta(t,x), \quad t>0 \\
\displaystyle\lim_{t\to 0^+} \eta(t,x) = \delta(x)
\end{cases}
in which the limit is as usual understood in the weak sense. Setting ηε(x) = η(ε, x) gives the associated nascent delta function.

Some examples of physically important convolution semigroups arising from such a fundamental solution include the following.
The heat kernel
The heat kernel, defined by
\eta_\varepsilon(x) = \frac{1}{\sqrt{2\pi\varepsilon}} \mathrm{e}^{-\frac{x^2}{2\varepsilon}}
represents the temperature in an infinite wire at time t > 0, if a unit of heat energy is stored at the origin of the wire at time t = 0. This semigroup evolves according to the one-dimensional heat equation:
\frac{\partial u}{\partial t} = \frac{1}{2}\frac{\partial^2 u}{\partial x^2}.
In probability theory, ηε(x) is a normal distribution of variance ε and mean 0. It represents the probability density at time t = ε of the position of a particle starting at the origin following a standard Brownian motion. In this context, the semigroup condition is then an expression of the Markov property of Brownian motion.
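The semigroup law here is the familiar fact that convolving Gaussians adds their variances; a quick numerical check (the grid and variances are arbitrary illustrative values):

import numpy as np

def heat_kernel(x, eps):
    # Gaussian of mean 0 and variance eps
    return np.exp(-x**2 / (2 * eps)) / np.sqrt(2 * np.pi * eps)

x, dx = np.linspace(-30, 30, 60_001, retstep=True)
e1, e2 = 0.7, 1.8

conv = np.convolve(heat_kernel(x, e1), heat_kernel(x, e2), mode="same") * dx
print(np.max(np.abs(conv - heat_kernel(x, e1 + e2))))   # ≈ 0: η_ε ∗ η_δ = η_{ε+δ}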

In higher-dimensional Euclidean space Rn, the heat kernel is
\eta_\varepsilon = \frac{1}{(2\pi\varepsilon)^{n/2}}\mathrm{e}^{-\frac{x\cdot x}{2\varepsilon}},
and has the same physical interpretation, mutatis mutandis. It also represents a nascent delta function in the sense that ηε → δ in the distribution sense as ε → 0.

The Poisson kernel
\eta_\varepsilon(x) = \frac{1}{\pi} \frac{\varepsilon}{\varepsilon^2 + x^2}=\int_{-\infty}^{\infty}\mathrm{e}^{2\pi\mathrm{i} \xi x-|\varepsilon \xi|}\;d\xi
is the fundamental solution of the Laplace equation in the upper half-plane.[49] It represents the electrostatic potential in a semi-infinite plate whose potential along the edge is held fixed at the delta function. The Poisson kernel is also closely related to the Cauchy distribution. This semigroup evolves according to the equation
\frac{\partial u}{\partial t} = -\left (-\frac{\partial^2}{\partial x^2} \right)^{\frac{1}{2}}u(t,x)
where the operator is rigorously defined as the Fourier multiplier
\mathcal{F}\left[\left(-\frac{\partial^2}{\partial x^2} \right)^{\frac{1}{2}}f\right](\xi) = |2\pi\xi|\mathcal{F}f(\xi).

Oscillatory integrals

In areas of physics such as wave propagation and wave mechanics, the equations involved are hyperbolic and so may have more singular solutions. As a result, the nascent delta functions that arise as fundamental solutions of the associated Cauchy problems are generally oscillatory integrals. An example, which comes from a solution of the Euler–Tricomi equation of transonic gas dynamics,[50] is the rescaled Airy function
\varepsilon^{-\frac{1}{3}}\operatorname{Ai}\left (x\varepsilon^{-\frac{1}{3}} \right).
Although it is easy to see, using the Fourier transform, that this generates a semigroup in some sense, it is not absolutely integrable and so cannot define a semigroup in the above strong sense. Many nascent delta functions constructed as oscillatory integrals only converge in the sense of distributions (an example is the Dirichlet kernel below), rather than in the sense of measures.

Another example is the Cauchy problem for the wave equation in R1+1:[51]
 \begin{align}
c^{-2}\frac{\partial^2u}{\partial t^2} - \Delta u &= 0\\
u=0,\quad \frac{\partial u}{\partial t} = \delta &\qquad \text{for }t=0.
\end{align}
The solution u represents the displacement from equilibrium of an infinite elastic string, with an initial disturbance at the origin.

Other approximations to the identity of this kind include the sinc function (used widely in electronics and telecommunications)
\eta_\varepsilon(x)=\frac{1}{\pi x}\sin\left(\frac{x}{\varepsilon}\right)=\frac{1}{2\pi}\int_{-\frac{1}{\varepsilon}}^{\frac{1}{\varepsilon}} \cos(kx)\;dk
and the Bessel function
  \eta_\varepsilon(x) =  \frac{1}{\varepsilon}J_{\frac{1}{\varepsilon}} \left(\frac{x+1}{\varepsilon}\right).

Plane wave decomposition

One approach to the study of a linear partial differential equation
L[u]=f,\,
where L is a differential operator on Rn, is to seek first a fundamental solution, which is a solution of the equation
L[u]=\delta.\,
When L is particularly simple, this problem can often be resolved using the Fourier transform directly (as in the case of the Poisson kernel and heat kernel already mentioned). For more complicated operators, it is sometimes easier first to consider an equation of the form
L[u]=h\,
where h is a plane wave function, meaning that it has the form
h = h(x\cdot\xi)
for some vector ξ. Such an equation can be resolved (if the coefficients of L are analytic functions) by the Cauchy–Kovalevskaya theorem or (if the coefficients of L are constant) by quadrature. So, if the delta function can be decomposed into plane waves, then one can in principle solve linear partial differential equations.

Such a decomposition of the delta function into plane waves was part of a general technique first introduced essentially by Johann Radon, and then developed in this form by Fritz John (1955).[52] Choose k so that n + k is an even integer, and for a real number s, put
g(s) = \operatorname{Re}\left[\frac{-s^k\log(-is)}{k!(2\pi i)^n}\right]
=\begin{cases}
\frac{|s|^k}{4k!(2\pi i)^{n-1}} & n \text{ odd}\\
-\frac{|s|^k\log|s|}{k!(2\pi i)^{n}} & n \text{ even.}
\end{cases}
Then δ is obtained by applying a power of the Laplacian to the integral with respect to the unit sphere measure dω of g(x · ξ) for ξ in the unit sphere Sn−1:
\delta(x) = \Delta_x^{\frac{n+k}{2}} \int_{S^{n-1}} g(x\cdot\xi)\,d\omega_\xi.
The Laplacian here is interpreted as a weak derivative, so that this equation is taken to mean that, for any test function φ,
\varphi(x) = \int_{\mathbf{R}^n}\varphi(y)\,dy\,\Delta_x^{\frac{n+k}{2}} \int_{S^{n-1}} g((x-y)\cdot\xi)\,d\omega_\xi.
The result follows from the formula for the Newtonian potential (the fundamental solution of Poisson's equation). This is essentially a form of the inversion formula for the Radon transform, because it recovers the value of φ(x) from its integrals over hyperplanes. For instance, if n is odd and k = 1, then the integral on the right hand side is
c_n \Delta^{\frac{n+1}{2}}_x\int_{\mathbf{R}^n}\int_{S^{n-1}} \varphi(y)|(y-x)\cdot\xi|\,d\omega_\xi\,dy = c_n\Delta^{\frac{n+1}{2}}_x\int_{S^{n-1}} \, d\omega_\xi \int_{-\infty}^\infty |p|R\varphi(\xi,p+x\cdot\xi)\,dp
where Rφ(ξ, p) is the Radon transform of φ:
R\varphi(\xi,p) = \int_{x\cdot\xi=p} \varphi(x)\,d^{n-1}x.
An alternative equivalent expression of the plane wave decomposition, from Gel'fand & Shilov (1966–1968, I, §3.10), is
\delta(x) = \frac{(n-1)!}{(2\pi i)^n}\int_{S^{n-1}}(x\cdot\xi)^{-n}\,d\omega_\xi
for n even, and
\delta(x) = \frac{1}{2(2\pi i)^{n-1}}\int_{S^{n-1}}\delta^{(n-1)}(x\cdot\xi)\,d\omega_\xi
for n odd.

Fourier kernels

In the study of Fourier series, a major question consists of determining whether and in what sense the Fourier series associated with a periodic function converges to the function. The nth partial sum of the Fourier series of a function f of period 2π is defined by convolution (on the interval [−π,π]) with the Dirichlet kernel:
D_N(x) = \sum_{n=-N}^N e^{inx} = \frac{\sin\left((N+\tfrac12)x\right)}{\sin(x/2)}.
Thus,
s_N(f)(x) = D_N*f(x) = \sum_{n=-N}^N a_n e^{inx}
where
a_n = \frac{1}{2\pi}\int_{-\pi}^\pi f(y)e^{-iny}\,dy.
A fundamental result of elementary Fourier series states that the Dirichlet kernel tends to a multiple of the delta function as N → ∞. This is interpreted in the distribution sense, that
s_N(f)(0) = \int_{\mathbf{R}} D_N(x)f(x)\,dx \to 2\pi f(0)
for every compactly supported smooth function f. Thus, formally one has
\delta(x) = \frac1{2\pi} \sum_{n=-\infty}^\infty e^{inx}
on the interval [−π,π].

In spite of this, the result does not hold for all compactly supported continuous functions: that is, DN does not converge weakly in the sense of measures. The lack of convergence of the Fourier series has led to the introduction of a variety of summability methods in order to produce convergence. The method of Cesàro summation leads to the Fejér kernel[53]
F_N(x) = \frac1N\sum_{n=0}^{N-1} D_n(x) = \frac{1}{N}\left(\frac{\sin \frac{Nx}{2}}{\sin \frac{x}{2}}\right)^2.
The Fejér kernels tend to the delta function in the stronger sense that[54]
\int_{\mathbf{R}} F_N(x)f(x)\,dx \to 2\pi f(0)
for every compactly supported continuous function f. The implication is that the Fourier series of any continuous function is Cesàro summable to the value of the function at every point.
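A numerical illustration (the test function and grid are our own choices): integrating the Fejér kernel against a merely continuous f drives the normalized integral toward f(0) as N grows.

import numpy as np

def fejer(x, N):
    # Fejér kernel; its value at x = 0 is N (removable singularity)
    out = np.full_like(x, float(N))
    s = np.sin(x / 2)
    nz = s != 0
    out[nz] = np.sin(N * x[nz] / 2)**2 / (N * s[nz]**2)
    return out

f = lambda x: np.sqrt(np.abs(x)) + 1.0      # continuous but not smooth; f(0) = 1
x, dx = np.linspace(-np.pi, np.pi, 200_001, retstep=True)

for N in (10, 100, 1000):
    print(N, np.sum(fejer(x, N) * f(x)) * dx / (2 * np.pi))   # → f(0) = 1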

Hilbert space theory

The Dirac delta distribution is a densely defined unbounded linear functional on the Hilbert space L2 of square integrable functions. Indeed, smooth compactly supported functions are dense in L2, and the action of the delta distribution on such functions is well-defined. In many applications, it is possible to identify subspaces of L2 and to give a stronger topology on which the delta function defines a bounded linear functional.

Sobolev spaces

The Sobolev embedding theorem for Sobolev spaces on the real line R implies that any square-integrable function f such that
\|f\|_{H^1}^2 = \int_{-\infty}^\infty |\hat{f}(\xi)|^2 (1+|\xi|^2)\,d\xi < \infty
is automatically continuous, and satisfies in particular
|\delta[f]| = |f(0)| \le C \|f\|_{H^1}.
Thus δ is a bounded linear functional on the Sobolev space H1. Equivalently δ is an element of the continuous dual space H−1 of H1. More generally, in n dimensions, one has δHs(Rn) provided s > n / 2.

Spaces of holomorphic functions

In complex analysis, the delta function enters via Cauchy's integral formula which asserts that if D is a domain in the complex plane with smooth boundary, then
f(z) = \frac{1}{2\pi i} \oint_{\partial D} \frac{f(\zeta)\,d\zeta}{\zeta-z},\quad z\in D
for all holomorphic functions f in D that are continuous on the closure of D. As a result, the delta function δz is represented on this class of holomorphic functions by the Cauchy integral:
\delta_z[f] = f(z) = \frac{1}{2\pi i} \oint_{\partial D} \frac{f(\zeta)\,d\zeta}{\zeta-z}.
More generally, let H2(∂D) be the Hardy space consisting of the closure in L2(∂D) of all holomorphic functions in D continuous up to the boundary of D. Then functions in H2(∂D) uniquely extend to holomorphic functions in D, and the Cauchy integral formula continues to hold. In particular for zD, the delta function δz is a continuous linear functional on H2(∂D). This is a special case of the situation in several complex variables in which, for smooth domains D, the Szegő kernel plays the role of the Cauchy integral.

Resolutions of the identity

Given a complete orthonormal basis set of functions {φn} in a separable Hilbert space, for example, the normalized eigenvectors of a compact self-adjoint operator, any vector f can be expressed as:
f = \sum_{n=1}^\infty \alpha_n \varphi_n.
The coefficients {αn} are found as:
\alpha_n = \langle \varphi_n, f \rangle,
which may be represented by the notation:
\alpha_n =  \varphi_n^\dagger f,
a form of the bra–ket notation of Dirac.[55] Adopting this notation, the expansion of f takes the dyadic form:[56]
f =  \sum_{n=1}^\infty \varphi_n \left ( \varphi_n^\dagger f \right).
Letting I denote the identity operator on the Hilbert space, the expression
I = \sum_{n=1}^\infty \varphi_n \varphi_n^\dagger,
is called a resolution of the identity. When the Hilbert space is the space L2(D) of square-integrable functions on a domain D, the quantity:
\varphi_n \varphi_n^\dagger,
is an integral operator, and the expression for f can be rewritten as:
f(x) = \sum_{n=1}^\infty \int_D\, \left( \varphi_n (x) \varphi_n^*(\xi)\right) f(\xi) \, d \xi.
The right-hand side converges to f in the L2 sense. It need not hold in a pointwise sense, even when f is a continuous function. Nevertheless, it is common to abuse notation and write
f(x) = \int \, \delta(x-\xi) f (\xi)\, d\xi,
resulting in the representation of the delta function:[57]
\delta(x-\xi) = \sum_{n=1}^\infty  \varphi_n (x) \varphi_n^*(\xi).
With a suitable rigged Hilbert space (Φ, L2(D), Φ*) where Φ ⊂ L2(D) contains all compactly supported smooth functions, this summation may converge in Φ*, depending on the properties of the basis φn. In most cases of practical interest, the orthonormal basis comes from an integral or differential operator, in which case the series converges in the distribution sense.[58]

Infinitesimal delta functions

Cauchy used an infinitesimal α to write down a unit impulse, infinitely tall and narrow Dirac-type delta function δα satisfying \int F(x)\delta_\alpha(x)\,dx = F(0) in a number of articles in 1827.[59] Cauchy defined an infinitesimal in his Cours d'Analyse (1821) in terms of a sequence tending to zero. Namely, such a null sequence becomes an infinitesimal in Cauchy's and Lazare Carnot's terminology.

Non-standard analysis allows one to rigorously treat infinitesimals. The article by Yamashita (2007) contains a bibliography on modern Dirac delta functions in the context of an infinitesimal-enriched continuum provided by the hyperreals. Here the Dirac delta can be given by an actual function, having the property that for every real function F one has \int F(x)\delta_\alpha(x)\,dx = F(0), as anticipated by Fourier and Cauchy.

Dirac comb

A Dirac comb is an infinite series of Dirac delta functions spaced at intervals of T

A so-called uniform "pulse train" of Dirac delta measures, which is known as a Dirac comb, or as the Shah distribution, creates a sampling function, often used in digital signal processing (DSP) and discrete time signal analysis. The Dirac comb is given as the infinite sum, whose limit is understood in the distribution sense,
\Delta(x) = \sum_{n=-\infty}^\infty \delta(x-n),
which is a sequence of point masses at each of the integers.

Up to an overall normalizing constant, the Dirac comb is equal to its own Fourier transform. This is significant because if f is any Schwartz function, then the periodization of f is given by the convolution
(f*\Delta)(x) = \sum_{n=-\infty}^\infty f(x-n).
In particular,
(f*\Delta)^\wedge = \hat{f}\widehat{\Delta} = \hat{f}\Delta
is precisely the Poisson summation formula.[60]
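Since the comb is its own Fourier transform up to normalization, the Poisson summation formula can be verified directly for a rapidly decaying f. The sketch below uses a Gaussian, whose transform under the e^{−2πixξ} convention is known in closed form; the width a is an arbitrary choice:

import numpy as np

a = 0.5
f     = lambda n: np.exp(-np.pi * a * n**2)
f_hat = lambda k: np.exp(-np.pi * k**2 / a) / np.sqrt(a)   # FT of f, same convention

n = np.arange(-50, 51)
print(f(n).sum(), f_hat(n).sum())   # equal: Σ_n f(n) = Σ_k f̂(k)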

Sokhotski–Plemelj theorem

The Sokhotski–Plemelj theorem, important in quantum mechanics, relates the delta function to the distribution p.v.1/x, the Cauchy principal value of the function 1/x, defined by
\left\langle\operatorname{p.v.}\frac{1}{x}, \phi\right\rangle = \lim_{\varepsilon\to 0^+}\int_{|x|>\varepsilon} \frac{\phi(x)}{x}\,dx.
Sokhotsky's formula states that[61]
\lim_{\varepsilon\to 0^+} \frac{1}{x\pm i\varepsilon} = \operatorname{p.v.}\frac{1}{x} \mp i\pi\delta(x).
Here the limit is understood in the distribution sense, that for all compactly supported smooth functions f,
\lim_{\varepsilon\to 0^+} \int_{-\infty}^\infty\frac{f(x)}{x\pm i\varepsilon}\,dx = \mp i\pi f(0) + \lim_{\varepsilon\to 0^+} \int_{|x|>\varepsilon}\frac{f(x)}{x}\,dx.
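Numerically, the imaginary part of ∫ f(x)/(x + iε) dx indeed tends to −πf(0); the grid and test function below are illustrative choices:

import numpy as np

f = lambda x: np.exp(-x**2)              # test function with f(0) = 1
x, dx = np.linspace(-20, 20, 2_000_001, retstep=True)

for eps in (1e-1, 1e-2, 1e-3):
    integral = np.sum(f(x) / (x + 1j * eps)) * dx
    print(eps, integral.imag)            # → -π f(0) ≈ -3.14159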

Relationship to the Kronecker delta

The Kronecker delta δij is the quantity defined by
\delta_{ij} = \begin{cases} 1 & i=j\\ 0 &i\not=j \end{cases}
for all integers i, j. This function then satisfies the following analog of the sifting property: if (a_i)_{i \in \mathbf{Z}} is any doubly infinite sequence, then
\sum_{i=-\infty}^\infty a_i \delta_{ik}=a_k.
Similarly, for any real or complex valued continuous function f on R, the Dirac delta satisfies the sifting property
\int_{-\infty}^\infty f(x)\delta(x-x_0)\,dx=f(x_0).
This exhibits the Kronecker delta function as a discrete analog of the Dirac delta function.[62]

Applications

Probability theory

In probability theory and statistics, the Dirac delta function is often used to represent a discrete distribution, or a partially discrete, partially continuous distribution, using a probability density function (which is normally used to represent fully continuous distributions). For example, the probability density function f(x) of a discrete distribution consisting of points x = {x1, ..., xn}, with corresponding probabilities p1, ..., pn, can be written as
f(x) = \sum_{i=1}^n p_i \delta(x-x_i).
As another example, consider a distribution which 6/10 of the time returns a standard normal distribution, and 4/10 of the time returns exactly the value 3.5 (i.e. a partly continuous, partly discrete mixture distribution). The density function of this distribution can be written as
f(x) = 0.6 \, \frac {1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} + 0.4 \, \delta(x-3.5).
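In code, such a mixed distribution is sampled by branching on the mixture weight rather than by evaluating a density; the atom at 3.5 then shows up as exact repeats of that value (a sketch using NumPy's random generator):

import numpy as np

rng = np.random.default_rng(42)

def sample_mixture(n):
    # 0.6·N(0,1) + 0.4·δ_{3.5}
    continuous = rng.random(n) < 0.6
    return np.where(continuous, rng.standard_normal(n), 3.5)

draws = sample_mixture(100_000)
print(np.mean(draws == 3.5))   # ≈ 0.4, the weight of the Dirac atom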
The delta function is also used in a completely different way to represent the local time of a diffusion process (like Brownian motion). The local time of a stochastic process B(t) is given by
\ell(x,t) = \int_0^t \delta(x-B(s))\,ds
and represents the amount of time that the process spends at the point x in the range of the process. More precisely, in one dimension this integral can be written
\ell(x,t) = \lim_{\varepsilon\to 0^+}\frac{1}{2\varepsilon}\int_0^t \mathbf{1}_{[x-\varepsilon,x+\varepsilon]}(B(s))\,ds
where 1[xε, x+ε] is the indicator function of the interval [xε, x+ε].

Quantum mechanics

We give an example of how the delta function is expedient in quantum mechanics. The wave function of a particle gives the probability amplitude of finding a particle within a given region of space. Wave functions are assumed to be elements of the Hilbert space L2 of square-integrable functions, and the total probability of finding a particle within a given interval is the integral of the magnitude of the wave function squared over the interval. A set {φn} of wave functions is orthonormal if they are normalized by
\langle\phi_n|\phi_m\rangle = \delta_{nm}
where δ here refers to the Kronecker delta. A set of orthonormal wave functions is complete in the space of square-integrable functions if any wave function ψ can be expressed as a combination of the φn:
 \psi = \sum c_n \phi_n,
with  c_n = \langle \phi_n | \psi \rangle . Complete orthonormal systems of wave functions appear naturally as the eigenfunctions of the Hamiltonian (of a bound system) in quantum mechanics that measures the energy levels, which are called the eigenvalues. The set of eigenvalues, in this case, is known as the spectrum of the Hamiltonian. In bra–ket notation, as above, this equality implies the resolution of the identity:
I = \sum |\phi_n\rangle\langle\phi_n|.
Here the eigenvalues are assumed to be discrete, but the set of eigenvalues of an observable may be continuous rather than discrete. An example is the position observable, Qψ(x) = xψ(x). The spectrum of the position (in one dimension) is the entire real line, and is called a continuous spectrum. However, unlike the Hamiltonian, the position operator lacks proper eigenfunctions. The conventional way to overcome this shortcoming is to widen the class of available functions by allowing distributions as well: that is, to replace the Hilbert space of quantum mechanics by an appropriate rigged Hilbert space.[63] In this context, the position operator has a complete set of eigen-distributions, labeled by the points y of the real line, given by
\phi_y(x) = \delta(x-y).\;
The eigenfunctions of position are denoted by \phi_y = |y\rangle in Dirac notation, and are known as position eigenstates.
Similar considerations apply to the eigenstates of the momentum operator, or indeed any other self-adjoint unbounded operator P on the Hilbert space, provided the spectrum of P is continuous and there are no degenerate eigenvalues. In that case, there is a set Ω of real numbers (the spectrum), and a collection φy of distributions indexed by the elements of Ω, such that
P\phi_y = y\phi_y.\;
That is, φy are the eigenvectors of P. If the eigenvectors are normalized so that
\langle \phi_y,\phi_{y'}\rangle = \delta(y-y')
in the distribution sense, then for any test function ψ,
 \psi(x) = \int_\Omega  c(y) \phi_y(x) \, dy
where
c(y) = \langle \psi, \phi_y \rangle.
That is, as in the discrete case, there is a resolution of the identity
I = \int_\Omega |\phi_y\rangle\, \langle\phi_y|\,dy
where the operator-valued integral is again understood in the weak sense. If the spectrum of P has both continuous and discrete parts, then the resolution of the identity involves a summation over the discrete spectrum and an integral over the continuous spectrum.

The delta function also has many more specialized applications in quantum mechanics, such as the delta potential models for a single and double potential well.

Structural mechanics

The delta function can be used in structural mechanics to describe transient loads or point loads acting on structures. The governing equation of a simple mass–spring system excited by a sudden force impulse I at time t = 0 can be written
m \frac{\mathrm{d}^2 \xi}{\mathrm{d} t^2} + k \xi = I \delta(t),
where m is the mass, ξ the deflection and k the spring constant.

As another example, the equation governing the static deflection of a slender beam is, according to Euler–Bernoulli theory,
EI \frac{\mathrm{d}^4 w}{\mathrm{d} x^4} = q(x),\,
where EI is the bending stiffness of the beam, w the deflection, x the spatial coordinate and q(x) the load distribution. If a beam is loaded by a point force F at x = x0, the load distribution is written
q(x) = F \delta(x-x_0).\,
As integration of the delta function results in the Heaviside step function, it follows that the static deflection of a slender beam subject to multiple point loads is described by a set of piecewise polynomials.

Also a point moment acting on a beam can be described by delta functions. Consider two opposing point forces F at a distance d apart. They then produce a moment M = Fd acting on the beam. Now, let the distance d approach the limit zero, while M is kept constant. The load distribution, assuming a clockwise moment acting at x = 0, is written
\begin{align}
q(x) &= \lim_{d \to 0} \Big( F \delta(x) - F \delta(x-d) \Big) \\
&= \lim_{d \to 0} \left( \frac{M}{d} \delta(x) - \frac{M}{d} \delta(x-d) \right) \\
&= M \lim_{d \to 0} \frac{\delta(x) - \delta(x - d)}{d}\\
&= M \delta'(x).
\end{align}
Point moments can thus be represented by the derivative of the delta function. Integration of the beam equation again results in piecewise polynomial deflection.
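As a numerical illustration, consider a cantilever clamped at x = 0 with a point load F at x0 (all values below are illustrative). Integrating EI·w'''' = F·δ(x − x0) twice with the free-end conditions V(L) = M(L) = 0 turns the delta into a kink in the bending moment, and two further cumulative integrations with w(0) = w'(0) = 0 give the familiar piecewise cubic deflection:

import numpy as np

E_I, L, F, x0 = 1.0, 1.0, 1.0, 0.6
x = np.linspace(0, L, 100_001)
dx = x[1] - x[0]

H = (x >= x0).astype(float)
M = F * ((x - x0) * H - x + x0)        # bending moment EI·w''(x)

slope = np.cumsum(M) * dx / E_I        # w'(x), with w'(0) = 0
w = np.cumsum(slope) * dx              # w(x),  with w(0) = 0

print(w[-1], F * x0**2 * (3*L - x0) / (6 * E_I))   # tip deflection, both ≈ 0.144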

Wednesday, May 20, 2015

Moore's Law Keeps Going, Defying Expectations

It’s a mystery why Gordon Moore’s “law,” which forecasts processor power will double every two years, still holds true a half century later



Credit: Jon Sullivan/Wikimedia Commons
SAN FRANCISCO—Personal computers, cellphones, self-driving cars—Gordon Moore predicted the invention of all these technologies half a century ago in a 1965 article for Electronics magazine. The enabling force behind those inventions would be computing power, and Moore laid out how he thought computing power would evolve over the coming decade. Last week the tech world celebrated his prediction here because it has held true with uncanny accuracy—for the past 50 years.

It is now called Moore’s law, although Moore (who co-founded the chip maker Intel) doesn’t much like the name. “For the first 20 years I couldn’t utter the term Moore’s law. It was embarrassing,” the 86-year-old visionary said in an interview with New York Times columnist Thomas Friedman at the gala event, held at Exploratorium science museum. “Finally, I got accustomed to it where now I could say it with a straight face.” He and Friedman chatted in front of a rapt audience, with Moore cracking jokes the whole time and doling out advice, like how once you’ve made one successful prediction, you should avoid making another. In the background Intel’s latest gadgets whirred quietly: collision-avoidance drones, dancing spider robots, a braille printer—technologies all made possible via advances in processing power anticipated by Moore’s law.

Of course, Moore’s law is not really a law like those describing gravity or the conservation of energy. It is a prediction that the number of transistors (a computer’s electrical switches used to represent 0s and 1s) that can fit on a silicon chip will double every two years as technology advances. This leads to incredibly fast growth in computing power without a concomitant expense and has led to laptops and pocket-size gadgets with enormous processing ability at fairly low prices. Advances under Moore’s law have also enabled smartphone verbal search technologies such as Siri—it takes enormous computing power to analyze spoken words, turn them into digital representations of sound and then interpret them to give a spoken answer in a matter of seconds.

Another way to think about Moore’s law is to apply it to a car. Intel CEO Brian Krzanich explained that if a 1971 Volkswagen Beetle had advanced at the pace of Moore’s law over the past 34 years, today “you would be able to go with that car 300,000 miles per hour. You would get two million miles per gallon of gas, and all that for the mere cost of four cents.”

Moore anticipated the two-year doubling trend based on what he had seen happen in the early years of computer-chip manufacture. In his 1965 paper he plotted the number of transistors that fit on a chip since 1959 and saw a pattern of yearly doubling that he then extrapolated for the next 10 years. (He later revised the trend to a doubling about every two years.) “Moore was just making an observation,” says Peter Denning, a computer scientist at the Naval Postgraduate School in California. “He was the head of research at Fairchild Semiconductor and wanted to look down the road at how much computing power they’d have in a decade. And in 1975 his prediction came pretty darn close.”

But Moore never thought his prediction would last 50 years. “The original prediction was to look at 10 years, which I thought was a stretch,” he told Friedman last week, “This was going from about 60 elements on an integrated circuit to 60,000—a 1,000-fold extrapolation over 10 years. I thought that was pretty wild. The fact that something similar is going on for 50 years is truly amazing.”

Just why Moore’s law has endured so long is hard to say. His doubling prediction turned into an industry objective for competing companies. “It might be a self-fulfilling law,” Denning explains. But it is not clear why it is a constant doubling every couple of years, as opposed to a different rate or fluctuating spikes in progress. “Science has mysteries, and in some ways this is one of those mysteries,” Denning adds. Certainly, if the rate could have gone faster, someone would have done it, notes computer scientist Calvin Lin of the University of Texas at Austin.

Many technologists have forecast the demise of Moore’s doubling over the years, and Moore himself states that this exponential growth can’t last forever. Still, his law persists today, and hence the computational growth it predicts will continue to profoundly change our world. As he put it: “We’ve just seen the beginning of what computers are going to do for us.”

Holographic principle


From Wikipedia, the free encyclopedia

The holographic principle is a property of string theories and a supposed property of quantum gravity that states that the description of a volume of space can be thought of as encoded on a boundary to the region—preferably a light-like boundary like a gravitational horizon. First proposed by Gerard 't Hooft, it was given a precise string-theory interpretation by Leonard Susskind[1] who combined his ideas with previous ones of 't Hooft and Charles Thorn.[1][2] As pointed out by Raphael Bousso,[3] Thorn observed in 1978 that string theory admits a lower-dimensional description in which gravity emerges from it in what would now be called a holographic way.

In a larger sense, the theory suggests that the entire universe can be seen as a two-dimensional information structure "painted" on the cosmological horizon, such that the three dimensions we observe are an effective description only at macroscopic scales and at low energies. Cosmological holography has not been made mathematically precise, partly because the particle horizon has a non-zero area and grows with time.[4][5]

The holographic principle was inspired by black hole thermodynamics, which conjectures that the maximal entropy in any region scales with the radius squared, and not cubed as might be expected. In the case of a black hole, the insight was that the informational content of all the objects that have fallen into the hole might be entirely contained in surface fluctuations of the event horizon. The holographic principle resolves the black hole information paradox within the framework of string theory.[6] However, there exist classical solutions to the Einstein equations that allow values of the entropy larger than those allowed by an area law, hence in principle larger than those of a black hole. These are the so-called "Wheeler's bags of gold". The existence of such solutions conflicts with the holographic interpretation, and their effects in a quantum theory of gravity including the holographic principle are not yet fully understood.[7]

Black hole entropy

An object with entropy is microscopically random, like a hot gas. A known configuration of classical fields has zero entropy: there is nothing random about electric and magnetic fields, or gravitational waves. Since black holes are exact solutions of Einstein's equations, they were thought not to have any entropy either.

But Jacob Bekenstein noted that this leads to a violation of the second law of thermodynamics. If one throws a hot gas with entropy into a black hole, once it crosses the event horizon, the entropy would disappear. The random properties of the gas would no longer be seen once the black hole had absorbed the gas and settled down. One way of salvaging the second law is if black holes are in fact random objects, with an enormous entropy whose increase is greater than the entropy carried by the gas.

Bekenstein assumed that black holes are maximum entropy objects—that they have more entropy than anything else in the same volume. In a sphere of radius R, the entropy in a relativistic gas increases as the energy increases. The only known limit is gravitational; when there is too much energy the gas collapses into a black hole. Bekenstein used this to put an upper bound on the entropy in a region of space, and the bound was proportional to the area of the region. He concluded that the black hole entropy is directly proportional to the area of the event horizon.[8]
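
Stated as a formula, the Bekenstein bound for a system of total energy E that fits inside a sphere of radius R is

S \le \frac{2 \pi k_B R E}{\hbar c},

and a black hole is precisely the case in which the bound is saturated.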

Stephen Hawking had shown earlier that the total horizon area of a collection of black holes always increases with time. The horizon is a boundary defined by light-like geodesics; it is those light rays that are just barely unable to escape. If neighboring geodesics start moving toward each other they eventually collide, at which point their extension is inside the black hole. So the geodesics are always moving apart, and the number of geodesics which generate the boundary, the area of the horizon, always increases. Hawking's result was called the second law of black hole thermodynamics, by analogy with the law of entropy increase, but at first, he did not take the analogy too seriously.

Hawking knew that if the horizon area were an actual entropy, black holes would have to radiate. When heat is added to a thermal system, the change in entropy is the increase in mass-energy divided by temperature:

{\rm d}S = \frac{{\rm d}M}{T}.
If black holes have a finite entropy, they should also have a finite temperature. In particular, they would come to equilibrium with a thermal gas of photons. This means that black holes would not only absorb photons, but they would also have to emit them in the right amount to maintain detailed balance.

Time-independent solutions to field equations do not emit radiation, because a time-independent background conserves energy. Based on this principle, Hawking set out to show that black holes do not radiate. But, to his surprise, a careful analysis convinced him that they do, and in just the right way to come to equilibrium with a gas at a finite temperature. Hawking's calculation fixed the constant of proportionality at 1/4; the entropy of a black hole is one quarter its horizon area in Planck units.[9]
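
With physical constants restored, the resulting Bekenstein–Hawking entropy and the Hawking temperature of a Schwarzschild black hole of mass M are

S_{\rm BH} = \frac{k_B c^3 A}{4 G \hbar}, \qquad T_H = \frac{\hbar c^3}{8 \pi G M k_B},

which reduce to S = A/4 when areas are measured in Planck units.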

The entropy is proportional to the logarithm of the number of microstates, the ways a system can be configured microscopically while leaving the macroscopic description unchanged. Black hole entropy is deeply puzzling — it says that the logarithm of the number of states of a black hole is proportional to the area of the horizon, not the volume in the interior.[10]
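
This counting statement is Boltzmann's relation S = k_B \ln \Omega, so an area-law entropy means the number of black-hole microstates grows like e^{A/4} in Planck units: exponentially in the horizon area rather than in the enclosed volume.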

Later, Raphael Bousso came up with a covariant version of the bound based upon null sheets.

Black hole information paradox

Hawking's calculation suggested that the radiation which black holes emit is not related in any way to the matter that they absorb. The outgoing light rays start exactly at the edge of the black hole and spend a long time near the horizon, while the infalling matter only reaches the horizon much later. The infalling and outgoing mass/energy only interact when they cross. It is implausible that the outgoing state would be completely determined by some tiny residual scattering.
Hawking interpreted this to mean that when black holes absorb some photons in a pure state described by a wave function, they re-emit new photons in a thermal mixed state described by a density matrix. This would mean that quantum mechanics would have to be modified, because in quantum mechanics, states which are superpositions with probability amplitudes never become states which are probabilistic mixtures of different possibilities.[note 1]
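
In density-matrix language, the tension is that unitary evolution can never turn a pure state \rho = |\psi\rangle\langle\psi| into a mixture \rho = \sum_i p_i |i\rangle\langle i|: the purity {\rm Tr}\,\rho^2 equals 1 for a pure state, is less than 1 for a mixture, and is conserved under unitary time evolution.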

Troubled by this paradox, Gerard 't Hooft analyzed the emission of Hawking radiation in more detail. He noted that when Hawking radiation escapes, there is a way in which incoming particles can modify the outgoing particles. Their gravitational field would deform the horizon of the black hole, and the deformed horizon could produce different outgoing particles than the undeformed horizon. When a particle falls into a black hole, it is boosted relative to an outside observer, and its gravitational field assumes a universal form. 't Hooft showed that this field makes a logarithmic tent-pole shaped bump on the horizon of a black hole, and like a shadow, the bump is an alternate description of the particle's location and mass. For a four-dimensional spherical uncharged black hole, the deformation of the horizon is similar to the type of deformation which describes the emission and absorption of particles on a string-theory world sheet. Since the deformations on the surface are the only imprint of the incoming particle, and since these deformations would have to completely determine the outgoing particles, 't Hooft believed that the correct description of the black hole would be by some form of string theory.

This idea was made more precise by Leonard Susskind, who had also been developing holography, largely independently. Susskind argued that the oscillation of the horizon of a black hole is a complete description[note 2] of both the infalling and outgoing matter, because the world-sheet theory of string theory was just such a holographic description. While short strings have zero entropy, he could identify long highly excited string states with ordinary black holes. This was a deep advance because it revealed that strings have a classical interpretation in terms of black holes.

This work showed that the black hole information paradox is resolved when quantum gravity is described in an unusual string-theoretic way, assuming the string-theoretical description is complete, unambiguous and non-redundant.[12] Space-time in quantum gravity would then emerge as an effective description of the theory of oscillations of a lower-dimensional black-hole horizon, which suggests that any black hole with appropriate properties, not just strings, would serve as a basis for a description of string theory.

In 1995, Susskind, along with collaborators Tom Banks, Willy Fischler, and Stephen Shenker, presented a formulation of the new M-theory using a holographic description in terms of charged point black holes, the D0 branes of type IIA string theory. The Matrix theory they proposed had first been suggested as a description of 2-branes in 11-dimensional supergravity by Bernard de Wit, Jens Hoppe, and Hermann Nicolai; Banks, Fischler, Shenker, and Susskind reinterpreted the same matrix models as a description of the dynamics of point black holes in particular limits.
Holography allowed them to conclude that the dynamics of these black holes give a complete non-perturbative formulation of M-theory. In 1997, Juan Maldacena gave the first holographic descriptions of a higher-dimensional object, the 3+1-dimensional type IIB membrane, which resolved a long-standing problem of finding a string description of a gauge theory. These developments simultaneously explained how string theory is related to some forms of supersymmetric quantum field theories.

Limit on information density

Entropy, if considered as information (see information entropy), is measured in bits. The total quantity of bits is related to the total degrees of freedom of matter/energy.

For a given energy in a given volume, there is an upper limit to the density of information (the Bekenstein bound) about the whereabouts of all the particles which compose matter in that volume. This suggests that matter itself cannot be subdivided infinitely many times and that there must be an ultimate level of fundamental particles. Because the degrees of freedom of a particle are the product of the degrees of freedom of its sub-particles, a particle that could be subdivided endlessly into lower-level particles would have infinitely many degrees of freedom, violating the maximal limit on entropy density. The holographic principle thus implies that the subdivisions must stop at some level, and that the fundamental particle is a bit (1 or 0) of information.
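
As a rough numerical illustration (a sketch only, not part of the original argument), dividing the Bekenstein bound by k_B \ln 2 converts it into bits, which the following Python snippet evaluates using SciPy's physical constants:

    # Bekenstein bound on information, in bits, for a sphere of radius R
    # (meters) containing total energy E (joules): I <= 2*pi*R*E/(hbar*c*ln 2).
    from math import pi, log
    from scipy.constants import hbar, c

    def bekenstein_bits(radius_m, energy_j):
        """Upper bound on the number of bits storable in the region."""
        return 2 * pi * radius_m * energy_j / (hbar * c * log(2))

    # Example: a 1 kg mass (E = m*c^2) enclosed in a 0.1 m sphere.
    print(f"{bekenstein_bits(0.1, 1.0 * c**2):.3e} bits")

For everyday objects the bound comes out around 10^42 bits, enormously far above anything current storage technology approaches.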

The most rigorous realization of the holographic principle is the AdS/CFT correspondence of Juan Maldacena. However, J. D. Brown and Marc Henneaux had already rigorously proven in 1986 that the asymptotic symmetry of 2+1-dimensional gravity gives rise to a Virasoro algebra, whose corresponding quantum theory is a 2-dimensional conformal field theory.[13]
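
The Brown–Henneaux result can be stated compactly: the asymptotic symmetries of gravity on AdS_3 with curvature radius \ell form two copies of the Virasoro algebra with central charge

c = \frac{3 \ell}{2 G},

which is exactly the symmetry structure of a two-dimensional conformal field theory.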

High-level summary

The physical universe is widely seen to be composed of "matter" and "energy". In his 2003 article published in Scientific American magazine, Jacob Bekenstein summarized a current trend started by John Archibald Wheeler, which suggests scientists may "regard the physical world as made of information, with energy and matter as incidentals." Bekenstein asks "Could we, as William Blake memorably penned, 'see a world in a grain of sand,' or is that idea no more than 'poetic license,'"[14] referring to the holographic principle.

Unexpected connection

Bekenstein's topical overview "A Tale of Two Entropies"[15] describes potentially profound implications of Wheeler's trend, in part by noting a previously unexpected connection between the world of information theory and classical physics. This connection was first described shortly after the seminal 1948 papers of American applied mathematician Claude E. Shannon introduced today's most widely used measure of information content, now known as Shannon entropy. As an objective measure of the quantity of information, Shannon entropy has been enormously useful: the design of all modern communications and data storage devices, from cellular phones to modems to hard disk drives and DVDs, relies on it.

In thermodynamics (the branch of physics dealing with heat), entropy is popularly described as a measure of the "disorder" in a physical system of matter and energy. In 1877 Austrian physicist Ludwig Boltzmann described it more precisely in terms of the number of distinct microscopic states that the particles composing a macroscopic "chunk" of matter could be in while still looking like the same macroscopic "chunk". As an example, for the air in a room, its thermodynamic entropy would equal the logarithm of the count of all the ways that the individual gas molecules could be distributed in the room, and all the ways they could be moving.

Energy, matter, and information equivalence

Shannon's efforts to find a way to quantify the information contained in, for example, an e-mail message, led him unexpectedly to a formula with the same form as Boltzmann's. In an article in the August 2003 issue of Scientific American titled "Information in the Holographic Universe", Bekenstein summarizes that "Thermodynamic entropy and Shannon entropy are conceptually equivalent: the number of arrangements that are counted by Boltzmann entropy reflects the amount of Shannon information one would need to implement any particular arrangement..." of matter and energy. The only salient difference between the thermodynamic entropy of physics and Shannon's entropy of information is in the units of measure; the former is expressed in units of energy divided by temperature, the latter in essentially dimensionless "bits" of information, and so the difference is merely a matter of convention.
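
The parallel is easy to display. Shannon's entropy of a message drawn from an alphabet with probabilities p_i is

H = -\sum_i p_i \log_2 p_i,

which for W equally likely alternatives reduces to \log_2 W, the same form as Boltzmann's S = k_B \ln \Omega up to the constant k_B and the base of the logarithm.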

The holographic principle states that the entropy of ordinary mass (not just black holes) is also proportional to surface area and not volume; that volume itself is illusory and the universe is really a hologram which is isomorphic to the information "inscribed" on the surface of its boundary.[10]

Experimental tests

The Fermilab physicist Craig Hogan claims that the holographic principle would imply quantum fluctuations in spatial position[16] that would lead to apparent background noise or "holographic noise" measurable at gravitational wave detectors, in particular GEO 600.[17] However, these claims have not been widely accepted or cited among quantum gravity researchers, and they appear to be in direct conflict with string theory calculations.[18]

Analyses in 2011 of measurements of the gamma-ray burst GRB 041219A, recorded in 2004 by the INTEGRAL space observatory (launched in 2002 by the European Space Agency), show that Craig Hogan's noise is absent down to a scale of 10^−48 meters, as opposed to the scale of 10^−35 meters predicted by Hogan and the scale of 10^−16 meters found in measurements of the GEO 600 instrument.[19] Research continues at Fermilab under Hogan as of 2013.[20]

Jacob Bekenstein also claims to have found a way to test the holographic principle with a tabletop photon experiment.[21]

Tests of Maldacena's conjecture

Hyakutake et al. published two papers in 2013 and 2014[22] that provide computational evidence that Maldacena's conjecture is true. One paper computes the internal energy of a black hole, the position of its event horizon, its entropy and other properties based on the predictions of string theory and the effects of virtual particles. The other calculates the internal energy of the corresponding lower-dimensional cosmos with no gravity. The two simulations match. The papers are not an actual proof of Maldacena's conjecture for all cases, but a demonstration that the conjecture works for a particular theoretical case and a verification of the AdS/CFT correspondence for a particular situation.[23]

Mandatory Palestine

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Mandatory_Palestine