A Medley of Potpourri

Sunday, August 20, 2023

Peaceful nuclear explosion

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Peaceful_nuclear_explosion

Peaceful nuclear explosions (PNEs) are nuclear explosions conducted for non-military purposes. Proposed uses include excavation for the building of canals and harbours, electrical generation, the use of nuclear explosions to drive spacecraft, and as a form of wide-area fracking. PNEs were an area of some research from the late 1950s into the 1980s, primarily in the United States and Soviet Union.

In the U.S., a series of tests were carried out under Project Plowshare. Some of the ideas considered included blasting a new Panama Canal, constructing the proposed Nicaragua Canal, the use of underground explosions to create electricity (project PACER), and a variety of mining, geological, and radionuclide studies. The largest of the excavation tests was carried out in the Sedan nuclear test in 1962, which released large amounts of radioactive gas into the air. By the late 1960s, public opposition to Plowshare was increasing, and a 1970s study of the economics of the concepts suggested they had no practical use. Plowshare saw decreasing interest from the 1960s, and was officially cancelled in 1977.

The Soviet program started a few years after the U.S. efforts and explored many of the same concepts under their Nuclear Explosions for the National Economy program. The program was more extensive, eventually conducting 239 nuclear explosions. Some of these tests also released radioactivity, including a significant release of plutonium into the groundwater and the polluting of an area near the Volga River. A major part of the program in the 1970s and 80s was the use of very small bombs to produce shock waves as a seismic measuring tool, and as part of these experiments, two bombs were successfully used to seal blown-out oil wells. The program officially ended in 1988.

As part of ongoing arms control efforts, both programs came to be controlled by a variety of agreements. Most notable among these is the 1976 Treaty on Underground Nuclear Explosions for Peaceful Purposes (PNE Treaty). The Comprehensive Nuclear-Test-Ban Treaty of 1996 prohibits all nuclear explosions, regardless of whether they are for peaceful purposes or not. Since that time the topic has been raised several times, often as a method of asteroid impact avoidance.

Peaceful Nuclear Explosions Treaty

In the PNE Treaty, the signatories agreed: not to carry out any individual nuclear explosions having a yield exceeding 150 kilotons TNT equivalent; not to carry out any group explosion (consisting of a number of individual explosions) having an aggregate yield exceeding 1,500 kilotons; and not to carry out any group explosion having an aggregate yield exceeding 150 kilotons unless the individual explosions in the group could be identified and measured by agreed verification procedures. The parties also reaffirmed their obligations to comply fully with the Limited Test Ban Treaty of 1963.

The parties reserve the right to carry out nuclear explosions for peaceful purposes in the territory of another country if requested to do so, but only in full compliance with the yield limitations and other provisions of the PNE Treaty and in accord with the Non-Proliferation Treaty.

Articles IV and V of the PNE Treaty set forth the agreed verification arrangements. In addition to the use of national technical means, the treaty states that information and access to sites of explosions will be provided by each side, and includes a commitment not to interfere with verification means and procedures.

The protocol to the PNE Treaty sets forth the specific agreed arrangements for ensuring that no weapon-related benefits precluded by the Threshold Test Ban Treaty are derived by carrying out a nuclear explosion used for peaceful purposes, including provisions for use of the hydrodynamic yield measurement method, seismic monitoring, and on-site inspection.

The agreed statement that accompanies the treaty specifies that a "peaceful application" of an underground nuclear explosion would not include the developmental testing of any nuclear explosive.

United States: Operation Plowshare

Operation Plowshare was the name of the U.S. program for the development of techniques to use nuclear explosives for peaceful purposes. The name was coined in 1961, taken from Micah 4:3 ("And he shall judge among the nations, and shall rebuke many people: and they shall beat their swords into plowshares, and their spears into pruning hooks: nation shall not lift up sword against nation, neither shall they learn war any more"). Twenty-eight nuclear blasts were detonated between 1961 and 1973.

One of the first U.S. proposals for peaceful nuclear explosions that came close to being carried out was Project Chariot, which would have used several hydrogen bombs to create an artificial harbor at Cape Thompson, Alaska. It was never carried out due to concerns for the native populations and the fact that there was little potential use for the harbor to justify its risk and expense. There was also talk of using nuclear explosions to excavate a second Panama Canal, as well as an alternative to the Suez Canal.

The largest excavation experiment took place in 1962 at the Department of Energy's Nevada Test Site. The Sedan nuclear test carried out as part of Operation Storax displaced 12 million tons of earth, creating the largest man-made crater in the world, generating a large nuclear fallout over Nevada and Utah. Three tests were conducted in order to stimulate natural gas production, but the effort was abandoned as impractical because of cost and radioactive contamination of the gas.

There were many negative impacts from Project Plowshare's 27 nuclear explosions. For example, the Project Gasbuggy site, located 89 kilometres (55 mi) east of Farmington, New Mexico, still contains nuclear contamination from a single subsurface blast in 1967. Other consequences included blighted land, relocated communities, tritium-contaminated water, radioactivity, and fallout from debris being hurled high into the atmosphere. These were ignored and downplayed until the program was terminated in 1977, due in large part to public opposition, after $770 million had been spent on the project.

Soviet Union: Nuclear Explosions for the National Economy

The Soviet Union conducted a much more vigorous program of 239 nuclear tests, some with multiple devices, between 1965 and 1988 under the auspices of Program No. 6—Employment of Nuclear Explosive Technologies in the Interests of National Economy and Program No. 7—Nuclear Explosions for the National Economy.

The initial program was patterned on the U.S. version, with the same basic concepts being studied. One test, Chagan test in January 1965, has been described as a "near clone" of the U.S. Sedan shot. Like Sedan, Chagan also resulted in a massive plume of radioactive material being blown high into the atmosphere, with an estimated 20% of the fission products with it. Detection of the plume over Japan led to accusations by the U.S. that the Soviets had carried out an above-ground test in violation of the Partial Test Ban Treaty, but these charges were later dropped.

The later, and more extensive, "Deep Seismic Sounding" Program focused on the use of much smaller explosions for various geological uses. Some of these tests are considered to be operational, not purely experimental. These included the use of peaceful nuclear explosions to create deep seismic profiles. Compared to the usage of conventional explosives or mechanical methods, nuclear explosions allow the collection of longer seismic profiles (up to several thousand kilometres).

Alexey Yablokov has stated that all PNE technologies have non-nuclear alternatives and that many PNEs actually caused nuclear disasters.

Reports on the successful Soviet use of nuclear explosions in extinguishing out-of-control gas well fires were widely cited in United States policy discussions of options for stopping the 2010 Gulf of Mexico Deepwater Horizon oil spill.

Other nations

Germany at one time considered manufacturing nuclear explosives for civil engineering purposes. In the early 1970s a feasibility study was conducted for a project to build a canal from the Mediterranean Sea to the Qattara Depression in the Western Desert of Egypt using nuclear demolition. This project proposed to use 213 devices, with yields of 1 to 1.5 megatons, detonated at depths of 100 to 500 m (330 to 1,640 ft) to build this canal for the purpose of producing hydroelectric power.

The Smiling Buddha, India's first explosive nuclear device, was described by the Indian Government as a peaceful nuclear explosion.

In Australia, nuclear blasting was proposed as a way of mining iron ore in the Pilbara.

Civil engineering and energy production

Apart from their use as weapons, nuclear explosives have been tested and used, in a similar manner to chemical high explosives, for various non-military uses. These have included large-scale earth moving, isotope production and the stimulation and closing-off of the flow of natural gas.

At the peak of the Atomic Age, the United States initiated Operation Plowshare, involving "peaceful nuclear explosions". The United States Atomic Energy Commission chairman announced that the Plowshare project was intended to "highlight the peaceful applications of nuclear explosive devices and thereby create a climate of world opinion that is more favorable to weapons development and tests". The Operation Plowshare program included 27 nuclear tests designed towards investigating these non-weapon uses from 1961 through 1973. Due to the inability of the U.S. physicists to reduce the fission fraction of low-yield (approximately 1 kiloton) nuclear devices that would have been required for many civil engineering projects, when long-term health and clean-up costs from fission products were included in the cost, there was virtually no economic advantage over conventional explosives except for potentially the very largest projects.

The Qattara Depression Project was developed by Professor Friedrich Bassler during his appointment to the West German ministry of economics in 1968. He put forth a plan to create a Saharan lake and hydroelectric power station by blasting a tunnel between the Mediterranean Sea and the Qattara Depression in Egypt, an area that lies below sea level. The core problem of the entire project was the water supply to the depression. Calculations by Bassler showed that digging a canal or tunnel would be too expensive, therefore Bassler determined that the use of nuclear explosive devices, to excavate the canal or tunnel, would be the most economical. The Egyptian government declined to pursue the idea.

The Soviet Union conducted a much more exhaustive program than Plowshare, with 239 nuclear tests between 1965 and 1988. Furthermore, many of the "tests" were considered economic applications, not tests, in the Nuclear Explosions for the National Economy program.

These included a 30 kilotons explosion being used to close the Uzbekistani Urtabulak gas well in 1966 that had been blowing since 1963, and a few months later a 47 kilotons explosive was used to seal a higher-pressure blowout at the nearby Pamuk gas field. (For more details, see Blowout (well drilling)#Use of nuclear explosions.)

Devices that produced the highest proportion of their yield via fusion-only reactions are possibly the Taiga Soviet peaceful nuclear explosions of the 1970s. Their public records indicate 98% of their 15 kiloton explosive yield was derived from fusion reactions, so only 0.3 kiloton was derived from fission.

The repeated detonation of nuclear devices underground in salt domes, in a somewhat analogous manner to the explosions that power a car’s internal combustion engine (in that it would be a heat engine), has also been proposed as a means of fusion power in what is termed PACER. Other investigated uses for low-yield peaceful nuclear explosions were underground detonations to stimulate, by a process analogous to fracking, the flow of petroleum and natural gas in tight formations; this was developed most in the Soviet Union, with an increase in the production of many well heads being reported.

Terraforming

In 2015, billionaire entrepreneur Elon Musk popularized an approach in which the cold planet Mars could be terraformed by the detonation of high-fusion-yielding thermonuclear devices over the mostly dry-ice icecaps on the planet. Musk's specific plan would not be very feasible within the energy limitations of historically manufactured nuclear devices (ranging in kilotons of TNT-equivalent), therefore requiring major advancement for it to be considered. In part due to these problems, the physicist Michio Kaku (who initially put forward the concept) instead suggests using nuclear reactors in the typical land-based district heating manner to make isolated tropical biomes on the Martian surface.

Comet "Siding Spring" made a close approach to the planet Mars in October 2014.

Alternatively, as nuclear detonations are presently somewhat limited in terms of demonstrated achievable yield, the use of an off-the-shelf nuclear explosive device could be employed to "nudge" a Martian-grazing comet toward a pole of the planet. Impact would be a much more efficient scheme to deliver the required energy, water vapor, greenhouse gases, and other biologically significant volatiles that could begin to quickly terraform Mars. One such opportunity for this occurred in October 2014 when a "once-in-a-million-years" comet (designated as C/2013 A1, also known as comet "Siding Spring") came within 140000 km (87000 mi) of the Martian atmosphere.

Physics

The element einsteinium was first discovered, in minute quantities, following the analysis of the fallout from the first thermonuclear atmospheric test.

The discovery and synthesis of new chemical elements by nuclear transmutation, and their production in the necessary quantities to allow study of their properties, was carried out in nuclear explosive device testing. For example, the discovery of the short-lived einsteinium and fermium, both created under the intense neutron flux environment within thermonuclear explosions, followed the first Teller–Ulam thermonuclear device test—Ivy Mike. The rapid capture of so many neutrons required in the synthesis of einsteinium would provide the needed direct experimental confirmation of the so-called r-process, the multiple neutron absorptions needed to explain the cosmic nucleosynthesis (production) of all chemical elements heavier than nickel on the periodic table in supernova explosions, before beta decay, with the r-process explaining the existence of many stable elements in the universe.

The worldwide presence of new isotopes from atmospheric testing beginning in the 1950s led to the 2008 development of a reliable way to detect art forgeries. Paintings created after that period may contain traces of caesium-137 and strontium-90, isotopes that did not exist in nature before 1945. (Fission products were produced in the natural nuclear fission reactor at Oklo about 1.7 billion years ago, but these decayed away before the earliest known human painting.)

Both climatology and particularly aerosol science, a subfield of atmospheric science, were largely created to answer the question of how far and wide fallout would travel. Similar to radioactive tracers used in hydrology and materials testing, fallout and the neutron activation of nitrogen gas served as a radioactive tracer that was used to measure and then help model global circulations in the atmosphere by following the movements of fallout aerosols.

After the Van Allen Belts surrounding Earth were discovered about in 1958, James Van Allen suggested that a nuclear detonation would be one way of probing the magnetic phenomenon. Data obtained from the August 1958 Project Argus test shots, a high-altitude nuclear explosion investigation, were vital to the early understanding of Earth's magnetosphere.

An artist's conception of the NASA reference design for the Project Orion spacecraft powered by nuclear pulse propulsion

Soviet nuclear physicist and Nobel peace prize recipient Andrei Sakharov also proposed the idea that earthquakes could be mitigated and particle accelerators could be made by utilizing nuclear explosions, with the latter created by connecting a nuclear explosive device with another of his inventions, the explosively pumped flux compression generator, to accelerate protons to collide with each other to probe their inner workings, an endeavor that is now done at much lower energy levels with non-explosive superconducting magnets in CERN. Sakharov suggested to replace the copper coil in his MK generators by a big superconductor solenoid to magnetically compress and focus underground nuclear explosions into a shaped charge effect. He theorized this could focus 10²³ positively charged protons per second on a 1 mm² surface, then envisaged making two such beams collide in the form of a supercollider.

Underground nuclear explosive data from peaceful nuclear explosion test shots have been used to investigate the composition of Earth's mantle, analogous to the exploration geophysics practice of mineral prospecting with chemical explosives in "deep seismic sounding" reflection seismology.

Project A119, proposed in the 1960s, which as Apollo scientist Gary Latham explained, would have been the detonating of a "smallish" nuclear device on the Moon in order to facilitate research into its geologic make-up. Analogous in concept to the comparatively low yield explosion created by the water prospecting (LCROSS) Lunar Crater Observation and Sensing Satellite mission, which launched in 2009 and released the "Centaur" kinetic energy impactor, an impactor with a mass of 2,305 kg (5,081 lb), and an impact velocity of about 9,000 km/h (5,600 mph), releasing the kinetic energy equivalent of detonating approximately 2 tons of TNT (8.86 GJ).

Propulsion use

The first preliminary examination of the effects of nuclear detonations upon various metal and non-metal materials, occurred in 1955 with Operation Teapot, were a chain of approximately basketball sized spheres of material, were arrayed at fixed aerial distances, descending from the shot tower. In what was then a surprising experimental observation, all but the spheres directly within the shot tower survived, with the greatest ablation noted on the aluminum sphere located 18 metres (60 ft) from the detonation point, with slightly over 25 millimetres (1 in) of surface material absent upon recovery. These spheres are often referred to as "Lew Allen's balls", after the project manager during the experiments.

The ablation data collected for various materials and the distances the spheres were propelled, serve as the bedrock for the nuclear pulse propulsion study, Project Orion. The direct use of nuclear explosives, by using the impact of ablated propellant plasma from a nuclear shaped charge acting on the rear pusher plate of a ship, was and continues to be seriously studied as a potential propulsion mechanism.

Although likely never achieving orbit due to aerodynamic drag, the first macroscopic object to obtain Earth orbital velocity was a "900kg manhole cover" propelled by the somewhat focused detonation of test shot Pascal-B in August 1957. The use of a subterranean shaft and nuclear device to propel an object to escape velocity has since been termed a "thunder well".

In the 1970s Edward Teller, in the United States, popularized the concept of using a nuclear detonation to power an explosively pumped soft X-ray laser as a component of a ballistic missile defense shield known as Project Excalibur. This created dozens of highly focused X-ray beams that would cause the missile to break up due to laser ablation.

Laser ablation is one of the damage mechanisms of a laser weapon, but it is also one of the researched methods behind pulsed laser propulsion intended for spacecraft, though usually powered by means of conventionally pumped, laser arrays. For example, ground flight testing by Professor Leik Myrabo, using a non-nuclear, conventionally powered pulsed laser test-bed, successfully lifted a lightcraft 72 meters in altitude by a method similar to ablative laser propulsion in 2000.

A powerful solar system based soft X-ray, to ultraviolet, laser system has been calculated to be capable of propelling an interstellar spacecraft, by the light sail principle, to 11% of the speed of light. In 1972 it was also calculated that a 1 Terawatt, 1-km diameter X-ray laser with 1 angstrom wavelength impinging on a 1-km diameter sail, could propel a spacecraft to Alpha Centauri in 10 years.

Asteroid impact avoidance

A proposed means of averting an asteroid impacting with Earth, assuming short lead times between detection and Earth impact, is to detonate one, or a series, of nuclear explosive devices, on, in, or in a stand-off proximity orientation with the asteroid, with the latter method occurring far enough away from the incoming threat to prevent the potential fracturing of the near-Earth object, but still close enough to generate a high thrust laser ablation effect.

A 2007 NASA analysis of impact avoidance strategies using various technologies stated:

Nuclear stand-off explosions are assessed to be 10–100 times more effective than the non-nuclear alternatives analyzed in this study. Other techniques involving the surface or subsurface use of nuclear explosives may be more efficient, but they run an increased risk of fracturing the target near-Earth object. They also carry higher development and operations risks.

Linear least squares

From Wikipedia, the free encyclopedia

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

Main formulations

The three main linear least squares formulations are:

Ordinary least squares (OLS) is the most common estimator. OLS estimates are commonly used to analyze both experimental and observational data.
The OLS method minimizes the sum of squared residuals, and leads to a closed-form expression for the estimated value of the unknown parameter vector β: $\hat{β} = (X^{T} X)^{- 1} X^{T} y,$ where $y$ is a vector whose ith element is the ith observation of the dependent variable, and $X$ is a matrix whose ij element is the ith observation of the jth independent variable. The estimator is unbiased and consistent if the errors have finite variance and are uncorrelated with the regressors: $E [x_{i} ε_{i}] = 0,$ where $x_{i}$ is the transpose of row i of the matrix $X .$ It is also efficient under the assumption that the errors have finite variance and are homoscedastic, meaning that E[ε_i²|x_i] does not depend on i. The condition that the errors are uncorrelated with the regressors will generally be satisfied in an experiment, but in the case of observational data, it is difficult to exclude the possibility of an omitted covariate z that is related to both the observed covariates and the response variable. The existence of such a covariate will generally lead to a correlation between the regressors and the response variable, and hence to an inconsistent estimator of β. The condition of homoscedasticity can fail with either experimental or observational data. If the goal is either inference or predictive modeling, the performance of OLS estimates can be poor if multicollinearity is present, unless the sample size is large.
Weighted least squares (WLS) are used when heteroscedasticity is present in the error terms of the model.
Generalized least squares (GLS) is an extension of the OLS method, that allows efficient estimation of β when either heteroscedasticity, or correlations, or both are present among the error terms of the model, as long as the form of heteroscedasticity and correlation is known independently of the data. To handle heteroscedasticity when the error terms are uncorrelated with each other, GLS minimizes a weighted analogue to the sum of squared residuals from OLS regression, where the weight for the i^th case is inversely proportional to var(ε_i). This special case of GLS is called "weighted least squares". The GLS solution to an estimation problem is $\hat{β} = (X^{T} Ω^{- 1} X)^{- 1} X^{T} Ω^{- 1} y,$ where Ω is the covariance matrix of the errors. GLS can be viewed as applying a linear transformation to the data so that the assumptions of OLS are met for the transformed data. For GLS to be applied, the covariance structure of the errors must be known up to a multiplicative constant.

Alternative formulations

Other formulations include:

Iteratively reweighted least squares (IRLS) is used when heteroscedasticity, or correlations, or both are present among the error terms of the model, but where little is known about the covariance structure of the errors independently of the data. In the first iteration, OLS, or GLS with a provisional covariance structure is carried out, and the residuals are obtained from the fit. Based on the residuals, an improved estimate of the covariance structure of the errors can usually be obtained. A subsequent GLS iteration is then performed using this estimate of the error structure to define the weights. The process can be iterated to convergence, but in many cases, only one iteration is sufficient to achieve an efficient estimate of β.
Instrumental variables regression (IV) can be performed when the regressors are correlated with the errors. In this case, we need the existence of some auxiliary instrumental variables z_i such that E[z_iε_i] = 0. If Z is the matrix of instruments, then the estimator can be given in closed form as $\hat{β} = (X^{T} Z (Z^{T} Z)^{- 1} Z^{T} X)^{- 1} X^{T} Z (Z^{T} Z)^{- 1} Z^{T} y .$ Optimal instruments regression is an extension of classical IV regression to the situation where $E[ε i | z i] = 0$ .
Total least squares (TLS) is an approach to least squares estimation of the linear regression model that treats the covariates and response variable in a more geometrically symmetric manner than OLS. It is one approach to handling the "errors in variables" problem, and is also sometimes used even when the covariates are assumed to be error-free.

Linear Template Fit (LTF) combines a linear regression with (generalized) least squares in order to determine the best estimator. The Linear Template Fit addresses the frequent issue, when the residuals cannot be expressed analytically or are too time consuming to be evaluate repeatedly, as it is often the case in iterative minimization algorithms. In the Linear Template Fit, the residuals are estimated from the random variables and from a linear approximation of the underlying true model, while the true model needs to be provided for at least $n + 1$ (were $n$ is the number of estimators) distinct reference values β. The true distribution is then approximated by a linear regression, and the best estimators are obtained in closed form as $\hat{β} = ((Y \tilde{M})^{T} Ω^{- 1} Y \tilde{M})^{- 1} (Y \tilde{M})^{T} Ω^{- 1} (d - Y \bar{m}),$ where $Y$ denotes the template matrix with the values of the known or previously determined model for any of the reference values β, $d$ are the random variables (e.g. a measurement), and the matrix $\tilde{M}$ and the vector $\tilde{m}$ are calculated from the values of β. The LTF can also be expressed for Log-normal distribution distributed random variables. A generalization of the LTF is the Quadratic Template Fit, which assumes a second order regression of the model, requires predictions for at least $n^{2} + 2 n$ distinct values β, and it finds the best estimator using Newton's method.

Percentage least squares focuses on reducing percentage errors, which is useful in the field of forecasting or time series analysis. It is also useful in situations where the dependent variable has a wide range without constant variance, as here the larger residuals at the upper end of the range would dominate if OLS were used. When the percentage or relative error is normally distributed, least squares percentage regression provides maximum likelihood estimates. Percentage regression is linked to a multiplicative error model, whereas OLS is linked to models containing an additive error term.

Constrained least squares, indicates a linear least squares problem with additional constraints on the solution.

Objective function

In OLS (i.e., assuming unweighted observations), the optimal value of the objective function is found by substituting the optimal expression for the coefficient vector:

S = y^{T} (I - H)^{T} (I - H) y = y^{T} (I - H) y,

where

H = X (X^{T} X)^{- 1} X^{T}

, the latter equality holding since

(I - H)

is symmetric and idempotent. It can be shown from this that under an appropriate assignment of weights the expected value of S is m − n. If instead unit weights are assumed, the expected value of S is

(m - n) σ^{2}

, where

σ^{2}

is the variance of each observation.

If it is assumed that the residuals belong to a normal distribution, the objective function, being a sum of weighted squared residuals, will belong to a chi-squared ( $χ^{2}$ ) distribution with m − n degrees of freedom. Some illustrative percentile values of $χ^{2}$ are given in the following table.

$m - n$	$χ_{0.50}^{2}$	$χ_{0.95}^{2}$	$χ_{0.99}^{2}$
10	9.34	18.3	23.2
25	24.3	37.7	44.3
100	99.3	124	136

These values can be used for a statistical criterion as to the goodness of fit. When unit weights are used, the numbers should be divided by the variance of an observation.

For WLS, the ordinary objective function above is replaced for a weighted average of residuals.

Discussion

In statistics and mathematics, linear least squares is an approach to fitting a mathematical or statistical model to data in cases where the idealized value provided by the model for any data point is expressed linearly in terms of the unknown parameters of the model. The resulting fitted model can be used to summarize the data, to predict unobserved values from the same system, and to understand the mechanisms that may underlie the system.

Mathematically, linear least squares is the problem of approximately solving an overdetermined system of linear equations A x = b, where b is not an element of the column space of the matrix A. The approximate solution is realized as an exact solution to A x = b', where b' is the projection of b onto the column space of A. The best approximation is then that which minimizes the sum of squared differences between the data values and their corresponding modeled values. The approach is called linear least squares since the assumed function is linear in the parameters to be estimated. Linear least squares problems are convex and have a closed-form solution that is unique, provided that the number of data points used for fitting equals or exceeds the number of unknown parameters, except in special degenerate situations. In contrast, non-linear least squares problems generally must be solved by an iterative procedure, and the problems can be non-convex with multiple optima for the objective function. If prior distributions are available, then even an underdetermined system can be solved using the Bayesian MMSE estimator.

In statistics, linear least squares problems correspond to a particularly important type of statistical model called linear regression which arises as a particular form of regression analysis. One basic form of such a model is an ordinary least squares model. The present article concentrates on the mathematical aspects of linear least squares problems, with discussion of the formulation and interpretation of statistical regression models and statistical inferences related to these being dealt with in the articles just mentioned. See outline of regression analysis for an outline of the topic.

Properties

If the experimental errors, $ε$ , are uncorrelated, have a mean of zero and a constant variance, $σ$ , the Gauss–Markov theorem states that the least-squares estimator, $\hat{β}$ , has the minimum variance of all estimators that are linear combinations of the observations. In this sense it is the best, or optimal, estimator of the parameters. Note particularly that this property is independent of the statistical distribution function of the errors. In other words, the distribution function of the errors need not be a normal distribution. However, for some probability distributions, there is no guarantee that the least-squares solution is even possible given the observations; still, in such cases it is the best estimator that is both linear and unbiased.

For example, it is easy to show that the arithmetic mean of a set of measurements of a quantity is the least-squares estimator of the value of that quantity. If the conditions of the Gauss–Markov theorem apply, the arithmetic mean is optimal, whatever the distribution of errors of the measurements might be.

However, in the case that the experimental errors do belong to a normal distribution, the least-squares estimator is also a maximum likelihood estimator.

These properties underpin the use of the method of least squares for all types of data fitting, even when the assumptions are not strictly valid.

Limitations

An assumption underlying the treatment given above is that the independent variable, x, is free of error. In practice, the errors on the measurements of the independent variable are usually much smaller than the errors on the dependent variable and can therefore be ignored. When this is not the case, total least squares or more generally errors-in-variables models, or rigorous least squares, should be used. This can be done by adjusting the weighting scheme to take into account errors on both the dependent and independent variables and then following the standard procedure.

In some cases the (weighted) normal equations matrix X^TX is ill-conditioned. When fitting polynomials the normal equations matrix is a Vandermonde matrix. Vandermonde matrices become increasingly ill-conditioned as the order of the matrix increases. In these cases, the least squares estimate amplifies the measurement noise and may be grossly inaccurate. Various regularization techniques can be applied in such cases, the most common of which is called ridge regression. If further information about the parameters is known, for example, a range of possible values of $\hat{β}$ , then various techniques can be used to increase the stability of the solution. For example, see constrained least squares.

Another drawback of the least squares estimator is the fact that the norm of the residuals, $‖ y - X \hat{β} ‖$ is minimized, whereas in some cases one is truly interested in obtaining small error in the parameter $\hat{β}$ , e.g., a small value of $‖ β - \hat{β} ‖$ . However, since the true parameter $β$ is necessarily unknown, this quantity cannot be directly minimized. If a prior probability on $\hat{β}$ is known, then a Bayes estimator can be used to minimize the mean squared error, $E {‖ β - \hat{β} ‖^{2}}$ . The least squares method is often applied when no prior is known. Surprisingly, when several parameters are being estimated jointly, better estimators can be constructed, an effect known as Stein's phenomenon. For example, if the measurement error is Gaussian, several estimators are known which dominate, or outperform, the least squares technique; the best known of these is the James–Stein estimator. This is an example of more general shrinkage estimators that have been applied to regression problems.

Applications

Polynomial fitting: models are polynomials in an independent variable, x:
- Straight line: $f (x, β) = β_{1} + β_{2} x$ .
- Quadratic: $f (x, β) = β_{1} + β_{2} x + β_{3} x^{2}$ .
- Cubic, quartic and higher polynomials. For regression with high-order polynomials, the use of orthogonal polynomials is recommended.
Numerical smoothing and differentiation — this is an application of polynomial fitting.
Multinomials in more than one independent variable, including surface fitting
Curve fitting with B-splines
Chemometrics, Calibration curve, Standard addition, Gran plot, analysis of mixtures

Uses in data fitting

The primary application of linear least squares is in data fitting. Given a set of m data points $y_{1}, y_{2}, \dots, y_{m},$ consisting of experimentally measured values taken at m values $x_{1}, x_{2}, \dots, x_{m}$ of an independent variable ( $x_{i}$ may be scalar or vector quantities), and given a model function $y = f (x, β),$ with $β = (β_{1}, β_{2}, \dots, β_{n}),$ it is desired to find the parameters $β_{j}$ such that the model function "best" fits the data. In linear least squares, linearity is meant to be with respect to parameters $β_{j},$ so

f (x, β) = \sum_{j = 1}^{n} β_{j} φ_{j} (x) .

Here, the functions $φ_{j}$ may be nonlinear with respect to the variable x.

Ideally, the model function fits the data exactly, so

y_{i} = f (x_{i}, β)

for all

i = 1, 2, \dots, m .

This is usually not possible in practice, as there are more data points than there are parameters to be determined. The approach chosen then is to find the minimal possible value of the sum of squares of the residuals

r_{i} (β) = y_{i} - f (x_{i}, β), (i = 1, 2, \dots, m)

so to minimize the function

S (β) = \sum_{i = 1}^{m} r_{i}^{2} (β) .

After substituting for $r_{i}$ and then for $f$ , this minimization problem becomes the quadratic minimization problem above with

X_{i j} = φ_{j} (x_{i}),

and the best fit can be found by solving the normal equations.

Example

A hypothetical researcher conducts an experiment and obtains four $(x, y)$ data points: $(1, 6),$ $(2, 5),$ $(3, 7),$ and $(4, 10)$ (shown in red in the diagram on the right). Because of exploratory data analysis or prior knowledge of the subject matter, the researcher suspects that the $y$ -values depend on the $x$ -values systematically. The $x$ -values are assumed to be exact, but the $y$ -values contain some uncertainty or "noise", because of the phenomenon being studied, imperfections in the measurements, etc.

Fitting a line

One of the simplest possible relationships between $x$ and $y$ is a line $y = β_{1} + β_{2} x$ . The intercept $β_{1}$ and the slope $β_{2}$ are initially unknown. The researcher would like to find values of $β_{1}$ and $β_{2}$ that cause the line to pass through the four data points. In other words, the researcher would like to solve the system of linear equations

\begin{aligned} β_{1} + 1 β_{2} & = & 6, \\ β_{1} + 2 β_{2} & = & 5, \\ β_{1} + 3 β_{2} & = & 7, \\ β_{1} + 4 β_{2} & = & 10. \end{aligned}

With four equations in two unknowns, this system is overdetermined. There is no exact solution. To consider approximate solutions, one introduces residuals

r_{1}

r_{2}

r_{3}

r_{4}

into the equations:

\begin{aligned} β_{1} + 1 β_{2} + r_{1} & = & 6, \\ β_{1} + 2 β_{2} + r_{2} & = & 5, \\ β_{1} + 3 β_{2} + r_{3} & = & 7, \\ β_{1} + 4 β_{2} + r_{4} & = & 10. \end{aligned}

The

i

th residual

r_{i}

is the misfit between the

i

th observation

y_{i}

and the

i

th prediction

β_{1} + β_{2} x_{i}

\begin{aligned} r_{1} & = & 6 - (β_{1} + 1 β_{2}), \\ r_{2} & = & 5 - (β_{1} + 2 β_{2}), \\ r_{3} & = & 7 - (β_{1} + 3 β_{2}), \\ r_{4} & = & 10 - (β_{1} + 4 β_{2}) . \end{aligned}

Among all approximate solutions, the researcher would like to find the one that is "best" in some sense.

In least squares, one focuses on the sum $S$ of the squared residuals:

\begin{aligned} S (β_{1}, β_{2}) & = r_{1}^{2} + r_{2}^{2} + r_{3}^{2} + r_{4}^{2} \\ = [6 - (β_{1} + 1 β_{2})]^{2} + [5 - (β_{1} + 2 β_{2})]^{2} + [7 - (β_{1} + 3 β_{2})]^{2} + [10 - (β_{1} + 4 β_{2})]^{2} \\ = 4 β_{1}^{2} + 30 β_{2}^{2} + 20 β_{1} β_{2} - 56 β_{1} - 154 β_{2} + 210. \end{aligned}

The best solution is defined to be the one that minimizes

S

with respect to

β_{1}

and

β_{2}

. The minimum can be calculated by setting the partial derivatives of

S

to zero:

0 = \frac{\partial S}{\partial β_{1}} = 8 β_{1} + 20 β_{2} - 56,

0 = \frac{\partial S}{\partial β_{2}} = 20 β_{1} + 60 β_{2} - 154.

These normal equations constitute a system of two linear equations in two unknowns. The solution is

β_{1} = 3.5

and

β_{2} = 1.4

, and the best-fit line is therefore

y = 3.5 + 1.4 x

. The residuals are

1.1,

- 1.3,

- 0.7,

and

0.9

(see the diagram on the right). The minimum value of the sum of squared residuals is

S (3.5, 1.4) = {1.1}^{2} + (- 1.3)^{2} + (- 0.7)^{2} + {0.9}^{2} = 4.2.

This calculation can be expressed in matrix notation as follows. The original system of equations is $y = X β$ , where

y = [\begin{matrix} 6 \\ 5 \\ 7 \\ 10 \end{matrix}], X = [\begin{array}{cc} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{array}], β = [\begin{matrix} β_{1} \\ β_{2} \end{matrix}] .

Intuitively,

y = X β \Rightarrow X^{⊤} y = X^{⊤} X β \Rightarrow β = {(X^{⊤} X)}^{- 1} X^{⊤} y = [\begin{matrix} 3.5 \\ 1.4 \end{matrix}] .

More rigorously, if

X^{⊤} X

is invertible, then the matrix

X {(X^{⊤} X)}^{- 1} X^{⊤}

represents orthogonal projection onto the column space of

X

. Therefore, among all vectors of the form

X β

, the one closest to

y

X {(X^{⊤} X)}^{- 1} X^{⊤} y

. Setting

X {(X^{⊤} X)}^{- 1} X^{⊤} y = X β,

it is evident that

β = {(X^{⊤} X)}^{- 1} X^{⊤} y

is a solution.

Fitting a parabola

Suppose that the hypothetical researcher wishes to fit a parabola of the form $y = β_{1} x^{2}$ . Importantly, this model is still linear in the unknown parameters (now just $β_{1}$ ), so linear least squares still applies. The system of equations incorporating residuals is