A Medley of Potpourri: Jul 6, 2023

Thursday, July 6, 2023

Molecular orbital

From Wikipedia, the free encyclopedia

Complete acetylene (H–C≡C–H) molecular orbital set. The left column shows MO's which are occupied in the ground state, with the lowest-energy orbital at the top. The white and grey line visible in some MO's is the molecular axis passing through the nuclei. The orbital wave functions are positive in the red regions and negative in the blue. The right column shows virtual MO's which are empty in the ground state, but may be occupied in excited states.

In chemistry, a molecular orbital (/ɒrbədl/) is a mathematical function describing the location and wave-like behavior of an electron in a molecule. This function can be used to calculate chemical and physical properties such as the probability of finding an electron in any specific region. The terms atomic orbital and molecular orbital were introduced by Robert S. Mulliken in 1932 to mean one-electron orbital wave functions. At an elementary level, they are used to describe the region of space in which a function has a significant amplitude.

In an isolated atom, the orbital electrons' location is determined by functions called atomic orbitals. When multiple atoms combine chemically into a molecule, the electrons' locations are determined by the molecule as a whole, so the atomic orbitals combine to form molecular orbitals. The electrons from the constituent atoms occupy the molecular orbitals. Mathematically, molecular orbitals are an approximate solution to the Schrödinger equation for the electrons in the field of the molecule's atomic nuclei. They are usually constructed by combining atomic orbitals or hybrid orbitals from each atom of the molecule, or other molecular orbitals from groups of atoms. They can be quantitatively calculated using the Hartree–Fock or self-consistent field (SCF) methods.

Molecular orbitals are of three types: bonding orbitals, which have an energy lower than the energy of the atomic orbitals which formed them, and thus promote the chemical bonds which hold the molecule together; antibonding orbitals, which have an energy higher than the energy of their constituent atomic orbitals, and so oppose the bonding of the molecule; and non-bonding orbitals, which have the same energy as their constituent atomic orbitals and thus have no effect on the bonding of the molecule.

Overview

A molecular orbital (MO) can be used to represent the regions in a molecule where an electron occupying that orbital is likely to be found. Molecular orbitals are approximate solutions to the Schrödinger equation for the electrons in the electric field of the molecule's atomic nuclei. However, calculating the orbitals directly from this equation is an intractable problem. Instead they are obtained from the combination of atomic orbitals, which predict the location of an electron in an atom. A molecular orbital can specify the electron configuration of a molecule: the spatial distribution and energy of one (or one pair of) electron(s). Most commonly an MO is represented as a linear combination of atomic orbitals (the LCAO-MO method), especially in qualitative or very approximate usage. They are invaluable in providing a simple model of bonding in molecules, understood through molecular orbital theory. Most present-day methods in computational chemistry begin by calculating the MOs of the system. A molecular orbital describes the behavior of one electron in the electric field generated by the nuclei and some average distribution of the other electrons. In the case of two electrons occupying the same orbital, the Pauli principle demands that they have opposite spin. Necessarily this is an approximation, and highly accurate descriptions of the molecular electronic wave function do not have orbitals (see configuration interaction).

Molecular orbitals are, in general, delocalized throughout the entire molecule. Moreover, if the molecule has symmetry elements, its nondegenerate molecular orbitals are either symmetric or antisymmetric with respect to any of these symmetries. In other words, the application of a symmetry operation S (e.g., a reflection, rotation, or inversion) to molecular orbital ψ results in the molecular orbital being unchanged or reversing its mathematical sign: Sψ = ±ψ. In planar molecules, for example, molecular orbitals are either symmetric (sigma) or antisymmetric (pi) with respect to reflection in the molecular plane. If molecules with degenerate orbital energies are also considered, a more general statement that molecular orbitals form bases for the irreducible representations of the molecule's symmetry group holds. The symmetry properties of molecular orbitals mean that delocalization is an inherent feature of molecular orbital theory and make it fundamentally different from (and complementary to) valence bond theory, in which bonds are viewed as localized electron pairs, with allowance for resonance to account for delocalization.

In contrast to these symmetry-adapted canonical molecular orbitals, localized molecular orbitals can be formed by applying certain mathematical transformations to the canonical orbitals. The advantage of this approach is that the orbitals will correspond more closely to the "bonds" of a molecule as depicted by a Lewis structure. As a disadvantage, the energy levels of these localized orbitals no longer have physical meaning. (The discussion in the rest of this article will focus on canonical molecular orbitals. For further discussions on localized molecular orbitals, see: natural bond orbital and sigma-pi and equivalent-orbital models.)

Formation of molecular orbitals

Molecular orbitals arise from allowed interactions between atomic orbitals, which are allowed if the symmetries (determined from group theory) of the atomic orbitals are compatible with each other. Efficiency of atomic orbital interactions is determined from the overlap (a measure of how well two orbitals constructively interact with one another) between two atomic orbitals, which is significant if the atomic orbitals are close in energy. Finally, the number of molecular orbitals formed must be equal to the number of atomic orbitals in the atoms being combined to form the molecule.

Qualitative discussion

For an imprecise, but qualitatively useful, discussion of the molecular structure, the molecular orbitals can be obtained from the "Linear combination of atomic orbitals molecular orbital method" ansatz. Here, the molecular orbitals are expressed as linear combinations of atomic orbitals.

Linear combinations of atomic orbitals (LCAO)

Molecular orbitals were first introduced by Friedrich Hund and Robert S. Mulliken in 1927 and 1928. The linear combination of atomic orbitals or "LCAO" approximation for molecular orbitals was introduced in 1929 by Sir John Lennard-Jones. His ground-breaking paper showed how to derive the electronic structure of the fluorine and oxygen molecules from quantum principles. This qualitative approach to molecular orbital theory is part of the start of modern quantum chemistry. Linear combinations of atomic orbitals (LCAO) can be used to estimate the molecular orbitals that are formed upon bonding between the molecule's constituent atoms. Similar to an atomic orbital, a Schrödinger equation, which describes the behavior of an electron, can be constructed for a molecular orbital as well. Linear combinations of atomic orbitals, or the sums and differences of the atomic wavefunctions, provide approximate solutions to the Hartree–Fock equations which correspond to the independent-particle approximation of the molecular Schrödinger equation. For simple diatomic molecules, the wavefunctions obtained are represented mathematically by the equations

Ψ = c_{a} ψ_{a} + c_{b} ψ_{b}

Ψ^{*} = c_{a} ψ_{a} - c_{b} ψ_{b}

where $Ψ$ and $Ψ^{*}$ are the molecular wavefunctions for the bonding and antibonding molecular orbitals, respectively, $ψ_{a}$ and $ψ_{b}$ are the atomic wavefunctions from atoms a and b, respectively, and $c_{a}$ and $c_{b}$ are adjustable coefficients. These coefficients can be positive or negative, depending on the energies and symmetries of the individual atomic orbitals. As the two atoms become closer together, their atomic orbitals overlap to produce areas of high electron density, and, as a consequence, molecular orbitals are formed between the two atoms. The atoms are held together by the electrostatic attraction between the positively charged nuclei and the negatively charged electrons occupying bonding molecular orbitals.

Bonding, antibonding, and nonbonding MOs

When atomic orbitals interact, the resulting molecular orbital can be of three types: bonding, antibonding, or nonbonding.

Bonding MOs:

Bonding interactions between atomic orbitals are constructive (in-phase) interactions.
Bonding MOs are lower in energy than the atomic orbitals that combine to produce them.

Antibonding MOs:

Antibonding interactions between atomic orbitals are destructive (out-of-phase) interactions, with a nodal plane where the wavefunction of the antibonding orbital is zero between the two interacting atoms.
Antibonding MOs are higher in energy than the atomic orbitals that combine to produce them.

Nonbonding MOs:

Nonbonding MOs are the result of no interaction between atomic orbitals because of lack of compatible symmetries.
Nonbonding MOs will have the same energy as the atomic orbitals of one of the atoms in the molecule.

Sigma and pi labels for MOs

The type of interaction between atomic orbitals can be further categorized by the molecular-orbital symmetry labels σ (sigma), π (pi), δ (delta), φ (phi), γ (gamma) etc. These are the Greek letters corresponding to the atomic orbitals s, p, d, f and g respectively. The number of nodal planes containing the internuclear axis between the atoms concerned is zero for σ MOs, one for π, two for δ, three for φ and four for γ.

σ symmetry

An MO with σ symmetry results from the interaction of either two atomic s-orbitals or two atomic p_z-orbitals. An MO will have σ-symmetry if the orbital is symmetric with respect to the axis joining the two nuclear centers, the internuclear axis. This means that rotation of the MO about the internuclear axis does not result in a phase change. A σ* orbital, sigma antibonding orbital, also maintains the same phase when rotated about the internuclear axis. The σ* orbital has a nodal plane that is between the nuclei and perpendicular to the internuclear axis.

π symmetry

An MO with π symmetry results from the interaction of either two atomic p_x orbitals or p_y orbitals. An MO will have π symmetry if the orbital is asymmetric with respect to rotation about the internuclear axis. This means that rotation of the MO about the internuclear axis will result in a phase change. There is one nodal plane containing the internuclear axis, if real orbitals are considered.

A π* orbital, pi antibonding orbital, will also produce a phase change when rotated about the internuclear axis. The π* orbital also has a second nodal plane between the nuclei.

δ symmetry

An MO with δ symmetry results from the interaction of two atomic d_xy or d_x²-y² orbitals. Because these molecular orbitals involve low-energy d atomic orbitals, they are seen in transition-metal complexes. A δ bonding orbital has two nodal planes containing the internuclear axis, and a δ* antibonding orbital also has a third nodal plane between the nuclei.

φ symmetry

Suitably aligned f atomic orbitals overlap to form a phi molecular orbital (a phi bond).

Theoretical chemists have conjectured that higher-order bonds, such as phi bonds corresponding to overlap of f atomic orbitals, are possible. There is no known example of a molecule purported to contain a phi bond.

Gerade and ungerade symmetry

For molecules that possess a center of inversion (centrosymmetric molecules) there are additional labels of symmetry that can be applied to molecular orbitals. Centrosymmetric molecules include:

Homonuclear diatomics, X₂
Octahedral, EX₆
Square planar, EX₄.

Non-centrosymmetric molecules include:

Heteronuclear diatomics, XY
Tetrahedral, EX₄.

If inversion through the center of symmetry in a molecule results in the same phases for the molecular orbital, then the MO is said to have gerade (g) symmetry, from the German word for even. If inversion through the center of symmetry in a molecule results in a phase change for the molecular orbital, then the MO is said to have ungerade (u) symmetry, from the German word for odd. For a bonding MO with σ-symmetry, the orbital is σ_g (s' + s'' is symmetric), while for an antibonding MO with σ-symmetry the orbital is σ_u, because s' – s'' is antisymmetric under inversion. For a bonding MO with π-symmetry the orbital is π_u because inversion through the center of symmetry would produce a sign change (the two p atomic orbitals are in phase with each other but the two lobes have opposite signs), while an antibonding MO with π-symmetry is π_g because inversion through the center of symmetry would not produce a sign change (the two p orbitals are antisymmetric by phase).
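As a rough numerical illustration of the g/u labels for the σ case, the sketch below builds the combinations s' + s'' and s' – s'' on a one-dimensional slice along the internuclear axis and checks how each behaves under inversion x → −x. The functional form exp(−|x − x₀|) for the s functions, the nuclear positions at ±0.7, and all function names are assumptions made for this example only.

```python
import math

def s_orbital(x, center):
    """Unnormalized s-like function on a 1D slice (assumed form exp(-|x - center|))."""
    return math.exp(-abs(x - center))

def combination(x, sign, half_separation=0.7):
    """s' + s'' (sign=+1) or s' - s'' (sign=-1) for nuclei at -/+ half_separation."""
    return s_orbital(x, -half_separation) + sign * s_orbital(x, +half_separation)

def inversion_parity(sign, points=(0.3, 0.9, 2.0), tol=1e-12):
    """Return +1 (gerade) if psi(-x) = psi(x), -1 (ungerade) if psi(-x) = -psi(x)."""
    if all(abs(combination(-x, sign) - combination(x, sign)) < tol for x in points):
        return +1
    if all(abs(combination(-x, sign) + combination(x, sign)) < tol for x in points):
        return -1
    return 0  # neither symmetric nor antisymmetric at the sampled points

print(inversion_parity(+1))  # in-phase (bonding) combination
print(inversion_parity(-1))  # out-of-phase (antibonding) combination
print(combination(0.0, -1))  # antibonding value midway between the nuclei
```

The out-of-phase combination vanishes at the midpoint, which is the one-dimensional trace of the nodal plane between the nuclei.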

MO diagrams

The qualitative approach of MO analysis uses a molecular orbital diagram to visualize bonding interactions in a molecule. In this type of diagram, the molecular orbitals are represented by horizontal lines; the higher a line the higher the energy of the orbital, and degenerate orbitals are placed on the same level with a space between them. Then, the electrons to be placed in the molecular orbitals are slotted in one by one, keeping in mind the Pauli exclusion principle and Hund's rule of maximum multiplicity (only 2 electrons, having opposite spins, per orbital; place as many unpaired electrons on one energy level as possible before starting to pair them). For more complicated molecules, the wave mechanics approach loses utility in a qualitative understanding of bonding (although it is still necessary for a quantitative approach). Some properties:

A basis set of orbitals includes those atomic orbitals that are available for molecular orbital interactions, which may be bonding or antibonding
The number of molecular orbitals is equal to the number of atomic orbitals included in the linear expansion or the basis set
If the molecule has some symmetry, the degenerate atomic orbitals (with the same atomic energy) are grouped in linear combinations (called symmetry-adapted atomic orbitals (SO)), which belong to the representation of the symmetry group, so the wave functions that describe the group are known as symmetry-adapted linear combinations (SALC).
The number of molecular orbitals belonging to one group representation is equal to the number of symmetry-adapted atomic orbitals belonging to this representation
Within a particular representation, the symmetry-adapted atomic orbitals mix more if their atomic energy levels are closer.

The general procedure for constructing a molecular orbital diagram for a reasonably simple molecule can be summarized as follows:

1. Assign a point group to the molecule.

2. Look up the shapes of the SALCs.

3. Arrange the SALCs of each molecular fragment in order of energy, noting first whether they stem from s, p, or d orbitals (and put them in the order s < p < d), and then their number of internuclear nodes.

4. Combine SALCs of the same symmetry type from the two fragments, and from N SALCs form N molecular orbitals.

5. Estimate the relative energies of the molecular orbitals from considerations of overlap and relative energies of the parent orbitals, and draw the levels on a molecular orbital energy level diagram (showing the origin of the orbitals).

6. Confirm, correct, and revise this qualitative order by carrying out a molecular orbital calculation by using commercial software.
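Once the levels of a diagram have been drawn, the electron-filling rule described earlier in this section (Pauli exclusion principle plus Hund's rule) can be sketched in a few lines. This is only an illustrative sketch: the representation of a diagram as a list of (label, degeneracy) pairs and the function names are assumptions, and the level ordering must be supplied by the user.

```python
def fill_orbitals(levels, n_electrons):
    """Distribute electrons over energy levels, lowest level first.

    levels: list of (label, degeneracy) pairs ordered by increasing energy.
    Returns a list of (label, occupations); occupations holds the electron
    count (0, 1 or 2) of each degenerate orbital in that level.
    """
    result = []
    remaining = n_electrons
    for label, degeneracy in levels:
        occupations = [0] * degeneracy
        # Two passes over the level: the first puts one electron in each
        # degenerate orbital (Hund's rule), the second pairs them up
        # (Pauli: at most two electrons per orbital).
        for _ in range(2):
            for i in range(degeneracy):
                if remaining == 0:
                    break
                occupations[i] += 1
                remaining -= 1
        result.append((label, occupations))
    if remaining:
        raise ValueError("not enough orbitals for the requested electrons")
    return result

def unpaired_electrons(filling):
    """Count orbitals that hold exactly one electron."""
    return sum(occupations.count(1) for _, occupations in filling)

# Li2 with the three lowest MOs named in this article, six electrons.
li2 = fill_orbitals([("sigma_g(1s)", 1), ("sigma_u*(1s)", 1), ("sigma_g(2s)", 1)], 6)
print(li2, unpaired_electrons(li2))

# Hypothetical doubly degenerate level holding two electrons.
pair = fill_orbitals([("degenerate level", 2)], 2)
print(pair, unpaired_electrons(pair))
```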

Bonding in molecular orbitals

Orbital degeneracy

Molecular orbitals are said to be degenerate if they have the same energy. For example, in the homonuclear diatomic molecules of the first ten elements, the molecular orbitals derived from the p_x and the p_y atomic orbitals result in two degenerate bonding orbitals (of low energy) and two degenerate antibonding orbitals (of high energy).

Ionic bonds

When the energy difference between the atomic orbitals of two atoms is quite large, one atom's orbitals contribute almost entirely to the bonding orbitals, and the other atom's orbitals contribute almost entirely to the antibonding orbitals. Thus, the situation is effectively that one or more electrons have been transferred from one atom to the other. This is called a (mostly) ionic bond.
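A minimal two-orbital model can make this trend concrete. The sketch below is not taken from the article: it assumes a 2×2 problem with atomic orbital energies e_a and e_b, a coupling t between them, and no overlap between the two atomic orbitals, and it reports how much of the bonding MO sits on atom a. All names and numbers are illustrative.

```python
import math

def two_level_mixing(e_a, e_b, t):
    """Bonding/antibonding energies and the weight of orbital a in the bonding MO.

    Assumed model: symmetric 2x2 matrix [[e_a, t], [t, e_b]], overlap neglected.
    """
    if t == 0:
        raise ValueError("t must be non-zero for the orbitals to interact")
    mean = 0.5 * (e_a + e_b)
    split = math.hypot(0.5 * (e_a - e_b), t)
    e_bonding, e_antibonding = mean - split, mean + split
    # The bonding solution satisfies (e_a - E) c_a + t c_b = 0.
    ratio = (e_bonding - e_a) / t          # c_b / c_a
    weight_a = 1.0 / (1.0 + ratio ** 2)    # normalized |c_a|^2
    return e_bonding, e_antibonding, weight_a

# Equal atomic energies: the bonding MO is shared equally.
print(two_level_mixing(-1.0, -1.0, -0.5))

# Atom a much lower in energy: the bonding MO sits almost entirely on a.
print(two_level_mixing(-10.0, 0.0, -1.0))
```

In the second call the bonding orbital is almost purely the orbital of atom a, and the antibonding orbital is correspondingly almost purely that of atom b.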

Bond order

The bond order, or number of bonds, of a molecule can be determined by combining the number of electrons in bonding and antibonding molecular orbitals. A pair of electrons in a bonding orbital creates a bond, whereas a pair of electrons in an antibonding orbital negates a bond. For example, N₂, with eight electrons in bonding orbitals and two electrons in antibonding orbitals, has a bond order of three, which constitutes a triple bond.

Bond strength is proportional to bond order—a greater amount of bonding produces a more stable bond—and bond length is inversely proportional to it—a stronger bond is shorter.

There are rare exceptions to the requirement of a molecule having a positive bond order. Although Be₂ has a bond order of 0 according to MO analysis, there is experimental evidence of a highly unstable Be₂ molecule having a bond length of 245 pm and bond energy of 10 kJ/mol.

HOMO and LUMO

The highest occupied molecular orbital and lowest unoccupied molecular orbital are often referred to as the HOMO and LUMO, respectively. The difference of the energies of the HOMO and LUMO is called the HOMO-LUMO gap. This notion is often a source of confusion in the literature and should be considered with caution. Its value is usually located between the fundamental gap (difference between ionization potential and electron affinity) and the optical gap. In addition, the HOMO-LUMO gap can be related to a bulk material band gap or transport gap, which is usually much smaller than the fundamental gap.

Examples

Homonuclear diatomics

Homonuclear diatomic MOs contain equal contributions from each atomic orbital in the basis set. This is shown in the homonuclear diatomic MO diagrams for H₂, He₂, and Li₂, all of which contain symmetric orbitals.

H₂

Electron wavefunctions for the 1s orbital of a lone hydrogen atom (left and right) and the corresponding bonding (bottom) and antibonding (top) molecular orbitals of the H₂ molecule. The real part of the wavefunction is the blue curve, and the imaginary part is the red curve. The red dots mark the locations of the nuclei. The electron wavefunction oscillates according to the Schrödinger wave equation, and orbitals are its standing waves. The standing wave frequency is proportional to the orbital's kinetic energy. (This plot is a one-dimensional slice through the three-dimensional system.)

As a simple MO example, consider the electrons in a hydrogen molecule, H₂ (see molecular orbital diagram), with the two atoms labelled H' and H". The lowest-energy atomic orbitals, 1s' and 1s", do not transform according to the symmetries of the molecule. However, the following symmetry adapted atomic orbitals do:

1s' – 1s"	Antisymmetric combination: negated by reflection, unchanged by other operations
1s' + 1s"	Symmetric combination: unchanged by all symmetry operations

The symmetric combination (called a bonding orbital) is lower in energy than the basis orbitals, and the antisymmetric combination (called an antibonding orbital) is higher. Because the H₂ molecule has two electrons, they can both go in the bonding orbital, making the system lower in energy (hence more stable) than two free hydrogen atoms. This is called a covalent bond. The bond order is equal to the number of bonding electrons minus the number of antibonding electrons, divided by 2. In this example, there are 2 electrons in the bonding orbital and none in the antibonding orbital; the bond order is 1, and there is a single bond between the two hydrogen atoms.

He₂

On the other hand, consider the hypothetical molecule of He₂ with the atoms labeled He' and He". As with H₂, the lowest energy atomic orbitals are the 1s' and 1s", and do not transform according to the symmetries of the molecule, while the symmetry adapted atomic orbitals do. The symmetric combination—the bonding orbital—is lower in energy than the basis orbitals, and the antisymmetric combination—the antibonding orbital—is higher. Unlike H₂, with two valence electrons, He₂ has four in its neutral ground state. Two electrons fill the lower-energy bonding orbital, σ_g(1s), while the remaining two fill the higher-energy antibonding orbital, σ_u*(1s). Thus, the resulting electron density around the molecule does not support the formation of a bond between the two atoms; without a stable bond holding the atoms together, the molecule would not be expected to exist. Another way of looking at it is that there are two bonding electrons and two antibonding electrons; therefore, the bond order is 0 and no bond exists (the molecule has one bound state supported by the Van der Waals potential).

Li₂

Dilithium Li₂ is formed from the overlap of the 1s and 2s atomic orbitals (the basis set) of two Li atoms. Each Li atom contributes three electrons for bonding interactions, and the six electrons fill the three MOs of lowest energy, σ_g(1s), σ_u*(1s), and σ_g(2s). Using the equation for bond order, it is found that dilithium has a bond order of one, a single bond.
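The bond-order arithmetic used for these homonuclear examples is simple enough to check in code. The sketch below applies the formula stated for H₂ (bonding electrons minus antibonding electrons, divided by 2) to the electron counts given in this article; the function name and the dictionary layout are illustrative choices.

```python
def bond_order(bonding_electrons, antibonding_electrons):
    """(bonding electrons - antibonding electrons) / 2."""
    return (bonding_electrons - antibonding_electrons) / 2

# (bonding, antibonding) electron counts as described in the text.
examples = {
    "H2": (2, 0),
    "He2": (2, 2),
    "Li2": (4, 2),   # sigma_g(1s)^2 sigma_u*(1s)^2 sigma_g(2s)^2
    "N2": (8, 2),
}

for molecule, (n_bonding, n_antibonding) in examples.items():
    print(molecule, bond_order(n_bonding, n_antibonding))
```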

Noble gases

Considering a hypothetical molecule of He₂, since the basis set of atomic orbitals is the same as in the case of H₂, we find that both the bonding and antibonding orbitals are filled, so there is no energy advantage to the pair. HeH would have a slight energy advantage, but not as much as H₂ + 2 He, so the molecule is very unstable and exists only briefly before decomposing into hydrogen and helium. In general, we find that atoms such as He that have full energy shells rarely bond with other atoms. Except for short-lived Van der Waals complexes, there are very few noble gas compounds known.

Heteronuclear diatomics

While MOs for homonuclear diatomic molecules contain equal contributions from each interacting atomic orbital, MOs for heteronuclear diatomics contain different atomic orbital contributions. Orbital interactions to produce bonding or antibonding orbitals in heteronuclear diatomics occur if there is sufficient overlap between atomic orbitals as determined by their symmetries and similarity in orbital energies.

HF

In hydrogen fluoride, HF, overlap between the H 1s and F 2s orbitals is allowed by symmetry, but the difference in energy between the two atomic orbitals prevents them from interacting to create a molecular orbital. Overlap between the H 1s and F 2p_z orbitals is also symmetry allowed, and these two atomic orbitals have a small energy separation. Thus, they interact, leading to creation of σ and σ* MOs and a molecule with a bond order of 1. Since HF is a non-centrosymmetric molecule, the symmetry labels g and u do not apply to its molecular orbitals.

Quantitative approach

To obtain quantitative values for the molecular energy levels, one needs to have molecular orbitals that are such that the configuration interaction (CI) expansion converges fast towards the full CI limit. The most common method to obtain such functions is the Hartree–Fock method, which expresses the molecular orbitals as eigenfunctions of the Fock operator. One usually solves this problem by expanding the molecular orbitals as linear combinations of Gaussian functions centered on the atomic nuclei (see linear combination of atomic orbitals and basis set (chemistry)). The equation for the coefficients of these linear combinations is a generalized eigenvalue equation known as the Roothaan equations, which are in fact a particular representation of the Hartree–Fock equation. There are a number of programs in which quantum chemical calculations of MOs can be performed, including Spartan.

Simple accounts often suggest that experimental molecular orbital energies can be obtained by the methods of ultra-violet photoelectron spectroscopy for valence orbitals and X-ray photoelectron spectroscopy for core orbitals. This, however, is incorrect as these experiments measure the ionization energy, the difference in energy between the molecule and one of the ions resulting from the removal of one electron. Ionization energies are linked approximately to orbital energies by Koopmans' theorem. While the agreement between these two values can be close for some molecules, it can be very poor in other cases.

Simple linear regression

From Wikipedia, the free encyclopedia

https://en.wikipedia.org/wiki/Simple_linear_regression

Okun's law in macroeconomics is an example of the simple linear regression. Here the dependent variable (GDP growth) is presumed to be in a linear relationship with the changes in the unemployment rate.

In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.

It is common to make the additional stipulation that the ordinary least squares (OLS) method should be used: the accuracy of each predicted value is measured by its squared residual (vertical distance between the point of the data set and the fitted line), and the goal is to make the sum of these squared deviations as small as possible. Other regression methods that can be used in place of ordinary least squares include least absolute deviations (minimizing the sum of absolute values of residuals) and the Theil–Sen estimator (which chooses a line whose slope is the median of the slopes determined by pairs of sample points). Deming regression (total least squares) also finds a line that fits a set of two-dimensional sample points, but (unlike ordinary least squares, least absolute deviations, and median slope regression) it is not really an instance of simple linear regression, because it does not separate the coordinates into one dependent and one independent variable and could potentially return a vertical line as its fit.
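To make one of these alternatives concrete, here is a minimal sketch of the Theil–Sen slope as described above: the median of the slopes determined by all pairs of sample points. The function name is an assumption, pairs with identical x values are skipped because they define no finite slope, and the choice of intercept is deliberately left out.

```python
from itertools import combinations
from statistics import median

def theil_sen_slope(xs, ys):
    """Median of the slopes of the lines through every pair of sample points."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # pairs with equal x define no finite slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    return median(slopes)

# Four points on y = 1 + 2x plus one gross outlier.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 100]
print(theil_sen_slope(xs, ys))
```

Because the median ignores the few very steep slopes produced by the outlier, the result stays at the slope of the four well-behaved points.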

The remainder of the article assumes an ordinary least squares regression. In this case, the slope of the fitted line is equal to the correlation between $y$ and $x$ corrected by the ratio of standard deviations of these variables. The intercept of the fitted line is such that the line passes through the center of mass $(\bar{x}, \bar{y})$ of the data points.

Fitting the regression line

Consider the model function

y = α + β x,

which describes a line with slope $β$ and $y$-intercept $α$. In general such a relationship may not hold exactly for the largely unobserved population of values of the independent and dependent variables; we call the unobserved deviations from the above equation the errors. Suppose we observe $n$ data pairs and call them $\{(x_{i}, y_{i}),\ i = 1, \ldots, n\}$. We can describe the underlying relationship between $y_{i}$ and $x_{i}$ involving this error term $ε_{i}$ by

y_{i} = α + β x_{i} + ε_{i} .

This relationship between the true (but unobserved) underlying parameters $α$ and $β$ and the data points is called a linear regression model.

The goal is to find estimated values $\hat{α}$ and $\hat{β}$ for the parameters $α$ and $β$ which would provide the "best" fit in some sense for the data points. As mentioned in the introduction, in this article the "best" fit will be understood as in the least-squares approach: a line that minimizes the sum of squared residuals (see also Errors and residuals) ${\hat{ε}}_{i}$ (differences between actual and predicted values of the dependent variable y), each of which is given by, for any candidate parameter values $α$ and $β$ ,

{\hat{ε}}_{i} = y_{i} - α - β x_{i} .

In other words, $\hat{α}$ and $\hat{β}$ solve the following minimization problem:

\text{Find } \min_{α, β} Q (α, β), \quad \text{for } Q (α, β) = \sum_{i = 1}^{n} {\hat{ε}}_{i}^{2} = \sum_{i = 1}^{n} (y_{i} - α - β x_{i})^{2} .

By expanding to get a quadratic expression in $α$ and $β,$ we can derive values of $α$ and $β$ that minimize the objective function $Q$ (these minimizing values are denoted $\hat{α}$ and $\hat{β}$ ):

\begin{aligned} \hat{α} & = \bar{y} - \hat{β} \bar{x}, \\ \hat{β} & = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \\ & = \frac{s_{x, y}}{s_{x}^{2}} \\ & = r_{x y} \frac{s_{y}}{s_{x}} . \end{aligned}

Here we have introduced

$\bar{x}$ and $\bar{y}$ as the average of the $x_{i}$ and $y_{i}$, respectively
$r_{x y}$ as the sample correlation coefficient between $x$ and $y$
$s_{x}$ and $s_{y}$ as the uncorrected sample standard deviations of $x$ and $y$
$s_{x}^{2}$ and $s_{x, y}$ as the sample variance and sample covariance, respectively
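The closed-form estimates above translate directly into code. The following is a minimal sketch using only the sums in the first expression for the slope; the function name and the error handling are illustrative choices.

```python
def fit_simple_ols(xs, ys):
    """Return (alpha_hat, beta_hat) for the least-squares line y = alpha + beta * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar  # line passes through (x_bar, y_bar)
    return alpha_hat, beta_hat

# Points lying exactly on y = 1 + 2x.
alpha_hat, beta_hat = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(alpha_hat, beta_hat)
```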

Substituting the above expressions for $\hat{α}$ and $\hat{β}$ into

f = \hat{α} + \hat{β} x,

yields

\frac{f - \bar{y}}{s_{y}} = r_{x y} \frac{x - \bar{x}}{s_{x}} .

This shows that $r_{x y}$ is the slope of the regression line of the standardized data points (and that this line passes through the origin). Since $- 1 \leq r_{x y} \leq 1$, if x is some measurement and y is a followup measurement from the same item, then we expect that y (on average) will be closer to the mean measurement than it was to the original value of x. This phenomenon is known as regression toward the mean.

Generalizing the $\bar{x}$ notation, we can write a horizontal bar over an expression to indicate the average value of that expression over the set of samples. For example:

\overline{x y} = \frac{1}{n} \sum_{i = 1}^{n} x_{i} y_{i} .

This notation allows us a concise formula for $r_{x y}$:

r_{x y} = \frac{\overline{x y} - \bar{x} \bar{y}}{\sqrt{(\overline{x^{2}} - {\bar{x}}^{2}) (\overline{y^{2}} - {\bar{y}}^{2})}} .

The coefficient of determination ("R squared") is equal to $r_{x y}^{2}$ when the model is linear with a single independent variable. See sample correlation coefficient for additional details.
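As a quick numerical check of the last two statements, the sketch below computes the correlation from averages only and compares its square with the coefficient of determination of the fitted line, computed here as 1 − SS_res / SS_tot (a standard definition that this article does not spell out, so treat it as an assumption). Function names and data are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Sample correlation written with averages (bars) only."""
    x_bar, y_bar = mean(xs), mean(ys)
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    x2_bar = mean([x * x for x in xs])
    y2_bar = mean([y * y for y in ys])
    return (xy_bar - x_bar * y_bar) / math.sqrt(
        (x2_bar - x_bar ** 2) * (y2_bar - y_bar ** 2)
    )

def r_squared(xs, ys):
    """1 - SS_res / SS_tot for the least-squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    alpha = y_bar - beta * x_bar
    ss_res = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
r = correlation(xs, ys)
print(r, r ** 2, r_squared(xs, ys))
```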

Intuition about the slope

By multiplying all members of the summation in the numerator by $\frac{(x_{i} - \bar{x})}{(x_{i} - \bar{x})} = 1$ (thereby not changing it):

\begin{aligned} \hat{β} & = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2} \frac{(y_{i} - \bar{y})}{(x_{i} - \bar{x})}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} = \sum_{i = 1}^{n} \frac{(x_{i} - \bar{x})^{2}}{\sum_{j = 1}^{n} (x_{j} - \bar{x})^{2}} \frac{(y_{i} - \bar{y})}{(x_{i} - \bar{x})} \end{aligned}

We can see that the slope (tangent of angle) of the regression line is the weighted average of $\frac{(y_{i} - \bar{y})}{(x_{i} - \bar{x})}$, that is, of the slope (tangent of angle) of the line that connects the i-th point to the average of all points, weighted by $(x_{i} - \bar{x})^{2}$. The further a point is from the center, the more "important" it is, since small errors in its position will affect the slope connecting it to the center point less.
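The weighted-average reading of the slope can be verified numerically. In the sketch below, a point with $x_{i} = \bar{x}$ is skipped: its weight is zero, and its individual slope is undefined, which is a detail the algebraic identity above glosses over. Function names and data are illustrative.

```python
def direct_slope(xs, ys):
    """Slope from the usual ratio of sums."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )

def slope_as_weighted_average(xs, ys):
    """Weighted average of the slopes from each point to (x_bar, y_bar)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    total = sum((x - x_bar) ** 2 for x in xs)
    slope = 0.0
    for x, y in zip(xs, ys):
        dx = x - x_bar
        if dx == 0:
            continue  # zero weight, so the point contributes nothing
        weight = dx ** 2 / total          # weights sum to 1
        slope += weight * (y - y_bar) / dx
    return slope

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(direct_slope(xs, ys), slope_as_weighted_average(xs, ys))
```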

Intuition about the intercept

\begin{aligned} \hat{α} & = \bar{y} - \hat{β} \bar{x}, \end{aligned}

Given $\hat{β} = \tan (θ) = d y / d x \to d y = d x \times \hat{β}$ with $θ$ the angle the line makes with the positive x axis, and taking $d x = \bar{x}$ (the horizontal distance from the $y$ axis to the center of mass), we have $y_{\text{intersection}} = \bar{y} - d x \times \hat{β} = \bar{y} - d y$.

Intuition about the correlation

In the above formulation, notice that each $x_{i}$ is a constant ("known upfront") value, while the $y_{i}$ are random variables that depend on the linear function of $x_{i}$ and the random term $ε_{i}$ . This assumption is used when deriving the standard error of the slope and showing that it is unbiased.

In this framing, when $x_{i}$ is not actually a random variable, what type of parameter does the empirical correlation $r_{x y}$ estimate? The issue is that for each value i we have $E (x_{i}) = x_{i}$ and $\operatorname{Var} (x_{i}) = 0$ . A possible interpretation of $r_{x y}$ is to imagine that $x_{i}$ defines a random variable drawn from the empirical distribution of the x values in our sample. For example, if x had 10 values from the natural numbers, [1, 2, 3, ..., 10], then we can imagine x to follow a discrete uniform distribution. Under this interpretation all $x_{i}$ have the same expectation and some positive variance. With this interpretation we can think of $r_{x y}$ as an estimator of Pearson's correlation between the random variable y and the random variable x (as we just defined it).

Simple linear regression without the intercept term (single regressor)

Sometimes it is appropriate to force the regression line to pass through the origin, because $x$ and $y$ are assumed to be proportional. For the model without the intercept term, $y = βx$ , the OLS estimator for $β$ simplifies to

\hat{β} = \frac{\sum_{i = 1}^{n} x_{i} y_{i}}{\sum_{i = 1}^{n} x_{i}^{2}} = \frac{\bar{x y}}{\bar{x^{2}}}

Substituting $(x - h, y - k)$ in place of $(x, y)$ gives the regression through $(h, k)$ :

\begin{aligned} \hat{β} & = \frac{\sum_{i = 1}^{n} (x_{i} - h) (y_{i} - k)}{\sum_{i = 1}^{n} (x_{i} - h)^{2}} = \frac{\bar{(x - h) (y - k)}}{\bar{(x - h)^{2}}} \\ & = \frac{\bar{x y} - k \bar{x} - h \bar{y} + h k}{\bar{x^{2}} - 2 h \bar{x} + h^{2}} \\ & = \frac{\bar{x y} - \bar{x} \bar{y} + (\bar{x} - h) (\bar{y} - k)}{\bar{x^{2}} - {\bar{x}}^{2} + (\bar{x} - h)^{2}} \\ & = \frac{\operatorname{Cov} (x, y) + (\bar{x} - h) (\bar{y} - k)}{\operatorname{Var} (x) + (\bar{x} - h)^{2}}, \end{aligned}

where Cov and Var refer to the covariance and variance of the sample data (uncorrected for bias).

The last form above demonstrates how moving the line away from the center of mass of the data points affects the slope.
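
A minimal sketch of the regression through $(h, k)$ , comparing the direct sum formula with the last (covariance and variance) form. The data and function names are hypothetical.

```python
def slope_through_point(x, y, h, k):
    """Least-squares slope of the line forced through (h, k), direct sums."""
    num = sum((a - h) * (b - k) for a, b in zip(x, y))
    den = sum((a - h) ** 2 for a in x)
    return num / den

def slope_from_moments(x, y, h, k):
    """Same slope via (Cov + shift product) / (Var + squared shift)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n  # uncorrected
    var = sum((a - mx) ** 2 for a in x) / n                    # uncorrected
    return (cov + (mx - h) * (my - k)) / (var + (mx - h) ** 2)

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]
print(slope_through_point(x, y, 0.0, 0.0), slope_from_moments(x, y, 0.0, 0.0))
```

Choosing $(h, k) = (\bar{x}, \bar{y})$ makes both shift terms vanish, so the expression reduces to the ordinary slope Cov(x, y) / Var(x).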

Numerical properties

The regression line goes through the center of mass point, $(\bar{x}, \bar{y})$ , if the model includes an intercept term (i.e., not forced through the origin).
The sum of the residuals is zero if the model includes an intercept term:
$\sum_{i = 1}^{n} {\hat{ε}}_{i} = 0.$
The residuals and $x$ values are uncorrelated (whether or not there is an intercept term in the model), meaning:
$\sum_{i = 1}^{n} x_{i} {\hat{ε}}_{i} = 0$
The relationship between $ρ_{x y}$ (the correlation coefficient for the population), the population variance of $y$ ( $σ_{y}^{2}$ ) and the variance of the error term $ϵ$ ( $σ_{ϵ}^{2}$ ) is:^[7]^: 401
$σ_{ϵ}^{2} = (1 - ρ_{x y}^{2}) σ_{y}^{2}$
For extreme values of $ρ_{x y}$ this is self-evident: when $ρ_{x y} = 0$ then $σ_{ϵ}^{2} = σ_{y}^{2}$ , and when $ρ_{x y} = 1$ then $σ_{ϵ}^{2} = 0$ .
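
The first three properties are easy to confirm numerically. The sketch below fits a line with an intercept to made-up data and evaluates each quantity; all names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit with an intercept; returns (alpha, beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    return mean_y - beta * mean_x, beta

# Hypothetical data, chosen only for illustration.
x = [0.5, 1.0, 2.0, 3.5, 4.0, 6.0]
y = [1.2, 1.9, 3.1, 5.2, 5.4, 8.3]

alpha, beta = fit_line(x, y)
residuals = [b - (alpha + beta * a) for a, b in zip(x, y)]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

# Fitted value at mean_x minus mean_y: zero if the line passes through the center.
gap_at_center = alpha + beta * mean_x - mean_y
# Sum of residuals: zero when the model has an intercept.
sum_residuals = sum(residuals)
# Sum of x_i times residual_i: zero by the normal equation for the slope.
sum_x_residuals = sum(a * e for a, e in zip(x, residuals))

print(gap_at_center, sum_residuals, sum_x_residuals)  # all zero up to rounding
```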

Model-based properties

Description of the statistical properties of estimators from the simple linear regression estimates requires the use of a statistical model. The following is based on assuming the validity of a model under which the estimates are optimal. It is also possible to evaluate the properties under other assumptions, such as inhomogeneity, but this is discussed elsewhere.

Unbiasedness

The estimators $\hat{α}$ and $\hat{β}$ are unbiased.

To formalize this assertion we must define a framework in which these estimators are random variables. We consider the residuals $ε_{i}$ as random variables drawn independently from some distribution with mean zero. In other words, for each value of $x$ , the corresponding value of $y$ is generated as a mean response $α + βx$ plus an additional random variable $ε$ called the error term, equal to zero on average. Under such interpretation, the least-squares estimators $\hat{α}$ and $\hat{β}$ will themselves be random variables whose means will equal the "true values" $α$ and $β$ . This is the definition of an unbiased estimator.
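
Unbiasedness can be illustrated with a small simulation: hold the $x$ values fixed, generate many samples of $y$ from known "true" coefficients plus zero-mean noise, and average the estimates. This is only a sketch; the true values, the Gaussian noise, the seed and the number of trials are all assumptions chosen for illustration.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit with an intercept; returns (alpha, beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    return mean_y - beta * mean_x, beta

def average_estimates(alpha, beta, x, sigma, trials, seed=0):
    """Average alpha-hat and beta-hat over repeated samples with fixed x."""
    rng = random.Random(seed)
    total_alpha = total_beta = 0.0
    for _ in range(trials):
        # y = mean response + independent zero-mean error term
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        a_hat, b_hat = fit_line(x, y)
        total_alpha += a_hat
        total_beta += b_hat
    return total_alpha / trials, total_beta / trials

x = [float(i) for i in range(1, 11)]
mean_alpha, mean_beta = average_estimates(2.0, 0.5, x, sigma=1.0, trials=20000)
print(mean_alpha, mean_beta)  # close to the true values 2.0 and 0.5
```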

Confidence intervals

The formulas given in the previous section allow one to calculate the point estimates of $α$ and $β$ — that is, the coefficients of the regression line for the given set of data. However, those formulas don't tell us how precise the estimates are, i.e., how much the estimators $\hat{α}$ and $\hat{β}$ vary from sample to sample for the specified sample size. Confidence intervals were devised to give a plausible set of values to the estimates one might have if one repeated the experiment a very large number of times.

The standard method of constructing confidence intervals for linear regression coefficients relies on the normality assumption, which is justified if either:

the errors in the regression are normally distributed (the so-called classic regression assumption), or
the number of observations $n$ is sufficiently large, in which case the estimator is approximately normally distributed.

The latter case is justified by the central limit theorem.

Normality assumption

Under the first assumption above, that of the normality of the error terms, the estimator of the slope coefficient will itself be normally distributed with mean $β$ and variance $σ^{2} / \sum (x_{i} - \bar{x})^{2},$ where $σ^{2}$ is the variance of the error terms (see Proofs involving ordinary least squares). At the same time the sum of squared residuals $Q$ is distributed proportionally to $χ^{2}$ with $n - 2$ degrees of freedom, and independently from $\hat{β}$ . This allows us to construct a $t$ -value

t = \frac{\hat{β} - β}{s_{\hat{β}}} \sim t_{n - 2},

where

s_{\hat{β}} = \sqrt{\frac{\frac{1}{n - 2} \sum_{i = 1}^{n} {\hat{ε}}_{i}^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}}

is the standard error of the estimator $\hat{β}$ .

This $t$ -value has a Student's $t$ -distribution with $n - 2$ degrees of freedom. Using it we can construct a confidence interval for $β$ :

β \in [\hat{β} - s_{\hat{β}} t_{n - 2}^{*}, \hat{β} + s_{\hat{β}} t_{n - 2}^{*}],

at confidence level $(1 - γ)$ , where $t_{n - 2}^{*}$ is the $(1 - \frac{γ}{2})$-th quantile of the $t_{n - 2}$ distribution. For example, if $γ = 0.05$ then the confidence level is 95%.

Similarly, the confidence interval for the intercept coefficient $α$ is given by

α \in [\hat{α} - s_{\hat{α}} t_{n - 2}^{*}, \hat{α} + s_{\hat{α}} t_{n - 2}^{*}],

at confidence level (1 − γ), where

s_{\hat{α}} = s_{\hat{β}} \sqrt{\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}} = \sqrt{\frac{1}{n (n - 2)} (\sum_{i = 1}^{n} {\hat{ε}}_{i}^{2}) \frac{\sum_{i = 1}^{n} x_{i}^{2}}{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}}
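
The two interval formulas can be sketched as a single function. The quantile $t_{n - 2}^{*}$ is passed in by the caller (for example from a printed table or a statistics library), and the data and names below are hypothetical.

```python
from math import sqrt

def coefficient_intervals(x, y, t_star):
    """Return ((alpha_lo, alpha_hi), (beta_lo, beta_hi)) for a fit with intercept.

    t_star is the (1 - gamma/2) quantile of Student's t with n - 2 degrees
    of freedom; it is supplied by the caller rather than computed here.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    alpha = mean_y - beta * mean_x
    ssr = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    s_beta = sqrt(ssr / (n - 2) / sxx)                  # standard error of slope
    s_alpha = s_beta * sqrt(sum(a * a for a in x) / n)  # standard error of intercept
    return ((alpha - t_star * s_alpha, alpha + t_star * s_alpha),
            (beta - t_star * s_beta, beta + t_star * s_beta))

# Hypothetical data with n = 5, so the t distribution has 3 degrees of freedom.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_STAR_3DF_95 = 3.1824  # 0.975 quantile of Student's t, 3 degrees of freedom
alpha_ci, beta_ci = coefficient_intervals(x, y, T_STAR_3DF_95)
print(alpha_ci, beta_ci)
```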

The US "changes in unemployment – GDP growth" regression with the 95% confidence bands.

The confidence intervals for $α$ and $β$ give us the general idea where these regression coefficients are most likely to be. For example, in the Okun's law regression shown here the point estimates are

\hat{α} = 0.859, \hat{β} = - 1.817.

The 95% confidence intervals for these estimates are

α \in [0.76, 0.96], β \in [- 2.06, - 1.58] .

In order to represent this information graphically, in the form of the confidence bands around the regression line, one has to proceed carefully and account for the joint distribution of the estimators. It can be shown^[8] that at confidence level (1 − γ) the confidence band has hyperbolic form given by the equation

(α + β ξ) \in [\hat{α} + \hat{β} ξ \pm t_{n - 2}^{*} \sqrt{(\frac{1}{n - 2} \sum {\hat{ε}}_{i}^{2}) \cdot (\frac{1}{n} + \frac{(ξ - \bar{x})^{2}}{\sum (x_{i} - \bar{x})^{2}})}] .
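
A minimal sketch of the band at a single point $ξ$ , again with the quantile supplied by the caller and with made-up data. Evaluating it at several points shows the hyperbolic shape: the band is narrowest at $ξ = \bar{x}$ and widens symmetrically on either side.

```python
from math import sqrt

def confidence_band(x, y, xi, t_star):
    """(lower, upper) bounds for alpha + beta * xi at the level implied by t_star."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    alpha = mean_y - beta * mean_x
    # Residual variance estimate with n - 2 degrees of freedom.
    s2 = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    half_width = t_star * sqrt(s2 * (1 / n + (xi - mean_x) ** 2 / sxx))
    center = alpha + beta * xi
    return center - half_width, center + half_width

# Hypothetical data with n = 5 (3 degrees of freedom).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_STAR_3DF_95 = 3.1824  # 0.975 quantile of Student's t, 3 degrees of freedom

def width(xi):
    lower, upper = confidence_band(x, y, xi, T_STAR_3DF_95)
    return upper - lower

print(width(3.0), width(5.0), width(10.0))  # increasing away from mean_x = 3.0
```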

When the model assumes that the intercept is fixed and equal to 0 ( $α = 0$ ), the standard error of the slope becomes:

s_{\hat{β}} = \sqrt{\frac{1}{n - 1} \frac{\sum_{i = 1}^{n} {\hat{ε}}_{i}^{2}}{\sum_{i = 1}^{n} x_{i}^{2}}}

With: ${\hat{ε}}_{i} = y_{i} - {\hat{y}}_{i}$
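
For the no-intercept case, a short sketch under the same conventions (note the $n - 1$ divisor, since only one coefficient is estimated). The function name and the three data points are hypothetical.

```python
from math import sqrt

def fit_through_origin(x, y):
    """Return (beta_hat, s_beta) for the model y = beta * x with no intercept."""
    n = len(x)
    sum_xx = sum(a * a for a in x)
    beta = sum(a * b for a, b in zip(x, y)) / sum_xx
    # Residuals are y_i minus the fitted value beta * x_i.
    ssr = sum((b - beta * a) ** 2 for a, b in zip(x, y))
    s_beta = sqrt(ssr / (n - 1) / sum_xx)
    return beta, s_beta

beta_hat, s_beta = fit_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 7.0])
print(beta_hat, s_beta)
```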

Asymptotic assumption

The alternative second assumption states that when the number of points in the dataset is "large enough", the law of large numbers and the central limit theorem become applicable, and then the distribution of the estimators is approximately normal. Under this assumption all formulas derived in the previous section remain valid, with the only exception that the quantile $t_{n - 2}^{*}$ of Student's t distribution is replaced with the quantile $q^{*}$ of the standard normal distribution. Occasionally the fraction $\frac{1}{n - 2}$ is replaced with $\frac{1}{n}$ . When $n$ is large such a change does not alter the results appreciably.
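
Under the asymptotic assumption the only change is the quantile. A minimal sketch using the standard library's `statistics.NormalDist`; the helper names and the example estimate are hypothetical.

```python
from statistics import NormalDist

def normal_quantile(gamma):
    """(1 - gamma/2) quantile q* of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - gamma / 2)

def asymptotic_interval(estimate, std_error, gamma=0.05):
    """Confidence interval that uses q* in place of the Student's t quantile."""
    q_star = normal_quantile(gamma)
    return estimate - q_star * std_error, estimate + q_star * std_error

q_star = normal_quantile(0.05)
print(round(q_star, 4))  # 1.96
print(asymptotic_interval(10.0, 2.0))  # hypothetical estimate and standard error
```

Because $q^{*}$ is smaller than $t_{n - 2}^{*}$ for every finite $n$ , the normal-based interval is always somewhat narrower; the gap shrinks as $n$ grows.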

Numerical example

This data set gives average masses for women as a function of their height in a sample of American women of age 30–39. Although the OLS article argues that it would be more appropriate to run a quadratic regression for this data, the simple linear regression model is applied here instead.

Height (m), x_i	1.47	1.50	1.52	1.55	1.57	1.60	1.63	1.65	1.68	1.70	1.73	1.75	1.78	1.80	1.83
Mass (kg), y_i	52.21	53.12	54.48	55.84	57.20	58.57	59.93	61.29	63.11	64.47	66.28	68.10	69.92	72.19	74.46

$i$	$x_{i}$	$y_{i}$	$x_{i}^{2}$	$x_{i} y_{i}$	$y_{i}^{2}$
1	1.47	52.21	2.1609	76.7487	2725.8841
2	1.50	53.12	2.2500	79.6800	2821.7344
3	1.52	54.48	2.3104	82.8096	2968.0704
4	1.55	55.84	2.4025	86.5520	3118.1056
5	1.57	57.20	2.4649	89.8040	3271.8400
6	1.60	58.57	2.5600	93.7120	3430.4449
7	1.63	59.93	2.6569	97.6859	3591.6049
8	1.65	61.29	2.7225	101.1285	3756.4641
9	1.68	63.11	2.8224	106.0248	3982.8721
10	1.70	64.47	2.8900	109.5990	4156.3809
11	1.73	66.28	2.9929	114.6644	4393.0384
12	1.75	68.10	3.0625	119.1750	4637.6100
13	1.78	69.92	3.1684	124.4576	4888.8064
14	1.80	72.19	3.2400	129.9420	5211.3961
15	1.83	74.46	3.3489	136.2618	5544.2916
$Σ$	24.76	931.17	41.0532	1548.2453	58498.5439

There are n = 15 points in this data set. Hand calculations would be started by finding the following five sums:

\begin{aligned} S_{x} & = \sum x_{i} = 24.76, S_{y} = \sum y_{i} = 931.17, \\ S_{x x} & = \sum x_{i}^{2} = 41.0532, S_{y y} = \sum y_{i}^{2} = 58498.5439, \\ S_{x y} & = \sum x_{i} y_{i} = 1548.2453 \end{aligned}

These quantities would be used to calculate the estimates of the regression coefficients, and their standard errors.

\begin{aligned} \hat{β} & = \frac{n S_{x y} - S_{x} S_{y}}{n S_{x x} - S_{x}^{2}} = 61.272 \\ \hat{α} & = \frac{1}{n} S_{y} - \hat{β} \frac{1}{n} S_{x} = - 39.062 \\ s_{ε}^{2} & = \frac{1}{n (n - 2)} [n S_{y y} - S_{y}^{2} - {\hat{β}}^{2} (n S_{x x} - S_{x}^{2})] = 0.5762 \\ s_{\hat{β}}^{2} & = \frac{n s_{ε}^{2}}{n S_{x x} - S_{x}^{2}} = 3.1539 \\ s_{\hat{α}}^{2} & = s_{\hat{β}}^{2} \frac{1}{n} S_{x x} = 8.63185 \end{aligned}
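
The hand calculation can be reproduced in a few lines. This sketch takes the fifteen data points from the table above, forms the five sums and applies the same formulas; the variable names are chosen here for illustration.

```python
heights = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
           1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]        # x_i, metres
masses = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
          63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]  # y_i, kilograms

n = len(heights)
s_x = sum(heights)
s_y = sum(masses)
s_xx = sum(x * x for x in heights)
s_yy = sum(y * y for y in masses)
s_xy = sum(x * y for x, y in zip(heights, masses))

# Point estimates of slope and intercept.
beta_hat = (n * s_xy - s_x * s_y) / (n * s_xx - s_x ** 2)
alpha_hat = s_y / n - beta_hat * s_x / n
# Residual variance and squared standard errors of the two coefficients.
var_eps = (n * s_yy - s_y ** 2 - beta_hat ** 2 * (n * s_xx - s_x ** 2)) / (n * (n - 2))
var_beta = n * var_eps / (n * s_xx - s_x ** 2)
var_alpha = var_beta * s_xx / n

print(round(beta_hat, 3), round(alpha_hat, 3))
print(round(var_eps, 4), round(var_beta, 4), round(var_alpha, 5))
```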

Graph of points and linear least squares lines in the simple linear regression numerical example

The 0.975 quantile of Student's t-distribution with 13 degrees of freedom is $t_{13}^{*} = 2.1604$ , and thus the 95% confidence intervals for $α$ and $β$ are

\begin{aligned} α & \in [\hat{α} \mp t_{13}^{*} s_{\hat{α}}] = [- 45.4, - 32.7] \\ β & \in [\hat{β} \mp t_{13}^{*} s_{\hat{β}}] = [57.4, 65.1] \end{aligned}
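
These intervals follow directly from the point estimates and squared standard errors quoted above. A short sketch, with the numbers copied from the text and a hypothetical helper name:

```python
from math import sqrt

T_STAR_13 = 2.1604                       # 0.975 quantile, 13 degrees of freedom
alpha_hat, var_alpha = -39.062, 8.63185  # intercept estimate, squared standard error
beta_hat, var_beta = 61.272, 3.1539      # slope estimate, squared standard error

def interval(estimate, variance, t_star):
    """Symmetric interval: estimate plus or minus t_star standard errors."""
    half_width = t_star * sqrt(variance)
    return estimate - half_width, estimate + half_width

alpha_ci = interval(alpha_hat, var_alpha, T_STAR_13)
beta_ci = interval(beta_hat, var_beta, T_STAR_13)
print([round(v, 1) for v in alpha_ci])  # [-45.4, -32.7]
print([round(v, 1) for v in beta_ci])   # [57.4, 65.1]
```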

The product-moment correlation coefficient might also be calculated:

\hat{r} = \frac{n S_{x y} - S_{x} S_{y}}{\sqrt{(n S_{x x} - S_{x}^{2}) (n S_{y y} - S_{y}^{2})}} = 0.9946
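
The same five sums give the correlation coefficient. A minimal sketch using the values of $S_{x}$ , $S_{y}$ , $S_{x x}$ , $S_{y y}$ and $S_{x y}$ quoted earlier in this example:

```python
from math import sqrt

n = 15
s_x, s_y = 24.76, 931.17
s_xx, s_yy, s_xy = 41.0532, 58498.5439, 1548.2453

r_hat = (n * s_xy - s_x * s_y) / sqrt((n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2))
print(round(r_hat, 4))  # 0.9946
```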
