Thursday, June 23, 2022

Hermitian matrix

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Hermitian_matrix

In mathematics, a Hermitian matrix (or self-adjoint matrix) is a complex square matrix that is equal to its own conjugate transpose—that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column, for all indices i and j:

$a_{ij} = \overline{a_{ji}}$

or in matrix form:

$A \text{ Hermitian} \quad\iff\quad A = \overline{A^\mathsf{T}}$

Hermitian matrices can be understood as the complex extension of real symmetric matrices.

If the conjugate transpose of a matrix $A$ is denoted by $A^\mathsf{H}$, then the Hermitian property can be written concisely as

$A \text{ Hermitian} \quad\iff\quad A = A^\mathsf{H}$
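
As a quick illustration (a minimal sketch, not part of the article text), the condition $A = A^\mathsf{H}$ can be checked numerically in MATLAB, where the apostrophe operator ' computes the conjugate transpose:

A = [1, 1-2i; 1+2i, 0];   % a small Hermitian matrix with exact entries
isequal(A, A')            % true: A' is the conjugate transpose of A
% ishermitian(A)          % equivalent built-in check (R2014a and later)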

Hermitian matrices are named after Charles Hermite, who demonstrated in 1855 that matrices of this form share a property with real symmetric matrices of always having real eigenvalues. Other, equivalent notations in common use are $A^\mathsf{H} = A^\dagger = A^\ast$, although note that in quantum mechanics, $A^\ast$ typically means the complex conjugate only, and not the conjugate transpose.

Alternative characterizations

Hermitian matrices can be characterized in a number of equivalent ways, some of which are listed below:

Equality with the adjoint

A square matrix $A$ is Hermitian if and only if it is equal to its adjoint, that is, it satisfies

$\langle \mathbf{w}, A\mathbf{v} \rangle = \langle A\mathbf{w}, \mathbf{v} \rangle$

for any pair of vectors $\mathbf{v}, \mathbf{w}$, where $\langle \cdot, \cdot \rangle$ denotes the inner product operation.

This is also the way that the more general concept of self-adjoint operator is defined.

Reality of quadratic forms

A square matrix $A$ is Hermitian if and only if

$\mathbf{v}^\mathsf{H} A \mathbf{v} \in \mathbb{R} \quad \text{for all } \mathbf{v} \in \mathbb{C}^n.$

Spectral properties

A square matrix is Hermitian if and only if it is unitarily diagonalizable with real eigenvalues.
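
All three characterizations can be verified numerically. Below is a minimal MATLAB sketch (illustrative only, not from the source); the matrix A is Hermitian by construction, and each quantity that the theory says must vanish is printed:

n = 4;
B = randn(n) + 1i*randn(n);
A = (B + B')/2;                  % force A to be Hermitian
v = randn(n,1) + 1i*randn(n,1);
w = randn(n,1) + 1i*randn(n,1);
abs(w'*(A*v) - (A*w)'*v)         % equality with the adjoint: ~0 (rounding only)
imag(v'*A*v)                     % reality of the quadratic form: ~0
[U, D] = eig(A);
max(abs(imag(diag(D))))          % eigenvalues are real: ~0
norm(U'*U - eye(n))              % eigenvectors form a unitary matrix: ~0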

Applications

Hermitian matrices are fundamental to the quantum theory of matrix mechanics created by Werner Heisenberg, Max Born, and Pascual Jordan in 1925.

Examples and solutions

In this section, the conjugate transpose of matrix $A$ is denoted as $A^\mathsf{H}$, the transpose of matrix $A$ is denoted as $A^\mathsf{T}$, and the conjugate of matrix $A$ is denoted as $\overline{A}$.

See the following example:

$A = \begin{bmatrix} 2 & 2+i & 4 \\ 2-i & 3 & i \\ 4 & -i & 1 \end{bmatrix}$

The diagonal elements must be real, as they must be their own complex conjugate.

Well-known families of Hermitian matrices include the Pauli matrices, the Gell-Mann matrices and their generalizations. In theoretical physics such Hermitian matrices are often multiplied by imaginary coefficients, which results in skew-Hermitian matrices.

Here, we offer another useful Hermitian matrix using an abstract example. If a square matrix $A$ equals the product of a matrix $B$ with its conjugate transpose, that is, $A = BB^\mathsf{H}$, then $A$ is a Hermitian positive semi-definite matrix. Furthermore, if $B$ is row full-rank, then $A$ is positive definite.
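
A minimal MATLAB sketch of this construction (illustrative only; the matrix B is a random stand-in):

B = randn(3,5) + 1i*randn(3,5);   % 3x5 complex matrix, generically of full row rank
A = B*B';                         % B' is the conjugate transpose, so A = B*B^H
norm(A - A')                      % ~0: A is Hermitian (up to rounding)
min(real(eig(A)))                 % eigenvalues are real and, here, strictly positive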

Properties

Main diagonal values are real

The entries on the main diagonal (top left to bottom right) of any Hermitian matrix are real.

Proof

By definition of the Hermitian matrix

$H_{ij} = \overline{H_{ji}},$

so for i = j the above follows.

Only the main diagonal entries are necessarily real; Hermitian matrices can have arbitrary complex-valued entries in their off-diagonal elements, as long as diagonally-opposite entries are complex conjugates.

Symmetric

A matrix that has only real entries is symmetric if and only if it is a Hermitian matrix. A real and symmetric matrix is simply a special case of a Hermitian matrix.

Proof

$H_{ij} = \overline{H_{ji}}$ by definition. Thus $H_{ij} = H_{ji}$ (matrix symmetry) if and only if $H_{ij} = \overline{H_{ij}}$ ($H_{ij}$ is real).

So, if a real anti-symmetric matrix is multiplied by a real multiple of the imaginary unit $i$, then it becomes Hermitian.

Normal

Every Hermitian matrix is a normal matrix. That is to say, $A A^\mathsf{H} = A^\mathsf{H} A$.

Proof

$A = A^\mathsf{H}$, so $A A^\mathsf{H} = A A = A^\mathsf{H} A$.

Diagonalizable

The finite-dimensional spectral theorem says that any Hermitian matrix can be diagonalized by a unitary matrix, and that the resulting diagonal matrix has only real entries. This implies that all eigenvalues of a Hermitian matrix A with dimension n are real, and that A has n linearly independent eigenvectors. Moreover, a Hermitian matrix has orthogonal eigenvectors for distinct eigenvalues. Even if there are degenerate eigenvalues, it is always possible to find an orthogonal basis of $\mathbb{C}^n$ consisting of n eigenvectors of A.

Sum of Hermitian matrices

The sum of any two Hermitian matrices is Hermitian.

Proof

$(A + B)^\mathsf{H} = A^\mathsf{H} + B^\mathsf{H} = A + B,$ as claimed.

Inverse is Hermitian

The inverse of an invertible Hermitian matrix is Hermitian as well.

Proof

If $A^{-1} A = I$, then $I = I^\mathsf{H} = \left(A^{-1} A\right)^\mathsf{H} = A^\mathsf{H} \left(A^{-1}\right)^\mathsf{H} = A \left(A^{-1}\right)^\mathsf{H}$, so $\left(A^{-1}\right)^\mathsf{H} = A^{-1}$, as claimed.

Associative product of Hermitian matrices

The product of two Hermitian matrices A and B is Hermitian if and only if AB = BA.

Proof

Note that

$(AB)^\mathsf{H} = B^\mathsf{H} A^\mathsf{H} = BA.$

Thus $(AB)^\mathsf{H} = AB$ if and only if $AB = BA.$

Thus $A^n$ is Hermitian if $A$ is Hermitian and $n$ is an integer.
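
A small numerical illustration (a sketch, not from the source): the product of two random Hermitian matrices is generally not Hermitian, while a power of a single Hermitian matrix always is, since A commutes with itself.

B = randn(3) + 1i*randn(3);
C = randn(3) + 1i*randn(3);
A  = (B + B')/2;                  % a Hermitian matrix
H2 = (C + C')/2;                  % another Hermitian matrix
norm(A*H2 - (A*H2)')              % O(1): A*H2 is not Hermitian, since A*H2 ~= H2*A
norm(A*A  - (A*A)')               % ~0: A^2 is Hermitian because A commutes with itself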

ABA Hermitian

If A and B are Hermitian, then ABA is also Hermitian.

Proof

$(ABA)^\mathsf{H} = A^\mathsf{H} B^\mathsf{H} A^\mathsf{H} = ABA.$

$\mathbf{v}^\mathsf{H} A \mathbf{v}$ is real for complex $\mathbf{v}$

For an arbitrary complex-valued vector $\mathbf{v}$, the product $\mathbf{v}^\mathsf{H} A \mathbf{v}$ is real because $\mathbf{v}^\mathsf{H} A \mathbf{v} = \left(\mathbf{v}^\mathsf{H} A \mathbf{v}\right)^\mathsf{H}$. This is especially important in quantum physics, where Hermitian matrices are operators that measure properties of a system, e.g. total spin, which have to be real.

Complex Hermitian matrices form a vector space over R

The Hermitian complex n-by-n matrices do not form a vector space over the complex numbers, C, since the identity matrix $I_n$ is Hermitian, but $i I_n$ is not. However the complex Hermitian matrices do form a vector space over the real numbers R. In the $2n^2$-dimensional vector space of complex n × n matrices over R, the complex Hermitian matrices form a subspace of dimension $n^2$. If $E_{jk}$ denotes the n-by-n matrix with a 1 in the j,k position and zeros elsewhere, a basis (orthonormal with respect to the Frobenius inner product) can be described as follows:

$E_{jj} \quad \text{for } 1 \le j \le n \quad (n \text{ matrices})$

together with the set of matrices of the form

$\frac{1}{\sqrt{2}} \left(E_{jk} + E_{kj}\right) \quad \text{for } 1 \le j < k \le n \quad \left(\tfrac{n^2 - n}{2} \text{ matrices}\right)$

and the matrices

$\frac{i}{\sqrt{2}} \left(E_{jk} - E_{kj}\right) \quad \text{for } 1 \le j < k \le n \quad \left(\tfrac{n^2 - n}{2} \text{ matrices}\right)$

where $i$ denotes the imaginary unit, $i = \sqrt{-1}$.

An example is that the four Pauli matrices form a complete basis for the vector space of all complex 2-by-2 Hermitian matrices over R.
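
As a concrete check of this statement (a minimal MATLAB sketch, not from the source), any 2-by-2 Hermitian matrix can be written as a real linear combination of the identity and the three Pauli matrices, with coefficients obtained from Frobenius inner products:

I2 = eye(2);
sx = [0 1; 1 0];   sy = [0 -1i; 1i 0];   sz = [1 0; 0 -1];   % Pauli matrices
H  = [1.5, 2-3i; 2+3i, -0.5];                                % an arbitrary Hermitian matrix
a  = cellfun(@(S) real(trace(S*H))/2, {I2, sx, sy, sz});     % four real coefficients
Hrec = a(1)*I2 + a(2)*sx + a(3)*sy + a(4)*sz;
norm(H - Hrec)                                               % ~0: {I2, sx, sy, sz} spans the space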

Eigendecomposition

If n orthonormal eigenvectors $\mathbf{u}_1, \dots, \mathbf{u}_n$ of a Hermitian matrix are chosen and written as the columns of the matrix U, then one eigendecomposition of A is $A = U \Lambda U^\mathsf{H}$ where $U U^\mathsf{H} = I = U^\mathsf{H} U$ and therefore

$A = \sum_j \lambda_j \mathbf{u}_j \mathbf{u}_j^\mathsf{H},$

where $\lambda_j$ are the eigenvalues on the diagonal of the diagonal matrix $\Lambda$.
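
A minimal MATLAB sketch of this decomposition (illustrative only, using a random Hermitian test matrix):

n = 4;
B = randn(n) + 1i*randn(n);
A = (B + B')/2;                  % Hermitian test matrix
[U, L] = eig(A);                 % columns of U: orthonormal eigenvectors; diag(L): real eigenvalues
norm(A - U*L*U')                 % ~0: A = U*Lambda*U^H
Asum = zeros(n);
for j = 1:n
    Asum = Asum + L(j,j) * (U(:,j) * U(:,j)');   % lambda_j * u_j * u_j^H
end
norm(A - Asum)                   % ~0: A as a sum of rank-one projections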

Real determinant

The determinant of a Hermitian matrix is real:

Proof

$\det(A) = \det\left(A^\mathsf{T}\right) \quad\Rightarrow\quad \det\left(A^\mathsf{H}\right) = \overline{\det(A)}$

Therefore if $A = A^\mathsf{H}$, then $\det(A) = \overline{\det(A)}$, so $\det(A)$ is real.

(Alternatively, the determinant is the product of the matrix's eigenvalues, and as mentioned before, the eigenvalues of a Hermitian matrix are real.)

Decomposition into Hermitian and skew-Hermitian matrices

Additional facts related to Hermitian matrices include:

  • The sum of a square matrix and its conjugate transpose is Hermitian.
  • The difference of a square matrix and its conjugate transpose is skew-Hermitian (also called antihermitian). This implies that the commutator of two Hermitian matrices is skew-Hermitian.
  • An arbitrary square matrix C can be written as the sum of a Hermitian matrix A and a skew-Hermitian matrix B. This is known as the Toeplitz decomposition of C: $C = A + B$ with $A = \frac{1}{2}\left(C + C^\mathsf{H}\right)$ and $B = \frac{1}{2}\left(C - C^\mathsf{H}\right)$ (a MATLAB sketch follows this list).
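
The decomposition in the last item can be computed directly; here is a minimal MATLAB sketch (illustrative only):

C = randn(3) + 1i*randn(3);       % an arbitrary square matrix
A = (C + C')/2;                   % Hermitian part
B = (C - C')/2;                   % skew-Hermitian part
norm(C - (A + B))                 % ~0: C = A + B
norm(A - A')                      % ~0: A is Hermitian
norm(B + B')                      % ~0: B is skew-Hermitian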

Rayleigh quotient

In mathematics, for a given complex Hermitian matrix M and nonzero vector x, the Rayleigh quotient $R(M, \mathbf{x})$ is defined as:

$R(M, \mathbf{x}) = \frac{\mathbf{x}^\mathsf{H} M \mathbf{x}}{\mathbf{x}^\mathsf{H} \mathbf{x}}.$

For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the conjugate transpose $\mathbf{x}^\mathsf{H}$ to the usual transpose $\mathbf{x}^\mathsf{T}$. Note that $R(M, c\mathbf{x}) = R(M, \mathbf{x})$ for any non-zero real scalar $c$. Also, recall that a Hermitian (or real symmetric) matrix has real eigenvalues.

It can be shown that, for a given matrix, the Rayleigh quotient reaches its minimum value $\lambda_{\min}$ (the smallest eigenvalue of M) when $\mathbf{x}$ is $\mathbf{v}_{\min}$ (the corresponding eigenvector). Similarly, $R(M, \mathbf{x}) \le \lambda_{\max}$ and $R(M, \mathbf{v}_{\max}) = \lambda_{\max}$.
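
A minimal MATLAB sketch of these bounds (illustrative only; the matrix and vector are random):

n = 5;
B = randn(n) + 1i*randn(n);
M = (B + B')/2;                          % Hermitian matrix
R = @(M, x) real(x'*M*x) / real(x'*x);   % Rayleigh quotient (real for Hermitian M)
x = randn(n,1) + 1i*randn(n,1);          % an arbitrary nonzero vector
lam = sort(real(eig(M)));                % eigenvalues in ascending order
[lam(1), R(M, x), lam(end)]              % lam(1) <= R(M,x) <= lam(end)
[U, D] = eig(M);
abs(R(M, U(:,1)) - D(1,1))               % ~0: at an eigenvector, R equals its eigenvalue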

The Rayleigh quotient is used in the min-max theorem to get exact values of all eigenvalues. It is also used in eigenvalue algorithms to obtain an eigenvalue approximation from an eigenvector approximation. Specifically, this is the basis for Rayleigh quotient iteration.

The range of the Rayleigh quotient (for a matrix that is not necessarily Hermitian) is called the numerical range (or spectrum in functional analysis). When the matrix is Hermitian, the numerical range is the real interval $[\lambda_{\min}, \lambda_{\max}]$. Still in functional analysis, $\lambda_{\max}$ is known as the spectral radius. In the context of C*-algebras or algebraic quantum mechanics, the function that associates to M the Rayleigh quotient R(M, x), for a fixed x and M varying through the algebra, is referred to as a "vector state" of the algebra.

Eigenface

From Wikipedia, the free encyclopedia
 
Some eigenfaces from AT&T Laboratories Cambridge

An eigenface (/ˈaɪɡənˌfeɪs/) is the name given to a set of eigenvectors when used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set.

History

The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if you need to generate N eigenfaces for a training set of M face images, you can say that each face image can be made up of "proportions" of all N "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% of EN).

In 1991 M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition. In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels.

Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes.

Generation

A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is not stored as a digital photograph, but instead as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face.

The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored. There will be a pattern to evaluate symmetry, whether there is any style of facial hair, where the hairline is, or an evaluation of the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face.

The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gestures interpretation and medical imaging analysis. Therefore, some do not use the term eigenface, but prefer to use 'eigenimage'.

Practical implementation

To create a set of eigenfaces, one must:

  1. Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also be all resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image.
  2. Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T.
  3. Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below.
  4. Choose the principal components. Sort the eigenvalues in descending order and arrange eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. Total variance $v = \lambda_1 + \lambda_2 + \dots + \lambda_n$, n = number of components.
  5. k is the smallest number that satisfies $\frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{v} > \epsilon$

These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalues associated with each eigenface represent how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded.

Matlab example code

Here is an example of calculating eigenfaces with the Extended Yale Face Database B. To avoid computational and storage bottlenecks, the face images are downsampled by a factor of 4 × 4 = 16.

clear all;
close all;
load yalefaces
[h, w, n] = size(yalefaces);
d = h * w;
% vectorize images
x = reshape(yalefaces, [d n]);
x = double(x);
% subtract mean
mean_matrix = mean(x, 2);
x = bsxfun(@minus, x, mean_matrix);
% calculate covariance
s = cov(x');
% obtain eigenvalue & eigenvector
[V, D] = eig(s);
eigval = diag(D);
% sort eigenvalues in descending order
eigval = eigval(end:-1:1);
V = fliplr(V);
% show mean and 1st through 15th principal eigenvectors
figure, subplot(4, 4, 1)
imagesc(reshape(mean_matrix, [h, w]))
colormap gray
for i = 1:15
    subplot(4, 4, i + 1)
    imagesc(reshape(V(:, i), h, w))
end

Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed. To calculate this result, implement the following code:

% evaluate the number of principal components needed to represent 95% Total variance.
eigsum = sum(eigval);
csum = 0;
for i = 1:d
    csum = csum + eigval(i);
    tv = csum / eigsum;
    if tv > 0.95
        k95 = i;
        break
    end;
end;

Computing the eigenvectors

Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = 10^8 elements. However the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows.

Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as $S = T T^\mathsf{T}$ and the eigenvector decomposition of S is given by

$S \mathbf{v}_i = T T^\mathsf{T} \mathbf{v}_i = \lambda_i \mathbf{v}_i$

However $T T^\mathsf{T}$ is a large matrix, and if instead we take the eigenvalue decomposition of

$T^\mathsf{T} T \mathbf{u}_i = \lambda_i \mathbf{u}_i$

then we notice that by pre-multiplying both sides of the equation with T, we obtain

$T T^\mathsf{T} T \mathbf{u}_i = \lambda_i T \mathbf{u}_i$

Meaning that, if $\mathbf{u}_i$ is an eigenvector of $T^\mathsf{T} T$, then $\mathbf{v}_i = T \mathbf{u}_i$ is an eigenvector of S. If we have a training set of 300 images of 100 × 100 pixels, the matrix $T^\mathsf{T} T$ is a 300 × 300 matrix, which is much more manageable than the 10,000 × 10,000 covariance matrix. Notice however that the resulting vectors $\mathbf{v}_i$ are not normalised; if normalisation is required it should be applied as an extra step.
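
A minimal MATLAB sketch of this trick (illustrative only; the data matrix is a random stand-in for the mean-subtracted training images):

d = 2500;  N = 30;                       % e.g. 50x50-pixel images, 30 training images
T = randn(d, N);                         % stand-in: one mean-subtracted image per column
[U, D] = eig(T'*T);                      % small N x N eigenproblem instead of d x d
V = T*U;                                 % columns v_i = T*u_i are eigenvectors of S = T*T'
V = bsxfun(@rdivide, V, sqrt(sum(abs(V).^2, 1)));   % normalise as an extra step
i = N;                                   % check one eigenpair without forming T*T'
norm(T*(T'*V(:,i)) - D(i,i)*V(:,i))      % small (rounding only)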

Connection with SVD

Let X denote the data matrix with column $\mathbf{x}_i$ as the image vector with mean subtracted. Then,

$\operatorname{covariance}(X) = \frac{X X^\mathsf{T}}{n}$

Let the singular value decomposition (SVD) of X be:

$X = U \Sigma V^\mathsf{T}$

Then the eigenvalue decomposition for $X X^\mathsf{T}$ is:

$X X^\mathsf{T} = U \Sigma \Sigma^\mathsf{T} U^\mathsf{T} = U \Lambda U^\mathsf{T}$, where $\Lambda = \operatorname{diag}$(eigenvalues of $X X^\mathsf{T}$)

Thus we can see easily that:

The eigenfaces = the first $k$ ($k \le r$, the rank of X) columns of $U$ associated with the nonzero singular values.
The ith eigenvalue of $X X^\mathsf{T}$ = (ith singular value of $X$)$^2$

Using SVD on data matrix X, it is unnecessary to calculate the actual covariance matrix to get eigenfaces.
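
A minimal MATLAB sketch of this route (illustrative only; X is again a random stand-in for the mean-subtracted data matrix):

d = 2500;  N = 30;
X = randn(d, N);                         % stand-in for the mean-subtracted data matrix
[U, S, ~] = svd(X, 'econ');              % thin SVD: X = U*S*V'
eigenfaces = U;                          % columns associated with the nonzero singular values
lam = diag(S).^2;                        % eigenvalues of X*X' are the squared singular values
norm(X*(X'*U(:,1)) - lam(1)*U(:,1))      % ~0: u_1 is an eigenvector of X*X' with eigenvalue lam(1)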

Use in facial recognition

Facial recognition was the motivation for the creation of eigenfaces. For this use, eigenfaces have advantages over other techniques available, such as the system's speed and efficiency. As eigenface is primarily a dimension reduction method, a system can represent many subjects with a relatively small set of data. As a face-recognition system it is also fairly invariant to large reductions in image sizing; however, it begins to fail considerably when the variation between the seen images and probe image is large.

To recognise faces, gallery images – those seen by the system – are saved as collections of weights describing the contribution each eigenface has to that image. When a new face is presented to the system for classification, its own weights are found by projecting the image onto the collection of eigenfaces. This provides a set of weights describing the probe face. These weights are then classified against all weights in the gallery set to find the closest match. A nearest-neighbour method is a simple approach for finding the Euclidean distance between two vectors, where the minimum can be classified as the closest subject.

Intuitively, the recognition process with the eigenface method is to project query images into the face-space spanned by eigenfaces calculated, and to find the closest match to a face class in that face-space.

Pseudo code
  • Given input image vector $U \in \mathbb{R}^n$ and the mean image vector from the database $\bar{x}$, calculate the weight of the kth eigenface as: $w_k = \mathbf{v}_k^\mathsf{T} (U - \bar{x})$
    Then form a weight vector $W = [w_1, w_2, \dots, w_k, \dots, w_n]$
  • Compare W with weight vectors $W_m$ of images in the database. Find the Euclidean distance: $d = \| W - W_m \|^2$
  • If $d < \epsilon_1$, then the mth entry in the database is a candidate of recognition.
  • If $\epsilon_1 < d < \epsilon_2$, then U may be an unknown face and can be added to the database.
  • If $d > \epsilon_2$, then U is not a face image (a MATLAB sketch of these steps follows below).
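
Here is a minimal MATLAB sketch of the recognition steps above (illustrative only; the names u, V, xbar, G and the thresholds eps1, eps2 are my own, not from the article):

function label = recognise(u, V, xbar, G, eps1, eps2)
% u:    probe image as a column vector (d x 1)
% V:    d x k matrix whose columns are the k chosen eigenfaces
% xbar: mean image of the training set (d x 1)
% G:    k x M matrix of gallery weight vectors, one column per gallery image
w = V' * (u - xbar);                            % weights of the probe image
dists = sqrt(sum(bsxfun(@minus, G, w).^2, 1));  % Euclidean distance to each gallery entry
[dmin, m] = min(dists);
if dmin < eps1
    label = m;          % the m-th gallery entry is a candidate match
elseif dmin < eps2
    label = 0;          % possibly an unknown face; could be added to the gallery
else
    label = -1;         % probably not a face image
end
end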

The weights of each gallery image only convey information describing that image, not that subject. An image of one subject under frontal lighting may have very different weights to those of the same subject under strong left lighting. This limits the application of such a system. Experiments in the original Eigenface paper presented the following results: an average of 96% correct classification with light variation, 85% with orientation variation, and 64% with size variation.

Various extensions have been made to the eigenface method, such as eigenfeatures. This method combines facial metrics (measuring distance between facial features) with the eigenface representation. Another method similar to the eigenface technique is 'fisherfaces', which uses linear discriminant analysis. This method for facial recognition is less sensitive to variation in lighting and pose of the face than using eigenfaces. Fisherface uses labelled data to retain more of the class-specific information during the dimension reduction stage.

A further alternative to eigenfaces and fisherfaces is the active appearance model. This approach uses an active shape model to describe the outline of a face. By collecting many face outlines, principal component analysis can be used to form a basis set of models that encapsulate the variation of different faces.

Many modern approaches still use principal component analysis as a means of dimension reduction or to form basis images for different modes of variation.

Review

Eigenface provides an easy and cheap way to realize face recognition in that:

  • Its training process is completely automatic and easy to code.
  • Eigenface adequately reduces statistical complexity in face image representation.
  • Once eigenfaces of a database are calculated, face recognition can be achieved in real time.
  • Eigenface can handle large databases.

However, the deficiencies of the eigenface method are also obvious:

  • It is very sensitive to lighting, scale and translation, and requires a highly controlled environment.
  • Eigenface has difficulty capturing expression changes.
  • The most significant eigenfaces are mainly about illumination encoding and do not provide useful information regarding the actual face.

To cope with illumination distraction in practice, the eigenface method usually discards the first three eigenfaces from the dataset. Since illumination is usually the cause behind the largest variations in face images, the first three eigenfaces will mainly capture the information of 3-dimensional lighting changes, which has little contribution to face recognition. By discarding those three eigenfaces, there will be a decent amount of boost in accuracy of face recognition, but other methods such as fisherface and linear space still have the advantage.

Entropy (information theory)

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Entropy_(information_theory) In info...