The method consists of choosing a "trial wavefunction" depending on one or more parameters, and finding the values of these parameters for which the expectation value
of the energy is the lowest possible. The wavefunction obtained by
fixing the parameters to such values is then an approximation to the
ground state wavefunction, and the expectation value of the energy in
that state is an upper bound to the ground state energy. The Hartree–Fock method, the density matrix renormalization group, and the Ritz method all apply the variational method.
Once again ignoring complications involved with a continuous spectrum of H, suppose the spectrum of H is bounded from below and that its greatest lower bound is E0. The expectation value of H in a normalized state |ψ⟩ is then

⟨H⟩ = ⟨ψ|H|ψ⟩.

If we were to vary over all possible states with norm 1 trying to minimize the expectation value of H, the lowest value would be E0 and the corresponding state would be the ground state, as well as an eigenstate of H.
Varying over the entire Hilbert space is usually too complicated for
physical calculations, and a subspace of the entire Hilbert space is
chosen, parametrized by some (real) differentiable parameters αi (i = 1, 2, ..., N). The choice of the subspace is called the ansatz. Some choices of ansatz lead to better approximations than others; therefore, the choice of ansatz is important.
Let's assume there is some overlap between the ansatz and the ground state (otherwise, it's a bad ansatz). We wish to normalize the ansatz, so we have the constraint

⟨ψ(α)|ψ(α)⟩ = 1,

and we wish to minimize

ε(α) = ⟨ψ(α)|H|ψ(α)⟩.
This, in general, is not an easy task, since we are looking for a global minimum and finding the zeroes of the partial derivatives of ε over all αi is not sufficient. If ψ(α) is expressed as a linear combination of other functions (the αi being the coefficients), as in the Ritz method, there is only one minimum and the problem is straightforward. There are other, non-linear methods, however, such as the Hartree–Fock method, that are also not characterized by a multitude of minima and are therefore convenient in calculations.
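To make the procedure concrete, the sketch below (not part of the original text) numerically minimises ⟨H⟩ for the one-dimensional harmonic oscillator H = −½ d²/dx² + ½ x² (in units ħ = m = ω = 1), using a Gaussian trial wavefunction ψ_b(x) = exp(−b x²) with a single variational parameter b. Every trial value of ⟨H⟩ is an upper bound on the exact ground-state energy of 0.5, and the bound is saturated at b = 0.5 because the exact ground state happens to lie inside this ansatz family.

```python
# Illustrative sketch (not from the text): variational upper bound for the
# 1D harmonic oscillator H = -1/2 d^2/dx^2 + 1/2 x^2 (units hbar = m = omega = 1),
# using the Gaussian trial wavefunction psi_b(x) = exp(-b x^2).
# The exact ground-state energy is 0.5.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def energy_expectation(b):
    """<psi_b|H|psi_b> / <psi_b|psi_b>, evaluated on a grid."""
    psi = np.exp(-b * x**2)
    norm = np.sum(psi**2) * dx
    dpsi = np.gradient(psi, dx)
    kinetic = 0.5 * np.sum(dpsi**2) * dx          # (1/2) integral of |psi'|^2
    potential = np.sum(0.5 * x**2 * psi**2) * dx  # integral of (1/2) x^2 |psi|^2
    return (kinetic + potential) / norm

res = minimize_scalar(energy_expectation, bounds=(0.01, 5.0), method="bounded")
print(f"optimal b   = {res.x:.4f}    (analytic optimum: 0.5)")
print(f"min <H>     = {res.fun:.6f}  (exact ground-state energy: 0.5)")
print(f"<H>(b=2.0)  = {energy_expectation(2.0):.4f}  -- still an upper bound")
```

For a poorer ansatz family, the same minimisation would still give an upper bound, but one that stays strictly above the true ground-state energy.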
There is an additional complication in the calculations described. As ε tends toward E0
in minimization calculations, there is no guarantee that the
corresponding trial wavefunctions will tend to the actual wavefunction.
This has been demonstrated by calculations using a modified harmonic
oscillator as a model system, in which an exactly solvable system is
approached using the variational method. A wavefunction different from
the exact one is obtained by use of the method described above.
Although usually limited to calculations of the ground state
energy, this method can be applied in certain cases to calculations of
excited states as well. If the ground state wavefunction is known,
either by the method of variation or by direct calculation, a subset of
the Hilbert space can be chosen which is orthogonal to the ground state
wavefunction.
The resulting minimum is usually not as accurate as for the ground state: any difference between the true ground state and the approximate one used to define the orthogonal subspace lets the trial wavefunction retain a component of the true ground state, which lowers the computed excited-state energy. This defect worsens with each higher excited state.
In another formulation:

E0 ≤ ⟨φ|H|φ⟩.
This holds for any trial φ since, by definition, the ground state
wavefunction has the lowest energy, and any trial wavefunction will
have energy greater than or equal to it.
Proof:
φ can be expanded as a linear combination of the actual eigenfunctions of the Hamiltonian (which we assume to be normalized and orthogonal):

φ = Σn cn ψn,   with   H ψn = En ψn.

Then, to find the expectation value of the Hamiltonian:

⟨φ|H|φ⟩ = Σn |cn|² En.

Now, the ground state energy is the lowest energy possible, i.e., En ≥ E0 for every n. Therefore, if the guessed wave function φ is normalized (Σn |cn|² = 1):

⟨φ|H|φ⟩ = Σn |cn|² En ≥ E0 Σn |cn|² = E0.
In general
For a Hamiltonian H that describes the studied system and any normalizable function Ψ with arguments appropriate for the unknown wave function of the system, we define the functional

ε[Ψ] = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩.

The variational principle states that
ε ≥ E0, where E0 is the lowest energy eigenvalue (the ground-state energy) of the Hamiltonian, and
ε = E0 if and only if Ψ is exactly equal to the wave function of the ground state of the studied system.
Another facet of variational principles in quantum mechanics is that since Ψ and its complex conjugate Ψ* can be varied separately (a fact arising from the complex nature of the wave function), the two quantities can in principle be varied one at a time.
Helium atom ground state
The helium atom consists of two electrons with mass m and electric charge −e, around an essentially fixed nucleus of mass M ≫ m and charge +2e. The Hamiltonian for it, neglecting the fine structure, is:

H = −(ħ²/2m)(∇1² + ∇2²) − (e²/(4πε0)) (2/r1 + 2/r2 − 1/|r1 − r2|),
where ħ is the reduced Planck constant, ε0 is the vacuum permittivity, ri (for i = 1, 2) is the distance of the i-th electron from the nucleus, and |r1 − r2| is the distance between the two electrons.
If the term Vee = e2/(4πε0|r1 − r2|), representing the repulsion between the two electrons, were excluded, the Hamiltonian would become the sum of two hydrogen-like atom Hamiltonians with nuclear charge +2e. The ground state energy would then be 8E1 = −109 eV, where E1 = −13.6 eV is the ground-state energy of the hydrogen atom, and its ground state wavefunction would be the product of two wavefunctions for the ground state of hydrogen-like atoms:

ψ(r1, r2) = (Z³/(π a0³)) e^(−Z(r1 + r2)/a0),
where a0 is the Bohr radius and Z = 2, helium's nuclear charge. The expectation value of the total Hamiltonian H (including the term Vee) in the state described by ψ0 will be an upper bound for its ground state energy. ⟨Vee⟩ is −5E1/2 = 34 eV, so ⟨H⟩ is 8E1 − 5E1/2 = −75 eV.
A tighter upper bound can be found by using a better trial wavefunction with 'tunable' parameters. Each electron can be thought of as seeing the nuclear charge partially "shielded" by the other electron, so we can use a trial wavefunction of the same form as ψ0 but with an "effective" nuclear charge Z < 2. The expectation value of H in this state is:

⟨H⟩ = (−2Z² + (27/4)Z) E1.
This is minimal for Z = 27/16, implying that shielding reduces the effective charge to about 1.69. Substituting this value of Z into the expression for ⟨H⟩ yields 729E1/128 = −77.5 eV, within 2% of the experimental value, −78.975 eV.
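As a hedged illustration (not part of the original text), the short script below evaluates the screened-charge expectation value ⟨H⟩(Z) = (−2Z² + (27/4)Z)E1 quoted above on a grid of Z values and locates its minimum, reproducing Z = 27/16 and the −77.5 eV bound, as well as the unscreened −75 eV estimate.

```python
# Illustrative sketch (not from the text): the one-parameter helium trial
# wavefunction gives <H>(Z) = (-2 Z^2 + (27/4) Z) * E1 with E1 = -13.6 eV.
# Minimizing over the effective charge Z reproduces the numbers quoted above.
import numpy as np

E1 = -13.6  # hydrogen ground-state energy in eV

def expectation_H(Z):
    return (-2.0 * Z**2 + 6.75 * Z) * E1

Z = np.linspace(1.0, 2.0, 100001)
E = expectation_H(Z)
i = np.argmin(E)
print(f"optimal Z ~ {Z[i]:.4f}      (analytic: 27/16 = {27/16:.4f})")
print(f"min <H>   ~ {E[i]:.2f} eV   (analytic: 729*E1/128 = {729/128*E1:.2f} eV)")
print(f"unscreened Z=2 estimate: {expectation_H(2.0):.1f} eV  (cf. -75 eV above)")
```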
Even closer estimations of this energy have been found using more
complicated trial wave functions with more parameters. This is done in
physical chemistry via variational Monte Carlo.
The Wisdom of Crowds: Why the Many Are Smarter Than the Few and
How Collective Wisdom Shapes Business, Economies, Societies and Nations, published in 2004, is a book written by James Surowiecki
about the aggregation of information in groups, resulting in decisions
that, he argues, are often better than could have been made by any
single member of the group. The book presents numerous case studies and anecdotes to illustrate its argument, and touches on several fields, primarily economics and psychology.
The opening anecdote relates Francis Galton's surprise that the crowd at a county fair accurately guessed the weight of an ox
when their individual guesses were averaged (the average was closer to
the ox's true butchered weight than the estimates of most crowd
members).
The book relates to diverse collections of independently deciding individuals, rather than crowd psychology
as traditionally understood. Its central thesis, that a diverse
collection of independently deciding individuals is likely to make
certain types of decisions and predictions better than individuals or
even experts, draws many parallels with statistical sampling; however, there is little overt discussion of statistics in the book.
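The statistical-sampling parallel can be made concrete with a toy simulation (not from the book): when individual guesses scatter independently around the truth, their average lands far closer to it than a typical individual guess does. The "true" value and noise level below are arbitrary stand-ins, loosely echoing the ox-weighing anecdote.

```python
# Illustrative simulation (not from the book): averaging many independent,
# individually noisy guesses.  The numbers (true weight 1198, guess noise 75)
# are arbitrary stand-ins chosen only to echo Galton's ox anecdote.
import numpy as np

rng = np.random.default_rng(0)
true_weight = 1198.0                                   # value the crowd estimates
guesses = true_weight + rng.normal(0, 75, size=800)    # independent errors

crowd_error = abs(guesses.mean() - true_weight)
typical_individual_error = np.median(abs(guesses - true_weight))
print(f"crowd-average error:      {crowd_error:6.1f}")
print(f"typical individual error: {typical_individual_error:6.1f}")
# With independent errors, the crowd mean's error shrinks roughly as 1/sqrt(N).
```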
Surowiecki breaks down the advantages he sees in disorganized decisions into three main types, which he classifies as
Cognition
Thinking and information processing, such as market judgment, which he argues can be much faster, more reliable, and less subject to political forces than the deliberations of experts or expert committees.
Coordination
Coordination of behavior includes optimizing the utilization of a
popular bar and not colliding in moving traffic flows. The book is
replete with examples from experimental economics, but this section relies more on naturally occurring experiments such as pedestrians optimizing the pavement flow or the extent of crowding in popular restaurants. He examines how common understanding within a culture allows remarkably accurate judgments about specific reactions of other members of the culture.
Cooperation
How groups of people can form networks of trust without a central system controlling their behavior or directly enforcing their compliance. This section is especially pro-free market.
Five elements required to form a wise crowd
Not all crowds (groups) are wise. Consider, for example, mobs or crazed investors in a stock market bubble. According to Surowiecki, these key criteria separate wise crowds from irrational ones:
Diversity of opinion
Each person should have private information, even if it is just an eccentric interpretation of the known facts. (Chapter 2)
Independence
People's opinions are not determined by the opinions of those around them. (Chapter 3)
Decentralization
People are able to specialize and draw on local knowledge. (Chapter 4)
Aggregation
Some mechanism exists for turning private judgements into a collective decision. (Chapter 5)
Trust
Each person trusts the collective group to be fair. (Chapter 6)
Based on Surowiecki's book, Oinas-Kukkonen captures the wisdom of crowds approach with the following eight conjectures:
It is possible to describe how people in a group think as a whole.
In some cases, groups are remarkably intelligent and are often smarter than the smartest people in them.
The three conditions for a group to be intelligent are diversity, independence, and decentralization.
The best decisions are a product of disagreement and contest.
Too much communication can make the group as a whole less intelligent.
Information aggregation functionality is needed.
The right information needs to be delivered to the right people in the right place, at the right time, and in the right way.
There is no need to chase the expert.
Failures of crowd intelligence
Surowiecki studies situations (such as rational bubbles)
in which the crowd produces very bad judgment, and argues that in these
types of situations their cognition or cooperation failed because (in
one way or another) the members of the crowd were too conscious of the
opinions of others and began to emulate each other and conform rather
than think differently. Although he gives experimental details of crowds
collectively swayed by a persuasive speaker, he says that the main
reason that groups of people intellectually conform is that the system
for making decisions has a systematic flaw.
Causes and detailed case histories of such failures include:
Homogeneity
Surowiecki stresses the need for diversity within a crowd to ensure
enough variance in approach, thought process, and private information.
Centralization
The 2003 Space Shuttle Columbia disaster, which he blames on a hierarchical NASA management bureaucracy that was totally closed to the wisdom of low-level engineers.
Division
The United States Intelligence Community, the 9/11 Commission Report claims, failed to prevent the 11 September 2001 attacks partly because information held by one subdivision was not accessible by another. Surowiecki's argument is that crowds (of intelligence analysts in this case) work best when they choose for themselves what to work on and what information they need. (He cites the SARS virus
isolation as an example in which the free flow of data enabled
laboratories around the world to coordinate research without a central
point of control.)
Imitation
Where choices are visible and made in sequence, an "information cascade"
can form in which only the first few decision makers gain anything by
contemplating the choices available: once past decisions have become
sufficiently informative, it pays for later decision makers to simply
copy those around them. This can lead to fragile social outcomes.
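A minimal sketch of such a cascade (an illustration in the spirit of the standard Bikhchandani–Hirshleifer–Welch model, not a model taken from the book) is given below: agents choose in sequence, each receives a private signal that is correct with probability p, and they copy the crowd once the public lead of earlier choices outweighs any single signal.

```python
# Illustrative sketch (not from the book): a simplified sequential-choice model
# of an information cascade.  Before a cascade starts, choices reveal signals,
# so counting earlier choices matches the Bayesian rule; once the public lead
# reaches 2, later agents rationally ignore their own signal and copy the crowd.
import random

def run_sequence(n_agents=200, p=0.6, seed=1):
    random.seed(seed)
    truth = 1                                   # the "correct" option
    choices = []
    for _ in range(n_agents):
        signal = truth if random.random() < p else 1 - truth
        lead = sum(1 if c == 1 else -1 for c in choices)
        if lead >= 2:
            choice = 1                          # up-cascade: copy the crowd
        elif lead <= -2:
            choice = 0                          # down-cascade: copy the crowd
        else:
            choice = signal                     # otherwise follow own signal
        choices.append(choice)
    return sum(c == truth for c in choices) / n_agents

wrong_runs = sum(run_sequence(seed=s) < 0.5 for s in range(1000))
print(f"runs ending mostly wrong: {wrong_runs}/1000")
# Despite mostly-correct private signals, a sizeable fraction of runs lock
# into the wrong choice -- the fragility described above.
```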
At the 2005 O'Reilly Emerging Technology Conference Surowiecki presented a session entitled Independent Individuals and Wise Crowds, or Is It Possible to Be Too Connected?
The question for all of us is, how can you have interaction without information cascades, without losing the independence that's such a key factor in group intelligence?
He recommends:
Keep your ties loose.
Keep yourself exposed to as many diverse sources of information as possible.
Surowiecki is a strong advocate of the benefits of decision markets and regrets the failure of DARPA's controversial Policy Analysis Market
to get off the ground. He points to the success of public and internal
corporate markets as evidence that a collection of people with varying
points of view but the same motivation (to make a good guess) can
produce an accurate aggregate prediction. According to Surowiecki, the
aggregate predictions have been shown to be more reliable than the
output of any think tank. He advocates extensions of the existing futures markets even into areas such as terrorist activity and prediction markets within companies.
To illustrate this thesis, he says that his publisher can publish
a more compelling output by relying on individual authors under one-off
contracts bringing book ideas to them. In this way, they are able to
tap into the wisdom of a much larger crowd than would be possible with
an in-house writing team.
Will Hutton
has argued that Surowiecki's analysis applies to value judgments as
well as factual issues, with crowd decisions that "emerge of our own
aggregated free will [being] astonishingly... decent". He concludes that
"There's no better case for pluralism, diversity and democracy, along
with a genuinely independent press."
The most common application is the prediction market, a speculative
or betting market created to make verifiable predictions. Surowiecki
discusses the success of prediction markets. Similar to Delphi methods but unlike opinion polls,
prediction (information) markets ask questions like, "Who do you think
will win the election?" and predict outcomes rather well. Answers to the
question, "Who will you vote for?" are not as predictive.
Assets are cash values tied to specific outcomes (e.g., Candidate
X will win the election) or parameters (e.g., Next quarter's revenue).
The current market prices are interpreted as predictions of the
probability of the event or the expected value of the parameter. Betfair is the world's biggest prediction exchange, with around $28 billion traded in 2007. NewsFutures is an international prediction market that generates consensus probabilities for news events. Intrade.com,
which operated a person-to-person prediction market based in Dublin, Ireland, achieved very high media attention in 2012 in relation to the US presidential election, with more than 1.5 million search references to Intrade and Intrade data. Several companies now offer enterprise-class
prediction marketplaces to predict project completion dates, sales, or
the market potential for new ideas.
A number of Web-based quasi-prediction marketplace companies have
sprung up to offer predictions primarily on sporting events and stock
markets but also on other topics. The principle of the prediction market
is also used in project management software to let team members predict a project's "real" deadline and budget.
The Delphi method is a systematic, interactive forecasting
method which relies on a panel of independent experts. The carefully
selected experts answer questionnaires in two or more rounds. After each
round, a facilitator provides an anonymous summary of the experts'
forecasts from the previous round as well as the reasons they provided
for their judgments. Thus, participants are encouraged to revise their
earlier answers in light of the replies of other members of the group.
It is believed that during this process the range of the answers will
decrease and the group will converge towards the "correct" answer. Many
of the consensus forecasts have proven to be more accurate than
forecasts made by individuals.
Human Swarming
Designed
as an optimized method for unleashing the wisdom of crowds, this
approach implements real-time feedback loops around synchronous groups
of users with the goal of achieving more accurate insights from smaller numbers of users. Human Swarming (sometimes referred to as Social
Swarming) is modeled after biological processes in birds, fish, and
insects, and is enabled among networked users by using mediating
software such as the UNU
collective intelligence platform. As published by Rosenberg (2015),
such real-time control systems enable groups of human participants to
behave as a unified collective intelligence.
When logged into the UNU platform, for example, groups of distributed
users can collectively answer questions, generate ideas, and make
predictions as a singular emergent entity. Early testing shows that human swarms can out-predict individuals across a variety of real-world projections.
Illusionist Derren Brown claimed to use the 'Wisdom of Crowds' concept to explain how he correctly predicted the UK National Lottery
results in September 2009. His explanation was met with criticism online from people who argued that the concept was misapplied: the methodology was flawed because the sample of people could not have been objective and free in thought, since they were gathered multiple times and socialised with each other too much, a condition Surowiecki says is corrosive to the pure independence and diversity of mind required (Surowiecki 2004: 38). Such groups fall into groupthink, increasingly making decisions based on one another's influence, and so become less accurate. However, other commentators have suggested that, given the
entertainment nature of the show, Brown's misapplication of the theory
may have been a deliberate smokescreen to conceal his true method.
This was also shown in the television series East of Eden where a
social network of roughly 10,000 individuals came up with ideas to stop
missiles in a very short span of time.
The Wisdom of Crowds had a significant influence on the naming of the crowdsourcing creative company Tongal, an anagram of Galton, the surname of the scientist highlighted in the introduction to Surowiecki's book: Francis Galton recognized the ability of a crowd's averaged weight-guesses for an ox to exceed the accuracy of experts.
Criticism
In his book Embracing the Wide Sky, Daniel Tammet
finds fault with this notion. Tammet points out the potential for
problems in systems which have poorly defined means of pooling
knowledge: Subject matter experts can be overruled and even wrongly
punished by less knowledgeable persons in crowdsourced systems, citing a case of this on Wikipedia. Furthermore, Tammet mentions the assessment of the accuracy of Wikipedia described in a 2005 study in Nature, outlining several flaws in that study's methodology, including that it made no distinction between minor errors and large errors.
Tammet also cites Kasparov versus the World, an online competition that pitted the brainpower of tens of thousands of online chess players choosing moves in a match against Garry Kasparov, and which was won by Kasparov, not the "crowd", although Kasparov did say, "It is the greatest game in the history of chess. The sheer number of ideas, the complexity, and the contribution it has made to chess make it the most important game ever played."
In his book You Are Not a Gadget, Jaron Lanier
argues that crowd wisdom is best suited for problems that involve
optimization, but ill-suited for problems that require creativity or
innovation. In the online article Digital Maoism, Lanier argues that the collective is more likely to be smart only when
1. it is not defining its own questions,
2. the goodness of an answer can be evaluated by a simple result (such as a single numeric value), and
3. the information system which informs the collective is filtered
by a quality control mechanism that relies on individuals to a high
degree.
Lanier argues that only under those circumstances can a collective be
smarter than a person. If any of these conditions are broken, the
collective becomes unreliable or worse.
Iain Couzin, a professor in Princeton's Department of Ecology and Evolutionary Biology, and his student Albert Kao argue in a 2014 article in the journal Proceedings of the Royal Society that "the conventional view of the wisdom of crowds may not be informative in complex and realistic environments, and that being in small groups can maximize decision accuracy across many contexts." By "small groups," Couzin and Kao mean fewer than a dozen people.
They conclude that “the decisions of very large groups may be
highly accurate when the information used is independently sampled, but
they are particularly susceptible to the negative effects of correlated
information, even when only a minority of the group uses such
information.”
In biophysics and cognitive science, the free energy principle is a mathematical principle describing a formal
account of the representational capacities of physical systems: that
is, why things that exist look as if they track properties of the
systems to which they are coupled.
The free energy principle models the behaviour of systems that
are distinct from, but coupled to, another system (e.g., an embedding
environment), where the degrees of freedom that implement the interface
between the two systems are known as a Markov blanket.
More formally, the free energy principle says that if a system has a
"particular partition" (i.e., into particles, with their Markov
blankets), then subsets of that system will track the statistical
structure of other subsets (which are known as internal and external
states or paths of a system).
The free energy principle is based on the Bayesian idea of the brain as an “inference engine.” Under the free energy principle, systems pursue paths of least surprise, or equivalently, minimize the difference between predictions based on their model of the world and their sense and associated perception.
This difference is quantified by variational free energy and is
minimized by continuous correction of the world model of the system, or
by making the world more like the predictions of the system. By actively
changing the world to make it closer to the expected state, systems can
also minimize the free energy of the system. Friston assumes this to be
the principle of all biological reaction. Friston also believes his principle applies to mental disorders as well as to artificial intelligence. AI implementations based on the active inference principle have shown advantages over other methods.
The free energy principle is a mathematical principle of
information physics: much like the principle of maximum entropy or the
principle of least action, it is true on mathematical grounds. To
attempt to falsify the free energy principle is a category mistake, akin
to trying to falsify calculus
by making empirical observations. (One cannot invalidate a mathematical
theory in this way; instead, one would need to derive a formal
contradiction from the theory.) In a 2018 interview, Friston explained
what it entails for the free energy principle to not be subject to falsification:
"I think it is useful to make a fundamental distinction at this
point—that we can appeal to later. The distinction is between a state
and process theory; i.e., the difference between a normative principle
that things may or may not conform to, and a process theory or
hypothesis about how that principle is realized. Under this distinction,
the free energy principle stands in stark distinction to things like predictive coding and the Bayesian brain hypothesis. This is because the free energy principle is what it is — a principle. Like Hamilton's principle of stationary action,
it cannot be falsified. It cannot be disproven. In fact, there’s not
much you can do with it, unless you ask whether measurable systems
conform to the principle. On the other hand, hypotheses that the brain
performs some form of Bayesian inference or predictive coding are what
they are—hypotheses. These hypotheses may or may not be supported by
empirical evidence." There are many examples of these hypotheses being supported by empirical evidence.
Background
The notion that self-organising biological systems – like a cell or brain – can be understood as minimising variational free energy is based upon Helmholtz’s work on unconscious inference and subsequent treatments in psychology and machine learning. Variational free energy is a function of observations and a probability density over their hidden causes. This variational
density is defined in relation to a probabilistic model that generates
predicted observations from hypothesized causes. In this setting, free
energy provides an approximation to Bayesian model evidence.
Therefore, its minimisation can be seen as a Bayesian inference
process. When a system actively makes observations to minimise free
energy, it implicitly performs active inference and maximises the
evidence for its model of the world.
However, free energy is also an upper bound on the self-information of outcomes, where the long-term average of surprise
is entropy. This means that if a system acts to minimise free energy,
it will implicitly place an upper bound on the entropy of the outcomes –
or sensory states – it samples.
Relationship to other theories
Active inference is closely related to the good regulator theorem and related accounts of self-organisation, such as self-assembly, pattern formation, autopoiesis and practopoiesis. It addresses the themes considered in cybernetics, synergetics and embodied cognition.
Because free energy can be expressed as the expected energy of
observations under the variational density minus its entropy, it is also
related to the maximum entropy principle. Finally, because the time average of energy is action, the principle of minimum variational free energy is a principle of least action.
Active inference allowing for scale invariance has also been applied to
other theories and domains. For instance, it has been applied to
sociology, linguistics and communication, semiotics, and epidemiology among others.
Active inference applies the techniques of approximate Bayesian inference to infer the causes of sensory data from a 'generative' model of how that data is caused and then uses these inferences to guide action.
Bayes' rule
characterizes the probabilistically optimal inversion of such a causal
model, but applying it is typically computationally intractable, leading
to the use of approximate methods.
In active inference, the leading class of such approximate methods are variational methods,
for both practical and theoretical reasons: practical, as they often
lead to simple inference procedures; and theoretical, because they are
related to fundamental physical principles, as discussed above.
These variational methods proceed by minimizing an upper bound on the divergence between the Bayes-optimal inference (or 'posterior') and its approximation according to the method.
This upper bound is known as the free energy, and we can
accordingly characterize perception as the minimization of the free
energy with respect to inbound sensory information, and action as the
minimization of the same free energy with respect to outbound action
information.
This holistic dual optimization is characteristic of active inference,
and the free energy principle is the hypothesis that all systems which
perceive and act can be characterized in this way.
In order to exemplify the mechanics of active inference via the
free energy principle, a generative model must be specified, and this
typically involves a collection of probability density functions which together characterize the causal model.
One such specification is as follows.
The system is modelled as inhabiting a state space X, in the sense that its states form the points of this space.
The state space is then factorized according to X = Ψ × S × A × R, where Ψ is the space of 'external' states that are 'hidden' from the agent (in the sense of not being directly perceived or accessible), S is the space of sensory states that are directly perceived by the agent, A is the space of the agent's possible actions, and R is a space of 'internal' states that are private to the agent.
Note that in the following, these state variables are functions of (continuous) time t. The generative model is the specification of the following density functions:
A sensory model, pS(s | ψ, a), characterizing the likelihood of sensory data given external states and actions;
a stochastic model of the environmental dynamics, pΨ(ψ′ | ψ, a), characterizing how the external states are expected by the agent to evolve over time t, given the agent's actions;
an action model, pA(a | r, s), characterizing how the agent's actions depend upon its internal states and sensory data; and
an internal model, pR(r | s), characterizing how the agent's internal states depend upon its sensory data.
These density functions determine the factors of a "joint model", which represents the complete specification of the generative model, and which can be written schematically as the product of the four factors above,

p(ψ, s, a, r) = pS · pΨ · pA · pR.
Bayes' rule then determines the "posterior density" p(ψ | s, a, r), which expresses a probabilistically optimal belief about the external state given the preceding state and the agent's actions, sensory signals, and internal states.
Since computing p(ψ | s, a, r) exactly is computationally intractable, the free energy principle asserts the existence of a "variational density" q(ψ | r), where q(ψ | r) is an approximation to p(ψ | s, a, r).
One then defines the free energy as

F(s, a, r) = E_q(ψ|r)[ log q(ψ | r) − log p(ψ, s, a, r) ]

and defines action and perception as the joint optimization problem

(a*, r*) = arg min over (a, r) of F(s, a, r),

where the internal states r are typically taken to encode the parameters of the 'variational' density q and hence the agent's "best guess" about the posterior belief over the external states Ψ.
Note that the free energy is also an upper bound on a measure of the agent's (marginal, or average) sensory surprise, −log p(s), and hence free energy minimization is often motivated by the minimization of surprise.
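As a hedged numerical illustration of these definitions (not part of the original text), the toy model below uses one binary external state and one observed binary sensory state, with made-up probabilities; it shows that minimising the variational free energy over q recovers the exact posterior, and that the minimum equals the sensory surprise −log p(s).

```python
# Minimal numerical sketch (not from the text): variational free energy for a
# toy generative model with one binary external state psi and one observed
# binary sensory state s.  All probabilities are made-up illustrative numbers.
# F(q) = E_q[log q(psi) - log p(psi, s)] upper-bounds the surprise -log p(s),
# with equality when q equals the exact posterior p(psi | s).
import numpy as np

p_psi = np.array([0.7, 0.3])                 # prior over external states
p_s_given_psi = np.array([[0.9, 0.1],        # likelihood p(s | psi)
                          [0.2, 0.8]])       # rows: psi, columns: s
s = 1                                        # the observed sensory state

def free_energy(q):
    """F = E_q[log q - log p(psi, s)] for a variational density q over psi."""
    joint = p_psi * p_s_given_psi[:, s]      # p(psi, s) for the observed s
    return float(np.sum(q * (np.log(q) - np.log(joint))))

surprise = -np.log(p_psi @ p_s_given_psi[:, s])        # -log p(s)
exact_posterior = p_psi * p_s_given_psi[:, s]
exact_posterior /= exact_posterior.sum()

# crude "perception": scan candidate q's and keep the free-energy minimiser
grid = np.linspace(1e-3, 1 - 1e-3, 999)
qs = np.stack([grid, 1 - grid], axis=1)
best = qs[np.argmin([free_energy(q) for q in qs])]

print("surprise -log p(s):", round(surprise, 4))
print("min free energy   :", round(free_energy(best), 4))   # ~ equal
print("best q            :", best.round(3))
print("exact posterior   :", exact_posterior.round(3))
```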
Free energy minimisation
Free energy minimisation and self-organisation
Free energy minimisation has been proposed as a hallmark of self-organising systems when cast as random dynamical systems. This formulation rests on a Markov blanket
(comprising action and sensory states) that separates internal and
external states. If internal states and action minimise free energy,
then they place an upper bound on the entropy of sensory states:

lim T→∞ (1/T) ∫0T F(s(t), μ(t)) dt  ≥  lim T→∞ (1/T) ∫0T (−log p(s(t) | m)) dt  =  H[p(s | m)],

where μ(t) denotes the internal states and m the agent's model.
This is because – under ergodic
assumptions – the long-term average of surprise is entropy. This bound
resists a natural tendency to disorder – of the sort associated with the
second law of thermodynamics and the fluctuation theorem.
However, formulating a unifying principle for the life sciences in
terms of concepts from statistical physics, such as random dynamical
system, non-equilibrium steady state and ergodicity, places substantial
constraints on the theoretical and empirical study of biological systems
with the risk of obscuring all features that make biological systems
interesting kinds of self-organizing systems.
Free energy minimisation and Bayesian inference
All Bayesian inference can be cast in terms of free energy minimisation. When free energy is minimised with respect to internal states, the Kullback–Leibler divergence between the variational and posterior density over hidden states is minimised. This corresponds to approximate Bayesian inference
– when the form of the variational density is fixed – and exact
Bayesian inference otherwise. Free energy minimisation therefore
provides a generic description of Bayesian inference and filtering
(e.g., Kalman filtering). It is also used in Bayesian model selection, where free energy can be usefully decomposed into complexity and accuracy:

F = DKL[ q(ψ | μ) ‖ p(ψ | m) ] − E_q[ log p(s | ψ, m) ]  =  complexity − accuracy.
Models with minimum free energy provide an accurate explanation of data, under complexity costs (c.f., Occam's razor and more formal treatments of computational costs).
Here, complexity is the divergence between the variational density and
prior beliefs about hidden states (i.e., the effective degrees of
freedom used to explain the data).
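Continuing the toy model above (a sketch with made-up numbers, not from the original text), the following check confirms the decomposition numerically: the same free energy equals the complexity term (the KL divergence from the prior) minus the accuracy term (the expected log-likelihood).

```python
# Sketch (continuing the toy model above, same made-up numbers): check that
# variational free energy splits into complexity minus accuracy,
#   F = KL[q(psi) || p(psi)]  -  E_q[log p(s | psi)].
import numpy as np

p_psi = np.array([0.7, 0.3])
p_s_given_psi = np.array([[0.9, 0.1],
                          [0.2, 0.8]])
s = 1
q = np.array([0.4, 0.6])                     # some variational density

joint = p_psi * p_s_given_psi[:, s]
F = np.sum(q * (np.log(q) - np.log(joint)))

complexity = np.sum(q * (np.log(q) - np.log(p_psi)))        # KL[q || prior]
accuracy = np.sum(q * np.log(p_s_given_psi[:, s]))          # E_q[log p(s|psi)]

print(round(F, 6), "==", round(complexity - accuracy, 6))
```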
Free energy minimisation and thermodynamics
Variational free energy is an information-theoretic functional and is distinct from thermodynamic (Helmholtz) free energy.
However, the complexity term of variational free energy shares the same
fixed point as Helmholtz free energy (under the assumption the system
is thermodynamically closed but not isolated). This is because if
sensory perturbations are suspended (for a suitably long period of
time), complexity is minimised (because accuracy can be neglected). At
this point, the system is at equilibrium and internal states minimise
Helmholtz free energy, by the principle of minimum energy.
Free energy minimisation and information theory
Free energy minimisation is equivalent to maximising the mutual information
between sensory states and internal states that parameterise the
variational density (for a fixed entropy variational density). This
relates free energy minimization to the principle of minimum redundancy.
Free energy minimisation in neuroscience
Free
energy minimisation provides a useful way to formulate normative (Bayes
optimal) models of neuronal inference and learning under uncertainty and therefore subscribes to the Bayesian brain hypothesis. The neuronal processes described by free energy minimisation depend on the nature of the hidden states, which can comprise time-dependent variables, time-invariant parameters,
and the precision (inverse variance or temperature) of random
fluctuations. Minimising variables, parameters, and precision corresponds
to inference, learning, and the encoding of uncertainty, respectively.
Perceptual inference and categorisation
Free energy minimisation formalises the notion of unconscious inference in perception
and provides a normative (Bayesian) theory of neuronal processing. The
associated process theory of neuronal dynamics is based on minimising
free energy through gradient descent. This corresponds to generalised Bayesian filtering (where ~ denotes a variable in generalised coordinates of motion and D is a derivative matrix operator):

dμ̃/dt = Dμ̃ − ∂μF(s, μ)|μ=μ̃
Usually, the generative models that define free energy are non-linear
and hierarchical (like cortical hierarchies in the brain). Special
cases of generalised filtering include Kalman filtering, which is formally equivalent to predictive coding
– a popular metaphor for message passing in the brain. Under
hierarchical models, predictive coding involves the recurrent exchange
of ascending (bottom-up) prediction errors and descending (top-down)
predictions that is consistent with the anatomy and physiology of sensory and motor systems.
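A minimal sketch of this idea (illustrative, not a scheme taken from the text) is a single-cause predictive-coding model in which the belief about a hidden cause is updated by gradient descent on precision-weighted prediction errors; the nonlinear mapping g(μ) = μ² and all numerical values are arbitrary choices in the style of standard tutorial examples.

```python
# Minimal sketch (not from the text) of predictive coding as gradient descent
# on precision-weighted prediction errors.  The generative model g(mu) = mu**2
# and all numbers are illustrative choices.
import numpy as np

s = 2.0               # observed sensory sample
mu_prior = 1.0        # prior expectation of the hidden cause
sigma_s2 = 0.5        # sensory noise variance (precision = 1/sigma_s2)
sigma_p2 = 1.0        # prior variance over the hidden cause

g = lambda mu: mu**2          # nonlinear mapping from cause to prediction
dg = lambda mu: 2 * mu

mu = mu_prior                 # initial belief
lr = 0.05                     # integration step for the gradient descent
for step in range(500):
    eps_s = (s - g(mu)) / sigma_s2        # precision-weighted sensory error
    eps_p = (mu - mu_prior) / sigma_p2    # precision-weighted prior error
    # dF/dmu = -eps_s * dg(mu) + eps_p ; descend the free-energy gradient
    mu += lr * (eps_s * dg(mu) - eps_p)

print(f"posterior estimate of the hidden cause: mu ~ {mu:.3f}")
print(f"prediction g(mu) ~ {g(mu):.3f}  vs observation s = {s}")
```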
Perceptual learning and memory
In
predictive coding, optimising model parameters through a gradient
descent on the time integral of free energy (free action) reduces to
associative or Hebbian plasticity and is associated with synaptic plasticity in the brain.
Perceptual precision, attention and salience
Optimizing
the precision parameters corresponds to optimizing the gain of
prediction errors (c.f., Kalman gain). In neuronally plausible
implementations of predictive coding,
this corresponds to optimizing the excitability of superficial
pyramidal cells and has been interpreted in terms of attentional gain.
With regard to the top-down vs. bottom-up controversy, which has been
addressed as a major open problem of attention, a computational model
has succeeded in illustrating the circular nature of the interplay
between top-down and bottom-up mechanisms. Using an established emergent
model of attention, namely SAIM, the authors proposed a model called
PE-SAIM, which, in contrast to the standard version, approaches
selective attention from a top-down position. The model takes into
account the transmission of prediction errors to the same level or a
level above, in order to minimise the energy function that indicates the
difference between the data and its cause, or, in other words, between
the generative model and the posterior. To increase validity, they also
incorporated neural competition between stimuli into their model. A
notable feature of this model is the reformulation of the free energy
function only in terms of prediction errors during task performance:
where E(total) denotes the total energy function of the neural networks and ε(t) denotes the prediction error between the generative model (prior) and the posterior, changing over time.
Comparing the two models reveals a notable similarity between their
respective results while also highlighting a remarkable discrepancy,
whereby – in the standard version of the SAIM – the model's focus is
mainly upon the excitatory connections, whereas in the PE-SAIM, the
inhibitory connections are leveraged to make an inference. The model has also proved able to predict EEG and fMRI data drawn from human experiments with high precision. In the same vein, Yahya et al. also
applied the free energy principle to propose a computational model for
template matching in covert selective visual attention that mostly
relies on SAIM.
According to this study, the total free energy of the whole state-space
is reached by inserting top-down signals in the original neural
networks, whereby a dynamical system comprising both feed-forward and backward prediction errors is derived.
Active inference
When gradient descent is applied to action, da/dt = −∂aF(s, μ̃),
motor control can be understood in terms of classical reflex arcs that
are engaged by descending (corticospinal) predictions. This provides a
formalism that generalizes the equilibrium point solution – to the degrees of freedom problem – to movement trajectories.
Active inference and optimal control
Active inference is related to optimal control by replacing value or cost-to-go functions with prior beliefs about state transitions or flow. This exploits the close connection between Bayesian filtering and the solution to the Bellman equation. However, active inference starts with (priors over) flow that are specified with scalar and vector value functions of state space (c.f., the Helmholtz decomposition), together with the amplitude of the random fluctuations, from which the implied cost function follows. The priors over flow induce a prior over states that is the solution to the appropriate forward Kolmogorov equations. In contrast, optimal control optimises the flow, given a cost function, under the assumption that the vector (divergence-free) component of the flow vanishes (i.e., the flow is curl-free or has detailed balance). Usually, this entails solving backward Kolmogorov equations.
Active inference and optimal decision (game) theory
Optimal decision problems (usually formulated as partially observable Markov decision processes) are treated within active inference by absorbing utility functions
into prior beliefs. In this setting, states that have a high utility
(low cost) are states an agent expects to occupy. By equipping the
generative model with hidden states that model control, policies
(control sequences) that minimise variational free energy lead to high
utility states.
Neurobiologically, neuromodulators such as dopamine
are considered to report the precision of prediction errors by
modulating the gain of principal cells encoding prediction error. This is closely related to – but formally distinct from – the role of dopamine in reporting prediction errors per se and related computational accounts.
Active inference and cognitive neuroscience
Active inference has been used to address a range of issues in cognitive neuroscience, brain function and neuropsychiatry, including action observation, mirror neurons, saccades and visual search, eye movements, sleep, illusions, attention, action selection, consciousness, hysteria and psychosis.
Explanations of action in active inference often depend on the idea
that the brain has 'stubborn predictions' that it cannot update, leading
to actions that cause these predictions to come true.