The method consists of choosing a "trial wavefunction" depending on one or more parameters, and finding the values of these parameters for which the expectation value
of the energy is the lowest possible. The wavefunction obtained by
fixing the parameters to such values is then an approximation to the
ground state wavefunction, and the expectation value of the energy in
that state is an upper bound to the ground state energy. The Hartree–Fock method, the density matrix renormalization group, and the Ritz method all apply the variational method.
Once again ignoring complications involved with a continuous spectrum of H, suppose the spectrum of H is bounded from below and that its greatest lower bound is E0. The expectation value of H in a normalized state |ψ⟩ is then

⟨H⟩ = ⟨ψ|H|ψ⟩.

If we were to vary over all possible states with norm 1 trying to minimize the expectation value of H, the lowest value would be E0 and the corresponding state would be the ground state, as well as an eigenstate of H.
Varying over the entire Hilbert space is usually too complicated for
physical calculations, and a subspace of the entire Hilbert space is
chosen, parametrized by some (real) differentiable parameters αi (i = 1, 2, ..., N). The choice of the subspace is called the ansatz. Some choices of ansatz lead to better approximations than others; therefore, the choice of ansatz is important.
Let's assume there is some overlap between the ansatz and the ground state (otherwise, it's a bad ansatz). We wish to normalize the ansatz, so we have the constraint

⟨ψ(α)|ψ(α)⟩ = 1,

and we wish to minimize

ε(α) = ⟨ψ(α)|H|ψ(α)⟩.
This, in general, is not an easy task, since we are looking for a global minimum and finding the zeroes of the partial derivatives of ε over all αi is not sufficient. If ψ(α) is expressed as a linear combination of other functions (the αi being the coefficients), as in the Ritz method, there is only one minimum and the problem is straightforward. There are other, non-linear methods, however, such as the Hartree–Fock method, that are also not characterized by a multitude of minima and are therefore convenient in calculations.
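To make the procedure concrete, the sketch below (not part of the original text) numerically minimises ⟨H⟩ for the one-dimensional harmonic oscillator H = −½ d²/dx² + ½ x² (in units ħ = m = ω = 1), using a Gaussian trial wavefunction ψ_b(x) = exp(−b x²) with a single variational parameter b. Every trial value of ⟨H⟩ is an upper bound on the exact ground-state energy of 0.5, and the bound is saturated at b = 0.5 because the exact ground state happens to lie inside this ansatz family.

```python
# Illustrative sketch (not from the text): variational upper bound for the
# 1D harmonic oscillator H = -1/2 d^2/dx^2 + 1/2 x^2 (units hbar = m = omega = 1),
# using the Gaussian trial wavefunction psi_b(x) = exp(-b x^2).
# The exact ground-state energy is 0.5.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def energy_expectation(b):
    """<psi_b|H|psi_b> / <psi_b|psi_b>, evaluated on a grid."""
    psi = np.exp(-b * x**2)
    norm = np.sum(psi**2) * dx
    dpsi = np.gradient(psi, dx)
    kinetic = 0.5 * np.sum(dpsi**2) * dx          # (1/2) integral of |psi'|^2
    potential = np.sum(0.5 * x**2 * psi**2) * dx  # integral of (1/2) x^2 |psi|^2
    return (kinetic + potential) / norm

res = minimize_scalar(energy_expectation, bounds=(0.01, 5.0), method="bounded")
print(f"optimal b   = {res.x:.4f}    (analytic optimum: 0.5)")
print(f"min <H>     = {res.fun:.6f}  (exact ground-state energy: 0.5)")
print(f"<H>(b=2.0)  = {energy_expectation(2.0):.4f}  -- still an upper bound")
```

For a poorer ansatz family, the same minimisation would still give an upper bound, but one that stays strictly above the true ground-state energy.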
There is an additional complication in the calculations described. As ε tends toward E0
in minimization calculations, there is no guarantee that the
corresponding trial wavefunctions will tend to the actual wavefunction.
This has been demonstrated by calculations using a modified harmonic
oscillator as a model system, in which an exactly solvable system is
approached using the variational method. A wavefunction different from
the exact one is obtained by use of the method described above.
Although usually limited to calculations of the ground state
energy, this method can be applied in certain cases to calculations of
excited states as well. If the ground state wavefunction is known,
either by the method of variation or by direct calculation, a subset of
the Hilbert space can be chosen which is orthogonal to the ground state
wavefunction.
The resulting minimum is usually not as accurate as for the ground state: any difference between the true ground state and the approximate one used to define the orthogonal subspace lets the trial wavefunction retain a component of the true ground state, which lowers the computed excited-state energy. This defect worsens with each higher excited state.
In another formulation:

E0 ≤ ⟨φ|H|φ⟩.
This holds for any trial φ since, by definition, the ground state
wavefunction has the lowest energy, and any trial wavefunction will
have energy greater than or equal to it.
Proof:
φ can be expanded as a linear combination of the actual eigenfunctions of the Hamiltonian (which we assume to be normalized and orthogonal):

φ = Σn cn ψn,   with   H ψn = En ψn.

Then, to find the expectation value of the Hamiltonian:

⟨φ|H|φ⟩ = Σn |cn|² En.

Now, the ground state energy is the lowest energy possible, i.e., En ≥ E0 for every n. Therefore, if the guessed wave function φ is normalized (Σn |cn|² = 1):

⟨φ|H|φ⟩ = Σn |cn|² En ≥ E0 Σn |cn|² = E0.
In general
For a Hamiltonian H that describes the studied system and any normalizable function Ψ with arguments appropriate for the unknown wave function of the system, we define the functional

ε[Ψ] = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩.

The variational principle states that
ε ≥ E0, where E0 is the lowest energy eigenvalue (the ground-state energy) of the Hamiltonian, and
ε = E0 if and only if Ψ is exactly equal to the wave function of the ground state of the studied system.
Another facet of variational principles in quantum mechanics is that since Ψ and its complex conjugate Ψ* can be varied separately (a fact arising from the complex nature of the wave function), the two quantities can in principle be varied one at a time.
Helium atom ground state
The helium atom consists of two electrons with mass m and electric charge −e, around an essentially fixed nucleus of mass M ≫ m and charge +2e. The Hamiltonian for it, neglecting the fine structure, is:

H = −(ħ²/2m)(∇1² + ∇2²) − (e²/(4πε0)) (2/r1 + 2/r2 − 1/|r1 − r2|),
where ħ is the reduced Planck constant, ε0 is the vacuum permittivity, ri (for i = 1, 2) is the distance of the i-th electron from the nucleus, and |r1 − r2| is the distance between the two electrons.
If the term Vee = e2/(4πε0|r1 − r2|), representing the repulsion between the two electrons, were excluded, the Hamiltonian would become the sum of two hydrogen-like atom Hamiltonians with nuclear charge +2e. The ground state energy would then be 8E1 = −109 eV, where E1 = −13.6 eV is the ground-state energy of the hydrogen atom, and its ground state wavefunction would be the product of two wavefunctions for the ground state of hydrogen-like atoms:

ψ(r1, r2) = (Z³/(π a0³)) e^(−Z(r1 + r2)/a0),
where a0 is the Bohr radius and Z = 2, helium's nuclear charge. The expectation value of the total Hamiltonian H (including the term Vee) in the state described by ψ0 will be an upper bound for its ground state energy. ⟨Vee⟩ is −5E1/2 = 34 eV, so ⟨H⟩ is 8E1 − 5E1/2 = −75 eV.
A tighter upper bound can be found by using a better trial wavefunction with 'tunable' parameters. Each electron can be thought of as seeing the nuclear charge partially "shielded" by the other electron, so we can use a trial wavefunction of the same form as ψ0 but with an "effective" nuclear charge Z < 2. The expectation value of H in this state is:

⟨H⟩ = (−2Z² + (27/4)Z) E1.
This is minimal for Z = 27/16, implying that shielding reduces the effective charge to about 1.69. Substituting this value of Z into the expression for ⟨H⟩ yields 729E1/128 = −77.5 eV, within 2% of the experimental value, −78.975 eV.
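As a hedged illustration (not part of the original text), the short script below evaluates the screened-charge expectation value ⟨H⟩(Z) = (−2Z² + (27/4)Z)E1 quoted above on a grid of Z values and locates its minimum, reproducing Z = 27/16 and the −77.5 eV bound, as well as the unscreened −75 eV estimate.

```python
# Illustrative sketch (not from the text): the one-parameter helium trial
# wavefunction gives <H>(Z) = (-2 Z^2 + (27/4) Z) * E1 with E1 = -13.6 eV.
# Minimizing over the effective charge Z reproduces the numbers quoted above.
import numpy as np

E1 = -13.6  # hydrogen ground-state energy in eV

def expectation_H(Z):
    return (-2.0 * Z**2 + 6.75 * Z) * E1

Z = np.linspace(1.0, 2.0, 100001)
E = expectation_H(Z)
i = np.argmin(E)
print(f"optimal Z ~ {Z[i]:.4f}      (analytic: 27/16 = {27/16:.4f})")
print(f"min <H>   ~ {E[i]:.2f} eV   (analytic: 729*E1/128 = {729/128*E1:.2f} eV)")
print(f"unscreened Z=2 estimate: {expectation_H(2.0):.1f} eV  (cf. -75 eV above)")
```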
Even closer estimations of this energy have been found using more
complicated trial wave functions with more parameters. This is done in
physical chemistry via variational Monte Carlo.
The Wisdom of Crowds: Why the Many Are Smarter Than the Few and
How Collective Wisdom Shapes Business, Economies, Societies and Nations, published in 2004, is a book written by James Surowiecki
about the aggregation of information in groups, resulting in decisions
that, he argues, are often better than could have been made by any
single member of the group. The book presents numerous case studies and anecdotes to illustrate its argument, and touches on several fields, primarily economics and psychology.
The opening anecdote relates Francis Galton's surprise that the crowd at a county fair accurately guessed the weight of an ox
when their individual guesses were averaged (the average was closer to
the ox's true butchered weight than the estimates of most crowd
members).
The book relates to diverse collections of independently deciding individuals, rather than crowd psychology
as traditionally understood. Its central thesis, that a diverse
collection of independently deciding individuals is likely to make
certain types of decisions and predictions better than individuals or
even experts, draws many parallels with statistical sampling; however, there is little overt discussion of statistics in the book.
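The statistical-sampling parallel can be made concrete with a toy simulation (not from the book): when individual guesses scatter independently around the truth, their average lands far closer to it than a typical individual guess does. The "true" value and noise level below are arbitrary stand-ins, loosely echoing the ox-weighing anecdote.

```python
# Illustrative simulation (not from the book): averaging many independent,
# individually noisy guesses.  The numbers (true weight 1198, guess noise 75)
# are arbitrary stand-ins chosen only to echo Galton's ox anecdote.
import numpy as np

rng = np.random.default_rng(0)
true_weight = 1198.0                                   # value the crowd estimates
guesses = true_weight + rng.normal(0, 75, size=800)    # independent errors

crowd_error = abs(guesses.mean() - true_weight)
typical_individual_error = np.median(abs(guesses - true_weight))
print(f"crowd-average error:      {crowd_error:6.1f}")
print(f"typical individual error: {typical_individual_error:6.1f}")
# With independent errors, the crowd mean's error shrinks roughly as 1/sqrt(N).
```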
Surowiecki breaks down the advantages he sees in disorganized decisions into three main types, which he classifies as
Cognition
Thinking and information processing, such as market judgment, which he argues can be much faster, more reliable, and less subject to political forces than the deliberations of experts or expert committees.
Coordination
Coordination of behavior includes optimizing the utilization of a
popular bar and not colliding in moving traffic flows. The book is
replete with examples from experimental economics, but this section relies more on naturally occurring experiments such as pedestrians optimizing the pavement flow or the extent of crowding in popular restaurants. He examines how common understanding within a culture allows remarkably accurate judgments about specific reactions of other members of the culture.
Cooperation
How groups of people can form networks of trust without a central system controlling their behavior or directly enforcing their compliance. This section is especially pro-free market.
Five elements required to form a wise crowd
Not all crowds (groups) are wise. Consider, for example, mobs or crazed investors in a stock market bubble. According to Surowiecki, these key criteria separate wise crowds from irrational ones:
Diversity of opinion
Each person should have private information, even if it is just an eccentric interpretation of the known facts. (Chapter 2)
Independence
People's opinions are not determined by the opinions of those around them. (Chapter 3)
Decentralization
People are able to specialize and draw on local knowledge. (Chapter 4)
Aggregation
Some mechanism exists for turning private judgements into a collective decision. (Chapter 5)
Trust
Each person trusts the collective group to be fair. (Chapter 6)
Based on Surowiecki's book, Oinas-Kukkonen captures the wisdom of crowds approach with the following eight conjectures:
It is possible to describe how people in a group think as a whole.
In some cases, groups are remarkably intelligent and are often smarter than the smartest people in them.
The three conditions for a group to be intelligent are diversity, independence, and decentralization.
The best decisions are a product of disagreement and contest.
Too much communication can make the group as a whole less intelligent.
Information aggregation functionality is needed.
The right information needs to be delivered to the right people in the right place, at the right time, and in the right way.
There is no need to chase the expert.
Failures of crowd intelligence
Surowiecki studies situations (such as rational bubbles)
in which the crowd produces very bad judgment, and argues that in these
types of situations their cognition or cooperation failed because (in
one way or another) the members of the crowd were too conscious of the
opinions of others and began to emulate each other and conform rather
than think differently. Although he gives experimental details of crowds
collectively swayed by a persuasive speaker, he says that the main
reason that groups of people intellectually conform is that the system
for making decisions has a systematic flaw.
Causes and detailed case histories of such failures include:
Homogeneity
Surowiecki stresses the need for diversity within a crowd to ensure
enough variance in approach, thought process, and private information.
Centralization
The 2003 Space Shuttle Columbia disaster, which he blames on a hierarchical NASA management bureaucracy that was totally closed to the wisdom of low-level engineers.
Division
The United States Intelligence Community, the 9/11 Commission Report claims, failed to prevent the 11 September 2001 attacks partly because information held by one subdivision was not accessible by another. Surowiecki's argument is that crowds (of intelligence analysts in this case) work best when they choose for themselves what to work on and what information they need. (He cites the SARS virus
isolation as an example in which the free flow of data enabled
laboratories around the world to coordinate research without a central
point of control.)
Imitation
Where choices are visible and made in sequence, an "information cascade"
can form in which only the first few decision makers gain anything by
contemplating the choices available: once past decisions have become
sufficiently informative, it pays for later decision makers to simply
copy those around them. This can lead to fragile social outcomes.
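A minimal sketch of such a cascade (an illustration in the spirit of the standard Bikhchandani–Hirshleifer–Welch model, not a model taken from the book) is given below: agents choose in sequence, each receives a private signal that is correct with probability p, and they copy the crowd once the public lead of earlier choices outweighs any single signal.

```python
# Illustrative sketch (not from the book): a simplified sequential-choice model
# of an information cascade.  Before a cascade starts, choices reveal signals,
# so counting earlier choices matches the Bayesian rule; once the public lead
# reaches 2, later agents rationally ignore their own signal and copy the crowd.
import random

def run_sequence(n_agents=200, p=0.6, seed=1):
    random.seed(seed)
    truth = 1                                   # the "correct" option
    choices = []
    for _ in range(n_agents):
        signal = truth if random.random() < p else 1 - truth
        lead = sum(1 if c == 1 else -1 for c in choices)
        if lead >= 2:
            choice = 1                          # up-cascade: copy the crowd
        elif lead <= -2:
            choice = 0                          # down-cascade: copy the crowd
        else:
            choice = signal                     # otherwise follow own signal
        choices.append(choice)
    return sum(c == truth for c in choices) / n_agents

wrong_runs = sum(run_sequence(seed=s) < 0.5 for s in range(1000))
print(f"runs ending mostly wrong: {wrong_runs}/1000")
# Despite mostly-correct private signals, a sizeable fraction of runs lock
# into the wrong choice -- the fragility described above.
```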
At the 2005 O'Reilly Emerging Technology Conference Surowiecki presented a session entitled Independent Individuals and Wise Crowds, or Is It Possible to Be Too Connected?
The question for all of us is, how can you have interaction without information cascades, without losing the independence that's such a key factor in group intelligence?
He recommends:
Keep your ties loose.
Keep yourself exposed to as many diverse sources of information as possible.
Surowiecki is a strong advocate of the benefits of decision markets and regrets the failure of DARPA's controversial Policy Analysis Market
to get off the ground. He points to the success of public and internal
corporate markets as evidence that a collection of people with varying
points of view but the same motivation (to make a good guess) can
produce an accurate aggregate prediction. According to Surowiecki, the
aggregate predictions have been shown to be more reliable than the
output of any think tank. He advocates extensions of the existing futures markets even into areas such as terrorist activity and prediction markets within companies.
To illustrate this thesis, he says that his publisher can publish
a more compelling output by relying on individual authors under one-off
contracts bringing book ideas to them. In this way, they are able to
tap into the wisdom of a much larger crowd than would be possible with
an in-house writing team.
Will Hutton
has argued that Surowiecki's analysis applies to value judgments as
well as factual issues, with crowd decisions that "emerge of our own
aggregated free will [being] astonishingly... decent". He concludes that
"There's no better case for pluralism, diversity and democracy, along
with a genuinely independent press."
The most common application is the prediction market, a speculative
or betting market created to make verifiable predictions. Surowiecki
discusses the success of prediction markets. Similar to Delphi methods but unlike opinion polls,
prediction (information) markets ask questions like, "Who do you think
will win the election?" and predict outcomes rather well. Answers to the
question, "Who will you vote for?" are not as predictive.
Assets are cash values tied to specific outcomes (e.g., Candidate
X will win the election) or parameters (e.g., Next quarter's revenue).
The current market prices are interpreted as predictions of the
probability of the event or the expected value of the parameter. Betfair is the world's biggest prediction exchange, with around $28 billion traded in 2007. NewsFutures is an international prediction market that generates consensus probabilities for news events. Intrade.com,
which operated a person-to-person prediction market based in Dublin, Ireland, achieved very high media attention in 2012 in relation to the US presidential election, with more than 1.5 million search references to Intrade and Intrade data. Several companies now offer enterprise-class
prediction marketplaces to predict project completion dates, sales, or
the market potential for new ideas.
A number of Web-based quasi-prediction marketplace companies have
sprung up to offer predictions primarily on sporting events and stock
markets but also on other topics. The principle of the prediction market
is also used in project management software to let team members predict a project's "real" deadline and budget.
The Delphi method is a systematic, interactive forecasting
method which relies on a panel of independent experts. The carefully
selected experts answer questionnaires in two or more rounds. After each
round, a facilitator provides an anonymous summary of the experts'
forecasts from the previous round as well as the reasons they provided
for their judgments. Thus, participants are encouraged to revise their
earlier answers in light of the replies of other members of the group.
It is believed that during this process the range of the answers will
decrease and the group will converge towards the "correct" answer. Many
of the consensus forecasts have proven to be more accurate than
forecasts made by individuals.
Human Swarming
Designed
as an optimized method for unleashing the wisdom of crowds, this
approach implements real-time feedback loops around synchronous groups
of users with the goal of achieving more accurate insights from smaller numbers of users. Human Swarming (sometimes referred to as Social
Swarming) is modeled after biological processes in birds, fish, and
insects, and is enabled among networked users by using mediating
software such as the UNU
collective intelligence platform. As published by Rosenberg (2015),
such real-time control systems enable groups of human participants to
behave as a unified collective intelligence.
When logged into the UNU platform, for example, groups of distributed
users can collectively answer questions, generate ideas, and make
predictions as a singular emergent entity. Early testing shows that human swarms can out-predict individuals across a variety of real-world projections.
Illusionist Derren Brown claimed to use the 'Wisdom of Crowds' concept to explain how he correctly predicted the UK National Lottery
results in September 2009. His explanation was met with criticism online from people who argued that the concept was misapplied: the methodology was flawed because the sample of people could not have been objective and free in thought, since they were gathered multiple times and socialised with each other too much, a condition Surowiecki says is corrosive to the pure independence and diversity of mind required (Surowiecki 2004: 38). Such groups fall into groupthink, increasingly making decisions based on one another's influence, and so become less accurate. However, other commentators have suggested that, given the
entertainment nature of the show, Brown's misapplication of the theory
may have been a deliberate smokescreen to conceal his true method.
This was also shown in the television series East of Eden where a
social network of roughly 10,000 individuals came up with ideas to stop
missiles in a very short span of time.
The Wisdom of Crowds had a significant influence on the naming of the crowdsourcing creative company Tongal, an anagram of Galton, the surname of the scientist highlighted in the introduction to Surowiecki's book: Francis Galton recognized the ability of a crowd's averaged weight-guesses for an ox to exceed the accuracy of experts.
Criticism
In his book Embracing the Wide Sky, Daniel Tammet
finds fault with this notion. Tammet points out the potential for
problems in systems which have poorly defined means of pooling
knowledge: Subject matter experts can be overruled and even wrongly
punished by less knowledgeable persons in crowdsourced systems, citing a case of this on Wikipedia. Furthermore, Tammet mentions the assessment of the accuracy of Wikipedia described in a 2005 study in Nature, outlining several flaws in that study's methodology, including that it made no distinction between minor errors and large errors.
Tammet also cites Kasparov versus the World, an online competition that pitted the brainpower of tens of thousands of online chess players choosing moves in a match against Garry Kasparov, and which was won by Kasparov, not the "crowd", although Kasparov did say, "It is the greatest game in the history of chess. The sheer number of ideas, the complexity, and the contribution it has made to chess make it the most important game ever played."
In his book You Are Not a Gadget, Jaron Lanier
argues that crowd wisdom is best suited for problems that involve
optimization, but ill-suited for problems that require creativity or
innovation. In the online article Digital Maoism, Lanier argues that the collective is more likely to be smart only when
1. it is not defining its own questions,
2. the goodness of an answer can be evaluated by a simple result (such as a single numeric value), and
3. the information system which informs the collective is filtered
by a quality control mechanism that relies on individuals to a high
degree.
Lanier argues that only under those circumstances can a collective be
smarter than a person. If any of these conditions are broken, the
collective becomes unreliable or worse.
Iain Couzin, a professor in Princeton's Department of Ecology and Evolutionary Biology, and his student Albert Kao argue in a 2014 article in the journal Proceedings of the Royal Society that "the conventional view of the wisdom of crowds may not be informative in complex and realistic environments, and that being in small groups can maximize decision accuracy across many contexts." By "small groups," Couzin and Kao mean fewer than a dozen people.
They conclude that “the decisions of very large groups may be
highly accurate when the information used is independently sampled, but
they are particularly susceptible to the negative effects of correlated
information, even when only a minority of the group uses such
information.”
In biophysics and cognitive science, the free energy principle is a mathematical principle describing a formal
account of the representational capacities of physical systems: that
is, why things that exist look as if they track properties of the
systems to which they are coupled.
The free energy principle models the behaviour of systems that
are distinct from, but coupled to, another system (e.g., an embedding
environment), where the degrees of freedom that implement the interface
between the two systems are known as a Markov blanket.
More formally, the free energy principle says that if a system has a
"particular partition" (i.e., into particles, with their Markov
blankets), then subsets of that system will track the statistical
structure of other subsets (which are known as internal and external
states or paths of a system).
The free energy principle is based on the Bayesian idea of the brain as an “inference engine.” Under the free energy principle, systems pursue paths of least surprise, or equivalently, minimize the difference between predictions based on their model of the world and their sense and associated perception.
This difference is quantified by variational free energy and is
minimized by continuous correction of the world model of the system, or
by making the world more like the predictions of the system. By actively
changing the world to make it closer to the expected state, systems can
also minimize the free energy of the system. Friston assumes this to be
the principle of all biological reaction. Friston also believes his principle applies to mental disorders as well as to artificial intelligence. AI implementations based on the active inference principle have shown advantages over other methods.
The free energy principle is a mathematical principle of
information physics: much like the principle of maximum entropy or the
principle of least action, it is true on mathematical grounds. To
attempt to falsify the free energy principle is a category mistake, akin
to trying to falsify calculus
by making empirical observations. (One cannot invalidate a mathematical
theory in this way; instead, one would need to derive a formal
contradiction from the theory.) In a 2018 interview, Friston explained
what it entails for the free energy principle to not be subject to falsification:
"I think it is useful to make a fundamental distinction at this
point—that we can appeal to later. The distinction is between a state
and process theory; i.e., the difference between a normative principle
that things may or may not conform to, and a process theory or
hypothesis about how that principle is realized. Under this distinction,
the free energy principle stands in stark distinction to things like predictive coding and the Bayesian brain hypothesis. This is because the free energy principle is what it is — a principle. Like Hamilton's principle of stationary action,
it cannot be falsified. It cannot be disproven. In fact, there’s not
much you can do with it, unless you ask whether measurable systems
conform to the principle. On the other hand, hypotheses that the brain
performs some form of Bayesian inference or predictive coding are what
they are—hypotheses. These hypotheses may or may not be supported by
empirical evidence." There are many examples of these hypotheses being supported by empirical evidence.
Background
The notion that self-organising biological systems – like a cell or brain – can be understood as minimising variational free energy is based upon Helmholtz’s work on unconscious inference and subsequent treatments in psychology and machine learning. Variational free energy is a function of observations and a probability density over their hidden causes. This variational
density is defined in relation to a probabilistic model that generates
predicted observations from hypothesized causes. In this setting, free
energy provides an approximation to Bayesian model evidence.
Therefore, its minimisation can be seen as a Bayesian inference
process. When a system actively makes observations to minimise free
energy, it implicitly performs active inference and maximises the
evidence for its model of the world.
However, free energy is also an upper bound on the self-information of outcomes, where the long-term average of surprise
is entropy. This means that if a system acts to minimise free energy,
it will implicitly place an upper bound on the entropy of the outcomes –
or sensory states – it samples.
Relationship to other theories
Active inference is closely related to the good regulator theorem and related accounts of self-organisation, such as self-assembly, pattern formation, autopoiesis and practopoiesis. It addresses the themes considered in cybernetics, synergetics and embodied cognition.
Because free energy can be expressed as the expected energy of
observations under the variational density minus its entropy, it is also
related to the maximum entropy principle. Finally, because the time average of energy is action, the principle of minimum variational free energy is a principle of least action.
Active inference allowing for scale invariance has also been applied to
other theories and domains. For instance, it has been applied to
sociology, linguistics and communication, semiotics, and epidemiology among others.
Active inference applies the techniques of approximate Bayesian inference to infer the causes of sensory data from a 'generative' model of how that data is caused and then uses these inferences to guide action.
Bayes' rule
characterizes the probabilistically optimal inversion of such a causal
model, but applying it is typically computationally intractable, leading
to the use of approximate methods.
In active inference, the leading class of such approximate methods are variational methods,
for both practical and theoretical reasons: practical, as they often
lead to simple inference procedures; and theoretical, because they are
related to fundamental physical principles, as discussed above.
These variational methods proceed by minimizing an upper bound on the divergence between the Bayes-optimal inference (or 'posterior') and its approximation according to the method.
This upper bound is known as the free energy, and we can
accordingly characterize perception as the minimization of the free
energy with respect to inbound sensory information, and action as the
minimization of the same free energy with respect to outbound action
information.
This holistic dual optimization is characteristic of active inference,
and the free energy principle is the hypothesis that all systems which
perceive and act can be characterized in this way.
In order to exemplify the mechanics of active inference via the
free energy principle, a generative model must be specified, and this
typically involves a collection of probability density functions which together characterize the causal model.
One such specification is as follows.
The system is modelled as inhabiting a state space X, in the sense that its states form the points of this space.
The state space is then factorized according to X = Ψ × S × A × R, where Ψ is the space of 'external' states that are 'hidden' from the agent (in the sense of not being directly perceived or accessible), S is the space of sensory states that are directly perceived by the agent, A is the space of the agent's possible actions, and R is a space of 'internal' states that are private to the agent.
Note that in the following, these state variables are functions of (continuous) time t. The generative model is the specification of the following density functions:
A sensory model, pS(s | ψ, a), characterizing the likelihood of sensory data given external states and actions;
a stochastic model of the environmental dynamics, pΨ(ψ′ | ψ, a), characterizing how the external states are expected by the agent to evolve over time t, given the agent's actions;
an action model, pA(a | r, s), characterizing how the agent's actions depend upon its internal states and sensory data; and
an internal model, pR(r | s), characterizing how the agent's internal states depend upon its sensory data.
These density functions determine the factors of a "joint model", which represents the complete specification of the generative model, and which can be written schematically as the product of the four factors above,

p(ψ, s, a, r) = pS · pΨ · pA · pR.
Bayes' rule then determines the "posterior density" p(ψ | s, a, r), which expresses a probabilistically optimal belief about the external state given the preceding state and the agent's actions, sensory signals, and internal states.
Since computing p(ψ | s, a, r) exactly is computationally intractable, the free energy principle asserts the existence of a "variational density" q(ψ | r), where q(ψ | r) is an approximation to p(ψ | s, a, r).
One then defines the free energy as

F(s, a, r) = E_q(ψ|r)[ log q(ψ | r) − log p(ψ, s, a, r) ]

and defines action and perception as the joint optimization problem

(a*, r*) = arg min over (a, r) of F(s, a, r),

where the internal states r are typically taken to encode the parameters of the 'variational' density q and hence the agent's "best guess" about the posterior belief over the external states Ψ.
Note that the free energy is also an upper bound on a measure of the agent's (marginal, or average) sensory surprise, −log p(s), and hence free energy minimization is often motivated by the minimization of surprise.
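As a hedged numerical illustration of these definitions (not part of the original text), the toy model below uses one binary external state and one observed binary sensory state, with made-up probabilities; it shows that minimising the variational free energy over q recovers the exact posterior, and that the minimum equals the sensory surprise −log p(s).

```python
# Minimal numerical sketch (not from the text): variational free energy for a
# toy generative model with one binary external state psi and one observed
# binary sensory state s.  All probabilities are made-up illustrative numbers.
# F(q) = E_q[log q(psi) - log p(psi, s)] upper-bounds the surprise -log p(s),
# with equality when q equals the exact posterior p(psi | s).
import numpy as np

p_psi = np.array([0.7, 0.3])                 # prior over external states
p_s_given_psi = np.array([[0.9, 0.1],        # likelihood p(s | psi)
                          [0.2, 0.8]])       # rows: psi, columns: s
s = 1                                        # the observed sensory state

def free_energy(q):
    """F = E_q[log q - log p(psi, s)] for a variational density q over psi."""
    joint = p_psi * p_s_given_psi[:, s]      # p(psi, s) for the observed s
    return float(np.sum(q * (np.log(q) - np.log(joint))))

surprise = -np.log(p_psi @ p_s_given_psi[:, s])        # -log p(s)
exact_posterior = p_psi * p_s_given_psi[:, s]
exact_posterior /= exact_posterior.sum()

# crude "perception": scan candidate q's and keep the free-energy minimiser
grid = np.linspace(1e-3, 1 - 1e-3, 999)
qs = np.stack([grid, 1 - grid], axis=1)
best = qs[np.argmin([free_energy(q) for q in qs])]

print("surprise -log p(s):", round(surprise, 4))
print("min free energy   :", round(free_energy(best), 4))   # ~ equal
print("best q            :", best.round(3))
print("exact posterior   :", exact_posterior.round(3))
```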
Free energy minimisation
Free energy minimisation and self-organisation
Free energy minimisation has been proposed as a hallmark of self-organising systems when cast as random dynamical systems. This formulation rests on a Markov blanket
(comprising action and sensory states) that separates internal and
external states. If internal states and action minimise free energy,
then they place an upper bound on the entropy of sensory states:

lim T→∞ (1/T) ∫0T F(s(t), μ(t)) dt  ≥  lim T→∞ (1/T) ∫0T (−log p(s(t) | m)) dt  =  H[p(s | m)],

where μ(t) denotes the internal states and m the agent's model.
This is because – under ergodic
assumptions – the long-term average of surprise is entropy. This bound
resists a natural tendency to disorder – of the sort associated with the
second law of thermodynamics and the fluctuation theorem.
However, formulating a unifying principle for the life sciences in
terms of concepts from statistical physics, such as random dynamical
system, non-equilibrium steady state and ergodicity, places substantial
constraints on the theoretical and empirical study of biological systems
with the risk of obscuring all features that make biological systems
interesting kinds of self-organizing systems.
Free energy minimisation and Bayesian inference
All Bayesian inference can be cast in terms of free energy minimisation. When free energy is minimised with respect to internal states, the Kullback–Leibler divergence between the variational and posterior density over hidden states is minimised. This corresponds to approximate Bayesian inference
– when the form of the variational density is fixed – and exact
Bayesian inference otherwise. Free energy minimisation therefore
provides a generic description of Bayesian inference and filtering
(e.g., Kalman filtering). It is also used in Bayesian model selection, where free energy can be usefully decomposed into complexity and accuracy:

F = DKL[ q(ψ | μ) ‖ p(ψ | m) ] − E_q[ log p(s | ψ, m) ]  =  complexity − accuracy.
Models with minimum free energy provide an accurate explanation of data, under complexity costs (c.f., Occam's razor and more formal treatments of computational costs).
Here, complexity is the divergence between the variational density and
prior beliefs about hidden states (i.e., the effective degrees of
freedom used to explain the data).
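Continuing the toy model above (a sketch with made-up numbers, not from the original text), the following check confirms the decomposition numerically: the same free energy equals the complexity term (the KL divergence from the prior) minus the accuracy term (the expected log-likelihood).

```python
# Sketch (continuing the toy model above, same made-up numbers): check that
# variational free energy splits into complexity minus accuracy,
#   F = KL[q(psi) || p(psi)]  -  E_q[log p(s | psi)].
import numpy as np

p_psi = np.array([0.7, 0.3])
p_s_given_psi = np.array([[0.9, 0.1],
                          [0.2, 0.8]])
s = 1
q = np.array([0.4, 0.6])                     # some variational density

joint = p_psi * p_s_given_psi[:, s]
F = np.sum(q * (np.log(q) - np.log(joint)))

complexity = np.sum(q * (np.log(q) - np.log(p_psi)))        # KL[q || prior]
accuracy = np.sum(q * np.log(p_s_given_psi[:, s]))          # E_q[log p(s|psi)]

print(round(F, 6), "==", round(complexity - accuracy, 6))
```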
Free energy minimisation and thermodynamics
Variational free energy is an information-theoretic functional and is distinct from thermodynamic (Helmholtz) free energy.
However, the complexity term of variational free energy shares the same
fixed point as Helmholtz free energy (under the assumption the system
is thermodynamically closed but not isolated). This is because if
sensory perturbations are suspended (for a suitably long period of
time), complexity is minimised (because accuracy can be neglected). At
this point, the system is at equilibrium and internal states minimise
Helmholtz free energy, by the principle of minimum energy.
Free energy minimisation and information theory
Free energy minimisation is equivalent to maximising the mutual information
between sensory states and internal states that parameterise the
variational density (for a fixed entropy variational density). This
relates free energy minimization to the principle of minimum redundancy.
Free energy minimisation in neuroscience
Free
energy minimisation provides a useful way to formulate normative (Bayes
optimal) models of neuronal inference and learning under uncertainty and therefore subscribes to the Bayesian brain hypothesis. The neuronal processes described by free energy minimisation depend on the nature of the hidden states, which can comprise time-dependent variables, time-invariant parameters,
and the precision (inverse variance or temperature) of random
fluctuations. Minimising variables, parameters, and precision corresponds
to inference, learning, and the encoding of uncertainty, respectively.
Perceptual inference and categorisation
Free energy minimisation formalises the notion of unconscious inference in perception
and provides a normative (Bayesian) theory of neuronal processing. The
associated process theory of neuronal dynamics is based on minimising
free energy through gradient descent. This corresponds to generalised Bayesian filtering (where ~ denotes a variable in generalised coordinates of motion and D is a derivative matrix operator):

dμ̃/dt = Dμ̃ − ∂μF(s, μ)|μ=μ̃
Usually, the generative models that define free energy are non-linear
and hierarchical (like cortical hierarchies in the brain). Special
cases of generalised filtering include Kalman filtering, which is formally equivalent to predictive coding
– a popular metaphor for message passing in the brain. Under
hierarchical models, predictive coding involves the recurrent exchange
of ascending (bottom-up) prediction errors and descending (top-down)
predictions that is consistent with the anatomy and physiology of sensory and motor systems.
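A minimal sketch of this idea (illustrative, not a scheme taken from the text) is a single-cause predictive-coding model in which the belief about a hidden cause is updated by gradient descent on precision-weighted prediction errors; the nonlinear mapping g(μ) = μ² and all numerical values are arbitrary choices in the style of standard tutorial examples.

```python
# Minimal sketch (not from the text) of predictive coding as gradient descent
# on precision-weighted prediction errors.  The generative model g(mu) = mu**2
# and all numbers are illustrative choices.
import numpy as np

s = 2.0               # observed sensory sample
mu_prior = 1.0        # prior expectation of the hidden cause
sigma_s2 = 0.5        # sensory noise variance (precision = 1/sigma_s2)
sigma_p2 = 1.0        # prior variance over the hidden cause

g = lambda mu: mu**2          # nonlinear mapping from cause to prediction
dg = lambda mu: 2 * mu

mu = mu_prior                 # initial belief
lr = 0.05                     # integration step for the gradient descent
for step in range(500):
    eps_s = (s - g(mu)) / sigma_s2        # precision-weighted sensory error
    eps_p = (mu - mu_prior) / sigma_p2    # precision-weighted prior error
    # dF/dmu = -eps_s * dg(mu) + eps_p ; descend the free-energy gradient
    mu += lr * (eps_s * dg(mu) - eps_p)

print(f"posterior estimate of the hidden cause: mu ~ {mu:.3f}")
print(f"prediction g(mu) ~ {g(mu):.3f}  vs observation s = {s}")
```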
Perceptual learning and memory
In
predictive coding, optimising model parameters through a gradient
descent on the time integral of free energy (free action) reduces to
associative or Hebbian plasticity and is associated with synaptic plasticity in the brain.
Perceptual precision, attention and salience
Optimizing
the precision parameters corresponds to optimizing the gain of
prediction errors (c.f., Kalman gain). In neuronally plausible
implementations of predictive coding,
this corresponds to optimizing the excitability of superficial
pyramidal cells and has been interpreted in terms of attentional gain.
With regard to the top-down vs. bottom-up controversy, which has been
addressed as a major open problem of attention, a computational model
has succeeded in illustrating the circular nature of the interplay
between top-down and bottom-up mechanisms. Using an established emergent
model of attention, namely SAIM, the authors proposed a model called
PE-SAIM, which, in contrast to the standard version, approaches
selective attention from a top-down position. The model takes into
account the transmission of prediction errors to the same level or a
level above, in order to minimise the energy function that indicates the
difference between the data and its cause, or, in other words, between
the generative model and the posterior. To increase validity, they also
incorporated neural competition between stimuli into their model. A
notable feature of this model is the reformulation of the free energy
function only in terms of prediction errors during task performance:
where E(total) denotes the total energy function of the neural networks and ε(t) denotes the prediction error between the generative model (prior) and the posterior, changing over time.
Comparing the two models reveals a notable similarity between their
respective results while also highlighting a remarkable discrepancy,
whereby – in the standard version of the SAIM – the model's focus is
mainly upon the excitatory connections, whereas in the PE-SAIM, the
inhibitory connections are leveraged to make an inference. The model has also proved able to predict EEG and fMRI data drawn from human experiments with high precision. In the same vein, Yahya et al. also
applied the free energy principle to propose a computational model for
template matching in covert selective visual attention that mostly
relies on SAIM.
According to this study, the total free energy of the whole state-space
is reached by inserting top-down signals in the original neural
networks, whereby a dynamical system comprising both feed-forward and backward prediction errors is derived.
Active inference
When gradient descent is applied to action, da/dt = −∂aF(s, μ̃),
motor control can be understood in terms of classical reflex arcs that
are engaged by descending (corticospinal) predictions. This provides a
formalism that generalizes the equilibrium point solution – to the degrees of freedom problem – to movement trajectories.
Active inference and optimal control
Active inference is related to optimal control by replacing value or cost-to-go functions with prior beliefs about state transitions or flow. This exploits the close connection between Bayesian filtering and the solution to the Bellman equation. However, active inference starts with (priors over) flow that are specified with scalar and vector value functions of state space (c.f., the Helmholtz decomposition), together with the amplitude of the random fluctuations, from which the implied cost function follows. The priors over flow induce a prior over states that is the solution to the appropriate forward Kolmogorov equations. In contrast, optimal control optimises the flow, given a cost function, under the assumption that the vector (divergence-free) component of the flow vanishes (i.e., the flow is curl-free or has detailed balance). Usually, this entails solving backward Kolmogorov equations.
Active inference and optimal decision (game) theory
Optimal decision problems (usually formulated as partially observable Markov decision processes) are treated within active inference by absorbing utility functions
into prior beliefs. In this setting, states that have a high utility
(low cost) are states an agent expects to occupy. By equipping the
generative model with hidden states that model control, policies
(control sequences) that minimise variational free energy lead to high
utility states.
Neurobiologically, neuromodulators such as dopamine
are considered to report the precision of prediction errors by
modulating the gain of principal cells encoding prediction error. This is closely related to – but formally distinct from – the role of dopamine in reporting prediction errors per se and related computational accounts.
Active inference and cognitive neuroscience
Active inference has been used to address a range of issues in cognitive neuroscience, brain function and neuropsychiatry, including action observation, mirror neurons, saccades and visual search, eye movements, sleep, illusions, attention, action selection, consciousness, hysteria and psychosis.
Explanations of action in active inference often depend on the idea
that the brain has 'stubborn predictions' that it cannot update, leading
to actions that cause these predictions to come true.