A Medley of Potpourri

Thursday, November 15, 2018

Automata-based programming

From Wikipedia, the free encyclopedia

Automata-based programming is a programming paradigm in which the program or part of it is thought of as a model of a finite state machine (FSM) or any other (often more complicated) formal automaton. Sometimes a potentially infinite set of possible states is introduced, and such a set can have a complicated structure, not just an enumeration.

FSM-based programming is generally the same, but, formally speaking, doesn't cover all possible variants, as FSM stands for finite state machine, and automata-based programming doesn't necessarily employ FSMs in the strict sense.

The following properties are key indicators for automata-based programming:

The time period of the program's execution is clearly separated down to the steps of the automaton. Each of the steps is effectively an execution of a code section (same for all the steps), which has a single entry point. Such a section can be a function or other routine, or just a cycle body. The step section might be divided down to subsections to be executed depending on different states, although this is not necessary.
Any communication between the steps is only possible via the explicitly noted set of variables named the state. Between any two steps, the program (or its part created using the automata-based technique) can not have implicit components of its state, such as local (stack) variables' values, return addresses, the current instruction pointer, etc. That is, the state of the whole program, taken at any two moments of entering the step of the automaton, can only differ in the values of the variables being considered as the state of the automaton.

The whole execution of the automata-based code is a (possibly explicit) cycle of the automaton's steps.

Another reason for using the notion of automata-based programming is that the programmer's style of thinking about the program in this technique is very similar to the style of thinking used to solve mathematical tasks using Turing machines, Markov algorithms, etc.

Example

Consider a program in C that reads a text from standard input stream, line by line, and prints the first word of each line. It is clear we need first to read and skip the leading spaces, if any, then read characters of the first word and print them until the word ends, and then read and skip all the remaining characters until the end-of-line character is encountered. Upon reaching the end of line character (regardless of the stage), we restart the algorithm from the beginning, and upon encountering the end of file condition (regardless of the stage), we terminate the program.

Traditional (imperative) program in C

The program which solves the example task in traditional (imperative) style can look something like this:

#include 
#include 
int main(void)
{
    int c;
    do {
        do {
            c = getchar();
        } while(c == ' ');
        while(c != EOF && !isspace(c) && c != '\n') {
            putchar(c);
            c = getchar();
        }
        putchar('\n');
        while(c != EOF && c != '\n')
            c = getchar();
    } while(c != EOF);
    return 0;
}

Automata-based style program

The same task can be solved by thinking in terms of finite state machines. Note that line parsing has three stages: skipping the leading spaces, printing the word and skipping the trailing characters. Let's call them states before, inside and after. The program may now look like this:

#include 
#include 
int main(void)
{
    enum states {
        before, inside, after
    } state;
    int c;
    state = before;
    while((c = getchar()) != EOF) {
        switch(state) {
            case before:
                if(c != ' ') {
                    putchar(c);
                    if(c != '\n')
                        state = inside;
                }
                break; 
            case inside:
                if(!isspace(c))
                    putchar(c);
                else {
                    putchar('\n');
                    if(c == '\n')
                        state = before;
                    else
                        state = after;
                }
                break;
            case after:
                if(c == '\n')
                    state = before;
        }
    }
    return 0;
}

Although the code now looks longer, it has at least one significant advantage: there's only one reading (that is, call to the getchar() function) instruction in the program. Besides that, there's only one loop instead of the four the previous versions had.

In this program, the body of the while loop is the automaton step, and the loop itself is the cycle of the automaton's work.

The program implements (models) the work of a finite state machine shown on the picture. The N denotes the end of line character, the S denotes spaces, and the A stands for all the other characters. The automaton follows exactly one arrow on each step depending on the current state and the encountered character. Some state switches are accompanied with printing the character; such arrows are marked with asterisks.

It is not absolutely necessary to divide the code down to separate handlers for each unique state. Furthermore, in some cases the very notion of the state can be composed of several variables' values, so that it could be impossible to handle each possible state explicitly. In the discussed program it is possible to reduce the code length by noticing that the actions taken in response to the end of line character are the same for all the possible states. The following program is equal to the previous one but is a bit shorter:

#include 
#include 
int main(void)
{
    enum states {
        before, inside, after
    } state;
    int c;
    state = before;
    while((c = getchar()) != EOF) {
        if(c == '\n') {
            putchar('\n');
            state = before;
        } else
        switch(state) {
            case before:
                if(c != ' ') {
                    putchar(c);
                    state = inside;
                }
                break;
            case inside:
                if(c == ' ') {
                    state = after;
                } else {
                    putchar(c);
                }
                break;
            case after:
                break;
        }
    }
    if(state != before)
        putchar('\n');
    return 0;
}

A separate function for the automation step

The most important property of the previous program is that the automaton step code section is clearly localized. With a separate function for it, we can better demonstrate this property:

#include 
enum states { before, inside, after };
void step(enum states *state, int c)
{
    if(c == '\n') {
        putchar('\n');
        *state = before;
    } else
    switch(*state) {
        case before:
            if(c != ' ') {
                putchar(c);
                *state = inside;
            }
            break;
        case inside:
            if(c == ' ') {
                *state = after;
            } else {
                putchar(c);
            }
            break;
        case after:
            break;
    }
} 
int main(void)
{
    int c;
    enum states state = before;
    while((c = getchar()) != EOF) {
        step(&state, c);
    }
    if(state != before)
        putchar('\n');
    return 0;
}

This example clearly demonstrates the basic properties of automata-based code:

time periods of automaton step executions may not overlap
the only information passed from the previous step to the next is the explicitly specified automaton state

Explicit state transition table

A finite automaton can be defined by an explicit state transition table. Generally speaking, an automata-based program code can naturally reflect this approach. In the program below there's an array named the_table, which defines the table. The rows of the table stand for three states, while columns reflect the input characters (first for spaces, second for the end of line character, and the last is for all the other characters).

For every possible combination, the table contains the new state number and the flag, which determines whether the automaton must print the symbol. In a real life task, this could be more complicated; e.g., the table could contain pointers to functions to be called on every possible combination of conditions.

#include 
enum states { before = 0, inside = 1, after = 2 };
struct branch {
    unsigned char new_state:2; //BIT [1:0]
    unsigned char should_putchar:1; //BIT [2]
};
struct branch the_table[3][3] = {
                 /* ' '         '\n'        others */
    /* before */ { {before,0}, {before,1}, {inside,1} },
    /* inside */ { {after, 1}, {before,1}, {inside,1} },
    /* after  */ { {after, 0}, {before,1}, {after, 0} }
};
void step(enum states *state, int c)
{
    int idx2 = (c == ' ') ? 0 : (c == '\n') ? 1 : 2;
    struct branch *b = & the_table[*state][idx2];
    *state = (enum states)(b->new_state);
    if(b->should_putchar) putchar(c);
}

Automation and Automata

Automata-based programming indeed closely matches the programming needs found in the field of automation.

A production cycle is commonly modeled as:

A sequence of stages stepping according to input data (from captors).
A set of actions performed depending on the current stage.

Various dedicated programming languages allow expressing such a model in more or less sophisticated ways.

Example Program

The example presented above could be expressed according to this view like in the following program. Here pseudo-code uses such conventions:

'set' and 'reset' respectively activate & inactivate a logic variable (here a stage)
':' is assignment, '=' is equality test

SPC : ' '
EOL : '\n'

states : (before, inside, after, end, endplusnl)

setState(c) {
    if c=EOF then if inside or after then set endplusnl else set end
    if before and (c!=SPC and c!=EOL) then set inside
    if inside and (c=SPC or c=EOL) then set after
    if after and c=EOL then set before
}

doAction(c) {
    if inside then write(c)
    else if c=EOL or endplusnl then write(EOL)
}

cycle {
    set before
    loop {
        c : readCharacter
        setState(c)
        doAction(c)
    }
    until end or endplusnl
}

The separation of routines expressing cycle progression on one side, and actual action on the other (matching input & output) allows clearer and simpler code.

Automation & Events

In the field of automation, stepping from step to step depends on input data coming from the machine itself. This is represented in the program by reading characters from a text. In reality, those data inform about position, speed, temperature, etc. of critical elements of a machine.

Like in GUI programming, changes in the machine state can thus be considered as events causing the passage from a state to another, until the final one is reached. The combination of possible states can generate a wide variety of events, thus defining a more complex production cycle. As a consequence, cycles are usually far to be simple linear sequences. There are commonly parallel branches running together and alternatives selected according to different events, schematically represented below:

   s:stage   c:condition
   
   s1
   |
   |-c2
   |
   s2
   |
   ----------
   |        |
   |-c31    |-c32
   |        |
  s31       s32
   |        |
   |-c41    |-c42
   |        |
   ----------
   |
   s4

Using object-oriented capabilities

If the implementation language supports object-oriented programming, a simple refactoring is to encapsulate the automaton into an object, thus hiding its implementation details. For example, an object-oriented version in C++ of the same program is below. A more sophisticated refactoring could employ the State pattern.

#include 
class StateMachine {
    enum states { before = 0, inside = 1, after = 2 } state;
    struct branch {
        unsigned char new_state:2;
        unsigned char should_putchar:1;
    };
    static struct branch the_table[3][3];
public:
    StateMachine() : state(before) {}
    void FeedChar(int c) {
        int idx2 = (c == ' ') ? 0 : (c == '\n') ? 1 : 2;
        struct branch *b = & the_table[state][idx2];
        state = (enum states)(b->new_state);
        if(b->should_putchar) putchar(c);
    }
};
struct StateMachine::branch StateMachine::the_table[3][3] = {
                 /* ' '         '\n'        others */
    /* before */ { {before,0}, {before,1}, {inside,1} },
    /* inside */ { {after, 0}, {before,1}, {inside,1} },
    /* after  */ { {after, 0}, {before,1}, {after, 0} }
};
int main(void)
{
    int c;
    StateMachine machine;
    while((c = getchar()) != EOF)
        machine.FeedChar(c);
    return 0;
}

Note: To minimize changes not directly related to the subject of the article, the input/output functions from the standard library of C are being used. Note the use of the ternary operator, which could also be implemented as if-else.

Applications

Automata-based programming is widely used in lexical and syntactic analyses.

Besides that, thinking in terms of automata (that is, breaking the execution process down to automaton steps and passing information from step to step through the explicit state) is necessary for event-driven programming as the only alternative to using parallel processes or threads.

The notions of states and state machines are often used in the field of formal specification. For instance, UML-based software architecture development uses state diagrams to specify the behaviour of the program. Also various communication protocols are often specified using the explicit notion of state.

Thinking in terms of automata (steps and states) can also be used to describe semantics of some programming languages. For example, the execution of a program written in the Refal language is described as a sequence of steps of a so-called abstract Refal machine; the state of the machine is a view (an arbitrary Refal expression without variables).

Continuations in the Scheme language require thinking in terms of steps and states, although Scheme itself is in no way automata-related (it is recursive). To make it possible the call/cc feature to work, implementation needs to be able to catch a whole state of the executing program, which is only possible when there's no implicit part in the state. Such a caught state is the very thing called continuation, and it can be considered as the state of a (relatively complicated) automaton. The step of the automaton is deducing the next continuation from the previous one, and the execution process is the cycle of such steps.

Alexander Ollongren in his book explains the so-called Vienna method of programming languages semantics description which is fully based on formal automata.

The STAT system is a good example of using the automata-based approach; this system, besides other features, includes an embedded language called STATL which is purely automata-oriented.

History

Automata-based techniques were used widely in the domains where there are algorithms based on automata theory, such as formal language analyses.

One of the early papers on this is by Johnson et al., 1968.

One of the earliest mentions of automata-based programming as a general technique is found in the paper by Peter Naur, 1963. The author calls the technique Turing machine approach, however no real Turing machine is given in the paper; instead, the technique based on states and steps is described.

Compared against imperative and procedural programming

The notion of state is not exclusive property of automata-based programming. Generally speaking, state (or program state) appears during execution of any computer program, as a combination of all information that can change during the execution. For instance, a state of a traditional imperative program consists of

values of all variables and the information stored within dynamic memory
values stored in registers
stack contents (including local variables' values and return addresses)
current value of the instruction pointer

These can be divided to the explicit part (such as values stored in variables) and the implicit part (return addresses and the instruction pointer).

Having said this, an automata-based program can be considered as a special case of an imperative program, in which implicit part of the state is minimized. The state of the whole program taken at the two distinct moments of entering the step code section can differ in the automaton state only. This simplifies the analysis of the program.

Object-oriented programming relationship

In the theory of object-oriented programming an object is said to have an internal state and is capable of receiving messages, responding to them, sending messages to other objects and changing the internal state during message handling. In more practical terminology, to call an object's method is considered the same as to send a message to the object.

Thus, on the one hand, objects from object-oriented programming can be considered as automata (or models of automata) whose state is the combination of internal fields, and one or more methods are considered to be the step. Such methods must not call each other nor themselves, neither directly nor indirectly, otherwise the object can not be considered to be implemented in an automata-based manner.

On the other hand, object is good for implementing a model of an automaton. When the automata-based approach is used within an object-oriented language, an automaton model is usually implemented by a class, the state is represented with internal (private) fields of the class, and the step is implemented as a method; such a method is usually the only non-constant public method of the class (besides constructors and destructors). Other public methods could query the state but don't change it. All the secondary methods (such as particular state handlers) are usually hidden within the private part of the class.

Automata theory

From Wikipedia, the free encyclopedia

Classes of automata

(Clicking on each layer will take you to an article on that subject)

The study of the mathematical properties of such automata is automata theory. The picture is a visualization of an automaton that recognizes strings containing an even number of 0s. The automaton starts in state S1, and transitions to the non-accepting state S2 upon reading the symbol 0. Reading another 0 causes the automaton to transition back to the accepting state S1. In both states the symbol 1 is ignored by making a transition to the current state.

Automata theory is the study of abstract machines and automata, as well as the computational problems that can be solved using them. It is a theory in theoretical computer science and discrete mathematics (a subject of study in both mathematics and computer science). The word automata (the plural of automaton) comes from the Greek word αὐτόματα, which means "self-acting".

The figure above illustrates a finite-state machine, which belongs to a well-known type of automaton. This automaton consists of states (represented in the figure by circles) and transitions (represented by arrows). As the automaton sees a symbol of input, it makes a transition (or jump) to another state, according to its transition function, which takes the current state and the recent symbol as its inputs.

Automata theory is closely related to formal language theory. An automaton is a finite representation of a formal language that may be an infinite set. Automata are often classified by the class of formal languages they can recognize, typically illustrated by the Chomsky hierarchy, which describes the relations between various languages and kinds of formalized logic.

Automata play a major role in theory of computation, compiler construction, artificial intelligence, parsing and formal verification.

Automata

Following is an introductory definition of one type of automaton, which attempts to help one grasp the essential concepts involved in automata theory/theories.

Very informal description

An automaton is a construct made of states designed to determine if the input should be accepted or rejected. It looks a lot like a basic board game where each space on the board represents a state. Each state has information about what to do when an input is received by the machine (again, rather like what to do when you land on the Jail spot in a popular board game). As the machine receives a new input, it looks at the state and picks a new spot based on the information on what to do when it receives that input at that state. When there are no more inputs, the automaton stops and the space it is on when it completes determining whether the automaton accepts or rejects that particular set of inputs.

Informal description

An automaton runs when it is given some sequence of inputs in discrete (individual) time steps or steps. An automaton processes one input picked from a set of symbols or letters, which is called an alphabet. The symbols received by the automaton as input at any step are a finite sequence of symbols called words. An automaton has a finite set of states. At each moment during a run of the automaton, the automaton is in one of its states. When the automaton receives new input it moves to another state (or transitions) based on a function that takes the current state and symbol as parameters. This function is called the transition function. The automaton reads the symbols of the input word one after another and transitions from state to state according to the transition function until the word is read completely. Once the input word has been read, the automaton is said to have stopped. The state at which the automaton stops is called the final state. Depending on the final state, it's said that the automaton either accepts or rejects an input word. There is a subset of states of the automaton, which is defined as the set of accepting states. If the final state is an accepting state, then the automaton accepts the word. Otherwise, the word is rejected. The set of all the words accepted by an automaton is called the "language of that automaton". Any subset of the language of an automaton is a language recognized by that automaton.

In short, an automaton is a mathematical object that takes a word as input and decides whether to accept it or reject it. Since all computational problems are reducible into the accept/reject question on inputs, (all problem instances can be represented in a finite length of symbols)^{[citation needed]}, automata theory plays a crucial role in computational theory.

Formal definition

Definition of finite state automata

A deterministic finite automaton is represented formally by a 5-tuple Σ

, δ,q₀,F>, where:

Q is a finite set of states.
Σ is a finite set of symbols, called the alphabet of the automaton.
δ is the transition function, that is, δ: Q × Σ → Q.
q₀ is the start state, that is, the state of the automaton before any input has been processed, where q₀∈ Q.
F is a set of states of Q (i.e. F⊆Q) called accept states.

Input word: An automaton reads a finite string of symbols a₁,a₂,...., a_n , where a_i ∈ Σ, which is called an input word. The set of all words is denoted by Σ*.
Run: A sequence of states q₀,q₁,q₂,...., q_n, where q_i ∈ Q such that q₀ is the start state and q_i = δ(q_i-1,a_i) for 0 < i ≤ n, is a run of the automaton on an input word w = a₁,a₂,...., a_n ∈ Σ*. In other words, at first the automaton is at the start state q₀, and then the automaton reads symbols of the input word in sequence. When the automaton reads symbol a_i it jumps to state q_i = δ(q_i-1,a_i). q_n is said to be the final state of the run.

Accepting word: A word w ∈ Σ* is accepted by the automaton if q_n ∈ F.

Recognized language: An automaton can recognize a formal language. The language L ⊆ Σ* recognized by an automaton is the set of all the words that are accepted by the automaton.

Recognizable languages: The recognizable languages are the set of languages that are recognized by some automaton. For the above definition of automata the recognizable languages are regular languages. For different definitions of automata, the recognizable languages are different.

Variant definitions of automata

Automata are defined to study useful machines under mathematical formalism. So, the definition of an automaton is open to variations according to the "real world machine", which we want to model using the automaton. People have studied many variations of automata. The most standard variant, which is described above, is called a deterministic finite automaton. The following are some popular variations in the definition of different components of automata.

Input

Finite input: An automaton that accepts only finite sequence of symbols. The above introductory definition only encompasses finite words.
Infinite input: An automaton that accepts infinite words (ω-words). Such automata are called ω-automata.
Tree word input: The input may be a tree of symbols instead of sequence of symbols. In this case after reading each symbol, the automaton reads all the successor symbols in the input tree. It is said that the automaton makes one copy of itself for each successor and each such copy starts running on one of the successor symbols from the state according to the transition relation of the automaton. Such an automaton is called a tree automaton.
Infinite tree input : The two extensions above can be combined, so the automaton reads a tree structure with (in)finite branches. Such an automaton is called an infinite tree automaton

States

Finite states: An automaton that contains only a finite number of states. The above introductory definition describes automata with finite numbers of states.
Infinite states: An automaton that may not have a finite number of states, or even a countable number of states. For example, the quantum finite automaton or topological automaton has uncountable infinity of states.
Stack memory: An automaton may also contain some extra memory in the form of a stack in which symbols can be pushed and popped. This kind of automaton is called a pushdown automaton

Transition function

Deterministic: For a given current state and an input symbol, if an automaton can only jump to one and only one state then it is a deterministic automaton.
Nondeterministic: An automaton that, after reading an input symbol, may jump into any of a number of states, as licensed by its transition relation. Notice that the term transition function is replaced by transition relation: The automaton non-deterministically decides to jump into one of the allowed choices. Such automata are called nondeterministic automata.
Alternation: This idea is quite similar to tree automaton, but orthogonal. The automaton may run its multiple copies on the same next read symbol. Such automata are called alternating automata. Acceptance condition must satisfy all runs of such copies to accept the input.

Acceptance condition

Acceptance of finite words: Same as described in the informal definition above.
Acceptance of infinite words: an omega automaton cannot have final states, as infinite words never terminate. Rather, acceptance of the word is decided by looking at the infinite sequence of visited states during the run.
Probabilistic acceptance: An automaton need not strictly accept or reject an input. It may accept the input with some probability between zero and one. For example, quantum finite automaton, geometric automaton and metric automaton have probabilistic acceptance.

Different combinations of the above variations produce many classes of automaton.

Automata theory is a subject matter that studies properties of various types of automata. For example, the following questions are studied about a given type of automata.

Which class of formal languages is recognizable by some type of automata? (Recognizable languages)
Are certain automata closed under union, intersection, or complementation of formal languages? (Closure properties)
How expressive is a type of automata in terms of recognizing a class of formal languages? And, their relative expressive power? (Language hierarchy)

Automata theory also studies the existence or nonexistence of any effective algorithms to solve problems similar to the following list:

Does an automaton accept any input word? (Emptiness checking)
Is it possible to transform a given non-deterministic automaton into deterministic automaton without changing the recognizable language? (Determinization)
For a given formal language, what is the smallest automaton that recognizes it? (Minimization)

Classes of automata

The following is an incomplete list of types of automata.

Automaton	Recognizable language
Nondeterministic/Deterministic Finite state machine (FSM)	regular languages
Deterministic pushdown automaton (DPDA)	deterministic context-free languages
Pushdown automaton (PDA)	context-free languages
Linear bounded automaton (LBA)	context-sensitive languages
Turing machine	recursively enumerable languages
Deterministic Büchi automaton	ω-limit languages
Nondeterministic Büchi automaton	ω-regular languages
Rabin automaton, Streett automaton, Parity automaton, Muller automaton	ω-regular languages

Discrete, continuous, and hybrid automata

Normally automata theory describes the states of abstract machines but there are analog automata or continuous automata or hybrid discrete-continuous automata, which use analog data, continuous time, or both.

Hierarchy in terms of powers

The following is an incomplete hierarchy in terms of powers of different types of virtual machines. The hierarchy reflects the nested categories of languages the machines are able to accept.

Automaton
Deterministic Finite Automaton (DFA) -- Lowest Power (same power) $\|\|$ (same power) Nondeterministic Finite Automaton (NFA) (above is weaker) $\cap$ (below is stronger) Deterministic Push Down Automaton (DPDA-I) with 1 push-down store $\cap$ Nondeterministic Push Down Automaton (NPDA-I) with 1 push-down store $\cap$ Linear Bounded Automaton (LBA) $\cap$ Deterministic Push Down Automaton (DPDA-II) with 2 push-down stores $\|\|$ Nondeterministic Push Down Automaton (NPDA-II) with 2 push-down stores $\|\|$ Deterministic Turing Machine (DTM) $\|\|$ Nondeterministic Turing Machine (NTM) $\|\|$ Probabilistic Turing Machine (PTM) $\|\|$ Multitape Turing Machine (MTM) $\|\|$ Multidimensional Turing Machine

Applications

Each model in automata theory plays important roles in several applied areas. Finite automata are used in text processing, compilers, and hardware design. Context-free grammar (CFGs) are used in programming languages and artificial intelligence. Originally, CFGs were used in the study of the human languages. Cellular automata are used in the field of biology, the most common example being John Conway's Game of Life. Some other examples which could be explained using automata theory in biology include mollusk and pine cones growth and pigmentation patterns. Going further, a theory suggesting that the whole universe is computed by some sort of a discrete automaton, is advocated by some scientists. The idea originated in the work of Konrad Zuse, and was popularized in America by Edward Fredkin. Automata also appear in the theory of finite fields: the set of irreducible polynomials which can be written as composition of degree two polynomials is in fact a regular language.

Automata simulators

Automata simulators are pedagogical tools used to teach, learn and research automata theory. An automata simulator takes as input the description of an automaton and then simulates its working for an arbitrary input string. The description of the automaton can be entered in several ways. An automaton can be defined in a symbolic language or its specification may be entered in a predesigned form or its transition diagram may be drawn by clicking and dragging the mouse. Well known automata simulators include Turing's World, JFLAP, VAS, TAGS and SimStudio.

Connection to category theory

One can define several distinct categories of automata following the automata classification into different types described in the previous section. The mathematical category of deterministic automata, sequential machines or sequential automata, and Turing machines with automata homomorphisms defining the arrows between automata is a Cartesian closed category, it has both categorical limits and colimits. An automata homomorphism maps a quintuple of an automaton A_i onto the quintuple of another automaton A_j. Automata homomorphisms can also be considered as automata transformations or as semigroup homomorphisms, when the state space, S, of the automaton is defined as a semigroup S_g. Monoids are also considered as a suitable setting for automata in monoidal categories.

Categories of variable automata

One could also define a variable automaton, in the sense of Norbert Wiener in his book on The Human Use of Human Beings via the endomorphisms

A_{i}\to A_{i}

. Then, one can show that such variable automata homomorphisms form a mathematical group. In the case of non-deterministic, or other complex kinds of automata, the latter set of endomorphisms may become, however, a variable automaton groupoid. Therefore, in the most general case, categories of variable automata of any kind are categories of groupoids or groupoid categories. Moreover, the category of reversible automata is then a 2-category, and also a subcategory of the 2-category of groupoids, or the groupoid category.

Bayesian programming

From Wikipedia, the free encyclopedia

Bayesian programming is a formalism and a methodology for having a technique to specify probabilistic models and solve problems when less than the necessary information is available. Bayes’ Theorem is the central concept behind this programming approach, which states that the probability of something occurring in the future can be inferred by past conditions related to the event.

Edwin T. Jaynes proposed that probability could be considered as an alternative and an extension of logic for rational reasoning with incomplete and uncertain information. In his founding book Probability Theory: The Logic of Science he developed this theory and proposed what he called “the robot,” which was not a physical device, but an inference engine to automate probabilistic reasoning—a kind of Prolog for probability instead of logic. Bayesian programming is a formal and concrete implementation of this "robot".

Bayesian programming may also be seen as an algebraic formalism to specify graphical models such as, for instance, Bayesian networks, dynamic Bayesian networks, Kalman filters or hidden Markov models. Indeed, Bayesian Programming is more general than Bayesian networks and has a power of expression equivalent to probabilistic factor graphs.

Formalism

A Bayesian program is a means of specifying a family of probability distributions.

The constituent elements of a Bayesian program are presented below:

{\displaystyle {\text{Program}}{\begin{cases}{\text{Description}}{\begin{cases}{\text{Specification}}(\pi ){\begin{cases}{\text{Variables}}\\{\text{Decomposition}}\\{\text{Forms}}\\\end{cases}}\\{\text{Identification (based on }}\delta )\end{cases}}\\{\text{Question}}\end{cases}}}

A program is constructed from a description and a question.
A description is constructed using some specification ( $\pi$ ) as given by the programmer and an identification or learning process for the parameters not completely specified by the specification, using a data set ( $\delta$ ).
A specification is constructed from a set of pertinent variables, a decomposition and a set of forms.
Forms are either parametric forms or questions to other Bayesian programs.
A question specifies which probability distribution has to be computed.

Description

The purpose of a description is to specify an effective method of computing a joint probability distribution on a set of variables

\left\{X_{1},X_{2},\cdots ,X_{N}\right\}

given a set of experimental data

\delta

and some specification

\pi

. This joint distribution is denoted as:

P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)

.

To specify preliminary knowledge

\pi

, the programmer must undertake the following:

Define the set of relevant variables $\left\{X_{1},X_{2},\cdots ,X_{N}\right\}$ on which the joint distribution is defined.
Decompose the joint distribution (break it into relevant independent or conditional probabilities).
Define the forms of each of the distributions (e.g., for each variable, one of the list of probability distributions).

Decomposition

Given a partition of

\left\{X_{1},X_{2},\ldots ,X_{N}\right\}

containing

K

subsets,

K

variables are defined

L_{1},\cdots ,L_{K}

, each corresponding to one of these subsets. Each variable

L_{k}

is obtained as the conjunction of the variables

\left\{X_{k_{1}},X_{k_{2}},\cdots \right\}

belonging to the

k^{th}

subset. Recursive application of Bayes' theorem leads to:

{\displaystyle {\begin{aligned}&P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\wedge \cdots \wedge L_{K}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\mid \delta \wedge \pi \right)\times P\left(L_{2}\mid L_{1}\wedge \delta \wedge \pi \right)\times \cdots \times P\left(L_{K}\mid L_{K-1}\wedge \cdots \wedge L_{1}\wedge \delta \wedge \pi \right)\end{aligned}}}

Conditional independence hypotheses then allow further simplifications. A conditional independence hypothesis for variable

L_{k}

is defined by choosing some variable

X_{n}

among the variables appearing in the conjunction

L_{k-1}\wedge \cdots \wedge L_{2}\wedge L_{1}

, labelling

R_{k}

as the conjunction of these chosen variables and setting:

{\displaystyle P\left(L_{k}\mid L_{k-1}\wedge \cdots \wedge L_{1}\wedge \delta \wedge \pi \right)=P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)}

We then obtain:

{\displaystyle {\begin{aligned}&P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)\\={}&P\left(L_{1}\mid \delta \wedge \pi \right)\times P\left(L_{2}\mid R_{2}\wedge \delta \wedge \pi \right)\times \cdots \times P\left(L_{K}\mid R_{K}\wedge \delta \wedge \pi \right)\end{aligned}}}

Such a simplification of the joint distribution as a product of simpler distributions is called a decomposition, derived using the chain rule.

This ensures that each variable appears at the most once on the left of a conditioning bar, which is the necessary and sufficient condition to write mathematically valid decompositions.

Forms

Each distribution

P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)

appearing in the product is then associated with either a parametric form (i.e., a function

f_{\mu }\left(L_{k}\right)

) or a question to another Bayesian program

{\displaystyle P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)=P\left(L\mid R\wedge {\widehat {\delta }}\wedge {\widehat {\pi }}\right)}

.

When it is a form

f_{\mu }\left(L_{k}\right)

, in general,

\mu

is a vector of parameters that may depend on

R_{k}

\delta

or both. Learning takes place when some of these parameters are computed using the data set

\delta

.

An important feature of Bayesian Programming is this capacity to use questions to other Bayesian programs as components of the definition of a new Bayesian program.

P\left(L_{k}\mid R_{k}\wedge \delta \wedge \pi \right)

is obtained by some inferences done by another Bayesian program defined by the specifications

{\widehat {\pi }}

and the data

{\widehat {\delta }}

. This is similar to calling a subroutine in classical programming and provides an easy way to build hierarchical models.

Question

Given a description (i.e.,

P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)

), a question is obtained by partitioning

\left\{X_{1},X_{2},\cdots ,X_{N}\right\}

into three sets: the searched variables, the known variables and the free variables.

The 3 variables

Searched

Known

and

Free

are defined as the conjunction of the variables belonging to these sets.

A question is defined as the set of distributions:

P\left(Searched\mid {\text{Known}}\wedge \delta \wedge \pi \right)

made of many "instantiated questions" as the cardinal of

Known

, each instantiated question being the distribution:

P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)

Inference

Given the joint distribution

P\left(X_{1}\wedge X_{2}\wedge \cdots \wedge X_{N}\mid \delta \wedge \pi \right)

, it is always possible to compute any possible question using the following general inference:

{\displaystyle {\begin{aligned}&P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)\\={}&\sum _{\text{Free}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)\right]\\={}&{\frac {\displaystyle \sum _{\text{Free}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\wedge {\text{Known}}\mid \delta \wedge \pi \right)\right]}{\displaystyle P\left({\text{Known}}\mid \delta \wedge \pi \right)}}\\={}&{\frac {\displaystyle \sum _{\text{Free}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\wedge {\text{Known}}\mid \delta \wedge \pi \right)\right]}{\displaystyle \sum _{{\text{Free}}\wedge {\text{Searched}}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\wedge {\text{Known}}\mid \delta \wedge \pi \right)\right]}}\\={}&{\frac {1}{Z}}\times \sum _{\text{Free}}\left[P\left({\text{Searched}}\wedge {\text{Free}}\wedge {\text{Known}}\mid \delta \wedge \pi \right)\right]\end{aligned}}}

where the first equality results from the marginalization rule, the second results from Bayes' theorem and the third corresponds to a second application of marginalization. The denominator appears to be a normalization term and can be replaced by a constant

Z

.

Theoretically, this allows to solve any Bayesian inference problem. In practice, however, the cost of computing exhaustively and exactly

P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)

is too great in almost all cases.

Replacing the joint distribution by its decomposition we get:

{\displaystyle {\begin{aligned}&P\left({\text{Searched}}\mid {\text{Known}}\wedge \delta \wedge \pi \right)\\={}&{\frac {1}{Z}}\sum _{\text{Free}}\left[\prod _{k=1}^{K}\left[P\left(L_{i}\mid K_{i}\wedge \pi \right)\right]\right]\end{aligned}}}

which is usually a much simpler expression to compute, as the dimensionality of the problem is considerably reduced by the decomposition into a product of lower dimension distributions.

Example

Bayesian spam detection

The purpose of Bayesian spam filtering is to eliminate junk e-mails.

The problem is very easy to formulate. E-mails should be classified into one of two categories: non-spam or spam. The only available information to classify the e-mails is their content: a set of words.
Using these words without taking the order into account is commonly called a bag of words model.

The classifier should furthermore be able to adapt to its user and to learn from experience. Starting from an initial standard setting, the classifier should modify its internal parameters when the user disagrees with its own decision. It will hence adapt to the user’s criteria to differentiate between non-spam and spam. It will improve its results as it encounters increasingly classified e-mails.

Variables

The variables necessary to write this program are as follows:

$Spam$ : a binary variable, false if the e-mail is not spam and true otherwise.
$W_{0},W_{1},\ldots ,W_{N-1}$ : $N$ binary variables. $W_{n}$ is true if the $n^{th}$ word of the dictionary is present in the text.

These

N+1

binary variables sum up all the information about an e-mail.

Decomposition

Starting from the joint distribution and applying recursively Bayes' theorem we obtain:

{\displaystyle {\begin{aligned}&P({\text{Spam}}\wedge W_{0}\wedge \cdots \wedge W_{N-1})\\={}&P({\text{Spam}})\times P(W_{0}\mid {\text{Spam}})\times P(W_{1}\mid {\text{Spam}}\wedge W_{0})\\&\times \cdots \\&\times P\left(W_{N-1}\mid {\text{Spam}}\wedge W_{0}\wedge \cdots \wedge W_{N-2}\right)\end{aligned}}}

This is an exact mathematical expression.

It can be drastically simplified by assuming that the probability of appearance of a word knowing the nature of the text (spam or not) is independent of the appearance of the other words. This is the naive Bayes assumption and this makes this spam filter a naive Bayes model.
For instance, the programmer can assume that:

P(W_{1}\mid {\text{Spam}}\land W_{0})=P(W_{1}\mid {\text{Spam}})

to finally obtain:

{\displaystyle P({\text{Spam}}\land W_{0}\land \ldots \land W_{N-1})=P({\text{Spam}})\prod _{n=0}^{N-1}[P(W_{n}\mid {\text{Spam}})]}

This kind of assumption is known as the naive Bayes' assumption. It is "naive" in the sense that the independence between words is clearly not completely true. For instance, it completely neglects that the appearance of pairs of words may be more significant than isolated appearances. However, the programmer may assume this hypothesis and may develop the model and the associated inferences to test how reliable and efficient it is.

Parametric forms

To be able to compute the joint distribution, the programmer must now specify the

N+1

distributions appearing in the decomposition:

$P({\text{Spam}})$ is a prior defined, for instance, by $P([{\text{Spam}}=1])=0.75$
Each of the $N$ $N$ forms $P(W_{n}\mid {\text{Spam}})$ $P(W_{n}\mid {\text{Spam}})$ may be specified using Laplace rule of succession (this is a pseudocounts-based smoothing technique to counter the zero-frequency problem of words never-seen-before):
1. $P(W_{n}\mid [{\text{Spam}}={\text{false}}])={\frac {1+a_{f}^{n}}{2+a_{f}}}$
2. $P(W_{n}\mid [{\text{Spam}}={\text{true}}])={\frac {1+a_{t}^{n}}{2+a_{t}}}$

where

a_{f}^{n}

stands for the number of appearances of the

n^{th}

word in non-spam e-mails and

a_{f}

stands for the total number of non-spam e-mails. Similarly,

a_{t}^{n}

stands for the number of appearances of the

n^{th}

word in spam e-mails and

a_{t}

stands for the total number of spam e-mails.

Identification

The

N

forms

P(W_{n}\mid {\text{Spam}})

are not yet completely specified because the

2N+2

parameters

a_{f}^{n=0,\ldots ,N-1}

a_{t}^{n=0,\ldots ,N-1}

a_{f}

and

a_{t}

have no values yet.

The identification of these parameters could be done either by batch processing a series of classified e-mails or by an incremental updating of the parameters using the user's classifications of the e-mails as they arrive.

Both methods could be combined: the system could start with initial standard values of these parameters issued from a generic database, then some incremental learning customizes the classifier to each individual user.

Question

The question asked to the program is: "what is the probability for a given text to be spam knowing which words appear and don't appear in this text?" It can be formalized by:

P({\text{Spam}}\mid w_{0}\wedge \cdots \wedge w_{N-1})

which can be computed as follows:

{\displaystyle {\begin{aligned}&P({\text{Spam}}\mid w_{0}\wedge \cdots \wedge w_{N-1})\\={}&{\frac {\displaystyle P({\text{Spam}})\prod _{n=0}^{N-1}[P(w_{n}\mid {\text{Spam}})]}{\displaystyle \sum _{\text{Spam}}[P({\text{Spam}})\prod _{n=0}^{N-1}[P(w_{n}\mid {\text{Spam}})]]}}\end{aligned}}}

The denominator appears to be a normalization constant. It is not necessary to compute it to decide if we are dealing with spam. For instance, an easy trick is to compute the ratio:

{\displaystyle {\begin{aligned}&{\frac {P([{\text{Spam}}={\text{true}}]\mid w_{0}\wedge \cdots \wedge w_{N-1})}{P([{\text{Spam}}={\text{false}}]\mid w_{0}\wedge \cdots \wedge w_{N-1})}}\\={}&{\frac {P([{\text{Spam}}={\text{true}}])}{P([{\text{Spam}}={\text{false}}])}}\times \prod _{n=0}^{N-1}\left[{\frac {P(w_{n}\mid [{\text{Spam}}={\text{true}}])}{P(w_{n}\mid [{\text{Spam}}={\text{false}}])}}\right]\end{aligned}}}

This computation is faster and easier because it requires only

2N

products.

Bayesian program

The Bayesian spam filter program is completely defined by:

{\displaystyle \Pr {\begin{cases}Ds{\begin{cases}Sp(\pi ){\begin{cases}Va:{\text{Spam}},W_{0},W_{1}\ldots W_{N-1}\\Dc:{\begin{cases}P({\text{Spam}}\land W_{0}\land \ldots \land W_{n}\land \ldots \land W_{N-1})\\=P({\text{Spam}})\prod _{n=0}^{N-1}P(W_{n}\mid {\text{Spam}})\end{cases}}\\Fo:{\begin{cases}P({\text{Spam}}):{\begin{cases}P([{\text{Spam}}={\text{false}}])=0.25\\P([{\text{Spam}}={\text{true}}])=0.75\end{cases}}\\P(W_{n}\mid {\text{Spam}}):{\begin{cases}P(W_{n}\mid [{\text{Spam}}={\text{false}}])\\={\frac {1+a_{f}^{n}}{2+a_{f}}}\\P(W_{n}\mid [{\text{Spam}}={\text{true}}])\\={\frac {1+a_{t}^{n}}{2+a_{t}}}\end{cases}}\\\end{cases}}\\\end{cases}}\\{\text{Identification (based on }}\delta )\end{cases}}\\Qu:P({\text{Spam}}\mid w_{0}\land \ldots \land w_{n}\land \ldots \land w_{N-1})\end{cases}}}

Bayesian filter, Kalman filter and hidden Markov model

Bayesian filters (often called Recursive Bayesian estimation) are generic probabilistic models for time evolving processes. Numerous models are particular instances of this generic approach, for instance: the Kalman filter or the Hidden Markov model (HMM).

Variables

Variables $S^{0},\ldots ,S^{T}$ are a time series of state variables considered to be on a time horizon ranging from $0$ to $T$ .
Variables $O^{0},\ldots ,O^{T}$ are a time series of observation variables on the same horizon.

Decomposition

The decomposition is based:

on $P(S^{t}\mid S^{t-1})$ , called the system model, transition model or dynamic model, which formalizes the transition from the state at time $t-1$ to the state at time $t$ ;
on $P(O^{t}\mid S^{t})$ , called the observation model, which expresses what can be observed at time $t$ when the system is in state $S^{t}$ ;
on an initial state at time $0$ : $P(S^{0}\wedge O^{0})$ .

Parametrical forms

The parametrical forms are not constrained and different choices lead to different well-known models: see Kalman filters and Hidden Markov models just below.

Question

The typical question for such models is

P\left(S^{t+k}\mid O^{0}\wedge \cdots \wedge O^{t}\right)

: what is the probability distribution for the state at time

t+k

knowing the observations from instant

0

t

?
The most common case is Bayesian filtering where

k=0

, which searches for the present state, knowing past observations.

However it is also possible

(k>0)

, to extrapolate a future state from past observations, or to do smoothing

{\displaystyle (k<0 annotation="">

, to recover a past state from observations made either before or after that instant.
More complicated questions may also be asked as shown below in the HMM section.

Bayesian filters

(k=0)

have a very interesting recursive property, which contributes greatly to their attractiveness.

P\left(S^{t}|O^{0}\wedge \cdots \wedge O^{t}\right)

may be computed simply from

P\left(S^{t1}\mid O^{0}\wedge \cdots \wedge O^{t-1}\right)

with the following formula:

{\displaystyle {\begin{array}{ll}&P\left(S^{t}|O^{0}\wedge \cdots \wedge O^{t}\right)\\=&P\left(O^{t}|S^{t}\right)\times \sum _{S^{t-1}}\left[P\left(S^{t}|S^{t-1}\right)\times P\left(S^{t-1}|O^{0}\wedge \cdots \wedge O^{t-1}\right)\right]\end{array}}}

Another interesting point of view for this equation is to consider that there are two phases: a prediction phase and an estimation phase:

During the prediction phase, the state is predicted using the dynamic model and the estimation of the state at the previous moment:

{\displaystyle {\begin{array}{ll}&P\left(S^{t}|O^{0}\wedge \cdots \wedge O^{t-1}\right)\\=&\sum _{S^{t-1}}\left[P\left(S^{t}|S^{t-1}\right)\times P\left(S^{t-1}|O^{0}\wedge \cdots \wedge O^{t-1}\right)\right]\end{array}}}

During the estimation phase, the prediction is either confirmed or invalidated using the last observation:

{\displaystyle {\begin{aligned}&P\left(S^{t}\mid O^{0}\wedge \cdots \wedge O^{t}\right)\\={}&P\left(O^{t}\mid S^{t}\right)\times P\left(S^{t}|O^{0}\wedge \cdots \wedge O^{t-1}\right)\end{aligned}}}

Bayesian program

{\displaystyle Pr{\begin{cases}Ds{\begin{cases}Sp(\pi ){\begin{cases}Va:\\S^{0},\cdots ,S^{T},O^{0},\cdots ,O^{T}\\Dc:\\{\begin{cases}&P\left(S^{0}\wedge \cdots \wedge S^{T}\wedge O^{0}\wedge \cdots \wedge O^{T}|\pi \right)\\=&P\left(S^{0}\wedge O^{0}\right)\times \prod _{t=1}^{T}\left[P\left(S^{t}|S^{t-1}\right)\times P\left(O^{t}|S^{t}\right)\right]\end{cases}}\\Fo:\\{\begin{cases}P\left(S^{0}\wedge O^{0}\right)\\P\left(S^{t}|S^{t-1}\right)\\P\left(O^{t}|S^{t}\right)\end{cases}}\end{cases}}\\Id\end{cases}}\\Qu:\\{\begin{cases}{\begin{array}{l}P\left(S^{t+k}|O^{0}\wedge \cdots \wedge O^{t}\right)\\\left(k=0\right)\equiv {\text{Filtering}}\\\left(k>0\right)\equiv {\text{Prediction}}\\\left(k<0 annotation="" array="" cases="" end="" equiv="" moothing="" right="" text="">

Kalman filter

The very well-known Kalman filters are a special case of Bayesian filters.

They are defined by the following Bayesian program:

{\displaystyle Pr{\begin{cases}Ds{\begin{cases}Sp(\pi ){\begin{cases}Va:\\S^{0},\cdots ,S^{T},O^{0},\cdots ,O^{T}\\Dc:\\{\begin{cases}&P\left(S^{0}\wedge \cdots \wedge O^{T}|\pi \right)\\=&\left[{\begin{array}{c}P\left(S^{0}\wedge O^{0}|\pi \right)\\\prod _{t=1}^{T}\left[P\left(S^{t}|S^{t-1}\wedge \pi \right)\times P\left(O^{t}|S^{t}\wedge \pi \right)\right]\end{array}}\right]\end{cases}}\\Fo:\\{\begin{cases}P\left(S^{t}\mid S^{t-1}\wedge \pi \right)\equiv G\left(S^{t},A\bullet S^{t-1},Q\right)\\P\left(O^{t}\mid S^{t}\wedge \pi \right)\equiv G\left(O^{t},H\bullet S^{t},R\right)\end{cases}}\end{cases}}\\Id\end{cases}}\\Qu:\\P\left(S^{T}\mid O^{0}\wedge \cdots \wedge O^{T}\wedge \pi \right)\end{cases}}}

Variables are continuous.
The transition model $P(S^{t}\mid S^{t-1}\wedge \pi )$ and the observation model $P(O^{t}\mid S^{t}\wedge \pi )$ are both specified using Gaussian laws with means that are linear functions of the conditioning variables.

With these hypotheses and by using the recursive formula, it is possible to solve the inference problem analytically to answer the usual

P(S^{T}\mid O^{0}\wedge \cdots \wedge O^{T}\wedge \pi )

question. This leads to an extremely efficient algorithm, which explains the popularity of Kalman filters and the number of their everyday applications.

When there are no obvious linear transition and observation models, it is still often possible, using a first-order Taylor's expansion, to treat these models as locally linear. This generalization is commonly called the extended Kalman filter.

Hidden Markov model

Hidden Markov models (HMMs) are another very popular specialization of Bayesian filters.

They are defined by the following Bayesian program:

{\displaystyle \Pr {\begin{cases}Ds{\begin{cases}Sp(\pi ){\begin{cases}Va:\\S^{0},\ldots ,S^{T},O^{0},\ldots ,O^{T}\\Dc:\\{\begin{cases}&P\left(S^{0}\wedge \cdots \wedge O^{T}\mid \pi \right)\\=&\left[{\begin{array}{c}P\left(S^{0}\wedge O^{0}\mid \pi \right)\\\prod _{t=1}^{T}\left[P\left(S^{t}\mid S^{t-1}\wedge \pi \right)\times P\left(O^{t}\mid S^{t}\wedge \pi \right)\right]\end{array}}\right]\end{cases}}\\Fo:\\{\begin{cases}P\left(S^{0}\wedge O^{0}\mid \pi \right)\equiv {\text{Matrix}}\\P\left(S^{t}\mid S^{t-1}\wedge \pi \right)\equiv {\text{Matrix}}\\P\left(O^{t}\mid S^{t}\wedge \pi \right)\equiv {\text{Matrix}}\end{cases}}\end{cases}}\\Id\end{cases}}\\Qu:\\\max _{S^{1}\wedge \cdots \wedge S^{T-1}}\left[P\left(S^{1}\wedge \cdots \wedge S^{T-1}\mid S^{T}\wedge O^{0}\wedge \cdots \wedge O^{T}\wedge \pi \right)\right]\end{cases}}}

Variables are treated as being discrete.
The transition model $P\left(S^{t}\mid S^{t-1}\wedge \pi \right)$ and the observation model $P\left(O^{t}\mid S^{t}\wedge \pi \right)$ are

both specified using probability matrices.

The question most frequently asked of HMMs is:

{\displaystyle \max _{S^{1}\wedge \cdots \wedge S^{T-1}}\left[P\left(S^{1}\wedge \cdots \wedge S^{T-1}\mid S^{T}\wedge O^{0}\wedge \cdots \wedge O^{T}\wedge \pi \right)\right]}

What is the most probable series of states that leads to the present state, knowing the past observations?

This particular question may be answered with a specific and very efficient algorithm called the Viterbi algorithm.

The Baum–Welch algorithm has been developed for HMMs.

Applications

Academic applications

Since 2000, Bayesian programming has been used to develop both robotics applications and life sciences models.

Robotics

In robotics, bayesian programming was applied to autonomous robotics, robotic CAD systems, advanced driver-assistance systems, robotic arm control, mobile robotics, human-robot interaction, human-vehicle interaction (Bayesian autonomous driver models) video game avatar programming and training and real-time strategy games (AI).

Life sciences

In life sciences, bayesian programming was used in vision to reconstruct shape from motion, to model visuo-vestibular interaction and to study saccadic eye movements; in speech perception and control to study early speech acquisition and the emergence of articulatory-acoustic systems; and to model handwriting perception and control.

Pattern recognition

Bayesian program learning has potential applications voice recognition and synthesis, image recognition and natural language processing. It employs the principles of compositionality (building abstract representations from parts), causality (building complexity from parts) and learning to learn (using previously recognized concepts to ease the creation of new concepts).

Possibility theories

The comparison between probabilistic approaches (not only bayesian programming) and possibility theories continues to be debated.

Possibility theories like, for instance, fuzzy sets, fuzzy logic and possibility theory are alternatives to probability to model uncertainty. They argue that probability is insufficient or inconvenient to model certain aspects of incomplete/uncertain knowledge.

The defense of probability is mainly based on Cox's theorem, which starts from four postulates concerning rational reasoning in the presence of uncertainty. It demonstrates that the only mathematical framework that satisfies these postulates is probability theory. The argument is that any approach other than probability necessarily infringes one of these postulates and the value of that infringement.

Probabilistic programming

The purpose of probabilistic programming is to unify the scope of classical programming languages with probabilistic modeling (especially bayesian networks) to deal with uncertainty while profiting from the programming languages' expressiveness to encode complexity.

Extended classical programming languages include logical languages as proposed in Probabilistic Horn Abduction, Independent Choice Logic, PRISM, and ProbLog which proposes an extension of Prolog.

It can also be extensions of functional programming languages (essentially Lisp and Scheme) such as IBAL or CHURCH. The underlying programming languages can be object-oriented as in BLOG and FACTORIE or more standard ones as in CES and FIGARO.

The purpose of Bayesian programming is different. Jaynes' precept of "probability as logic" argues that probability is an extension of and an alternative to logic above which a complete theory of rationality, computation and programming can be rebuilt. Bayesian programming attempts to replace classical languages with a programming approach based on probability that considers incompleteness and uncertainty.

The precise comparison between the semantics and power of expression of Bayesian and probabilistic programming is an open question.

Search This Blog

Thursday, November 15, 2018

Automata-based programming

Example

Traditional (imperative) program in C

Automata-based style program

A separate function for the automation step

Explicit state transition table

Automation and Automata

Example Program

Automation & Events

Using object-oriented capabilities

Applications

History

Compared against imperative and procedural programming

Object-oriented programming relationship

Automata theory

Automata

Very informal description

Informal description

Formal definition

Definition of finite state automata

Variant definitions of automata

Classes of automata

Discrete, continuous, and hybrid automata

Hierarchy in terms of powers

Applications

Automata simulators

Connection to category theory

Bayesian programming

Formalism

Description

Decomposition

Forms

Question

Inference

Example

Bayesian spam detection

Variables

Decomposition

Parametric forms

Identification

Question

Bayesian program

Bayesian filter, Kalman filter and hidden Markov model

Variables

Decomposition

Parametrical forms

Question

Bayesian program

Kalman filter

Hidden Markov model

Applications

Academic applications

Robotics

Life sciences

Pattern recognition

Possibility theories

Probabilistic programming

Political psychology