Search This Blog

Thursday, November 2, 2023

Perceptron

From Wikipedia, the free encyclopedia

History

Mark I Perceptron machine, the first implementation of the perceptron algorithm. It was connected to a camera with 20×20 cadmium sulfide photocells to make a 400-pixel image. The main visible feature is the sensory-to-association plugboard, which sets different combinations of input features. To the right are arrays of potentiometers that implemented the adaptive weights.
The Mark 1 Perceptron. Sensory units at left, association units in center, and control panel and response units at far right. The sensory-to-association plugboard is behind the closed panel to the right of the operator. The letter "C" on the front panel is a display of the current state of the sensory input.

The perceptron was invented in 1943 by Warren McCulloch and Walter Pitts. The first hardware implementation was Mark I Perceptron machine built in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, funded by the Information Systems Branch of the United States Office of Naval Research and the Rome Air Development Center. It was first publicly demonstrated on 23 June 1960. The machine was "part of a previously secret four-year NPIC [the US' National Photographic Interpretation Center] effort from 1963 through 1966 to develop this algorithm into a useful tool for photo-interpreters".

Rosenblatt described the details of the perceptron in a 1958 paper. His organization of a perceptron is constructed of three kinds of cells ("units"): AI, AII, R, which stand for "projection", "association" and "response".

Mark I Perceptron machine

Organization of a biological brain and a perceptron.

The perceptron was intended to be a machine, rather than a program, and while its first implementation was in software for the IBM 704, it was subsequently implemented in custom-built hardware as the "Mark I perceptron", designed for image recognition. The machine is currently in Smithsonian National Museum of American History.

The Mark I Perceptron has 3 layers.

  • An array of 400 photocells arranged in a 20x20 grid, named "sensory units" (S-units), or "input retina". Each S-unit can connect to up to 40 A-units.
  • A hidden layer of 512 perceptrons, named "association units" (A-units).
  • An output layer of 8 perceptrons, named "response units" (R-units).

Rosenblatt called this three-layered perceptron network the alpha-perceptron, to distinguish it from other perceptron models he experimented with.

The S-units are connected to the A-units randomly (according to a table of random numbers) via a plugboard (see photo), to "eliminate any particular intentional bias in the perceptron". The connection weights are fixed, not learned.

The A-units are connected to the R-units, with adjustable weights encoded in potentiometers, and weight updates during learning were performed by electric motors. The hardware details are in an operators' manual.

In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

Rosenblatt described his experiments with many variants of the Perceptron machine in Principles of Neurodynamics (1962). Among the variants are:

  • "cross-coupling" (connections between units within the same layer) with possibly closed loops,
  • "back-coupling" (connections from units in a later layer to units in a previous layer),
  • four-layer perceptrons where the last two layers have adjustible weights (and thus a proper multilayer perceptron),
  • analyzing audio (instead of images).

The machine was shipped from Cornell to Smithsonian in 1967, under a government transfer administered by the Office of Naval Research.

Perceptrons (1969)

Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognise many classes of patterns. This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had greater processing power than perceptrons with one layer (also called a single-layer perceptron).

Single-layer perceptrons are only capable of learning linearly separable patterns. For a classification task with some step activation function, a single node will have a single line dividing the data points forming the patterns. More nodes can create more dividing lines, but those lines must somehow be combined to form more complex classifications. A second layer of perceptrons, or even linear nodes, are sufficient to solve many otherwise non-separable problems.

In 1969, a famous book entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often believed (incorrectly) that they also conjectured that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. (See the page on Perceptrons (book) for more information.) Nevertheless, the often-miscited Minsky/Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. This text was reprinted in 1987 as "Perceptrons - Expanded Edition" where some errors in the original text are shown and corrected.

Subsequent work

The kernel perceptron algorithm was already introduced in 1964 by Aizerman et al. Margin bounds guarantees were given for the Perceptron algorithm in the general non-separable case first by Freund and Schapire (1998), and more recently by Mohri and Rostamizadeh (2013) who extend previous results and give new L1 bounds.

The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research suggests a perceptron-like linear model can produce some behavior seen in real neurons.

The solution spaces of decision boundaries for all binary functions and learning behaviors are studied in .

Definition

The appropriate weights are applied to the inputs, and the resulting weighted sum passed to a function that produces the output o.

In the modern sense, the perceptron is an algorithm for learning a binary classifier called a threshold function: a function that maps its input (a real-valued vector) to an output value (a single binary value):

where is the heaviside step-function, is a vector of real-valued weights, is the dot product , where m is the number of inputs to the perceptron, and b is the bias. The bias shifts the decision boundary away from the origin and does not depend on any input value.

Equivalently, since , we can add a coordinate to each input, and then write it as a linear classifier that passes the origin:

The value of (0 or 1) is used to perform binary classification on as either a positive or a negative instance. Spatially, the bias shifts the position (though not the orientation) of the planar decision boundary.

In the context of neural networks, a perceptron is an artificial neuron using the Heaviside step function as the activation function. The perceptron algorithm is also termed the single-layer perceptron, to distinguish it from a multilayer perceptron, which is a misnomer for a more complicated neural network. As a linear classifier, the single-layer perceptron is the simplest feedforward neural network.

Power of representation

Information Theory

From an information theory point of view, a single perceptron with K inputs has a capacity of 2K bits of information. This result is due to Thomas Cover.

Specifically let be the number of ways to linearly separate N points in K dimensions, then

When K is large, is very close to one when , but very close to zero when . In words, one perceptron unit can almost certainly memorize a random assignment of binary labels on N points when , but almost certainly not when .

Boolean function

When operating on only binary inputs, a perceptron is called a linearly separable Boolean function, or threshold Boolean function. The sequence of numbers of threshold Boolean functions on n inputs is OEIS A000609. The value is only known exactly up to case, but the order of magnitude is known quite exactly: it has upper bound and lower bound .

Any Boolean linear threshold function can be implemented with only integer weights. Furthermore, the number of bits necessary and sufficient for representing a single integer weight parameter is .

Universal approximation theorem

A single perceptron can learn to classify any half-space. It cannot solve any linearly nonseparable vectors, such as the Boolean exclusive-or problem (the famous "XOR problem").

A perceptron network with one hidden layer can learn to classify any compact subset arbitrarily closely. Similarly, it can also approximate any compactly-supported continuous function arbitrarily closely. This is essentially a special case of the theorems by George Cybenko and Kurt Hornik.

Conjunctively local perceptron

Perceptrons (Minsky and Papert, 1969) studied the kind of perceptron networks necessary to learn various boolean functions.

Consider a perceptron network with input units, one hidden layer, and one output, similar to the Mark I Perceptron machine. It computes a boolean function of type . They call a function conjuctively local of order , iff there exists a perceptron network such that each unit in the hidden layer connects to at most input units.

Theorem. (Theorem 3.1.1): The parity function is conjuctively local of order .

Theorem. (Section 5.5): The connectedness function is conjuctively local of order .

Learning algorithm for a single-layer perceptron

A diagram showing a perceptron updating its linear boundary as more training examples are added

Below is an example of a learning algorithm for a single-layer perceptron with a single output unit. For a single-layer perceptron with multiple output units, since the weights of one output unit are completely separate from all the others', the same algorithm can be run for each output unit.

For multilayer perceptrons, where a hidden layer exists, more sophisticated algorithms such as backpropagation must be used. If the activation function or the underlying process being modeled by the perceptron is nonlinear, alternative learning algorithms such as the delta rule can be used as long as the activation function is differentiable. Nonetheless, the learning algorithm described in the steps below will often work, even for multilayer perceptrons with nonlinear activation functions.

When multiple perceptrons are combined in an artificial neural network, each output neuron operates independently of all the others; thus, learning each output can be considered in isolation.

Definitions

We first define some variables:

  • is the learning rate of the perceptron. Learning rate is a positive number usually chosen to be less than 1. The larger the value, the greater the chance for volatility in the weight changes.
  • denotes the output from the perceptron for an input vector .
  • is the training set of samples, where:
    • is the -dimensional input vector.
    • is the desired output value of the perceptron for that input.

We show the values of the features as follows:

  • is the value of the th feature of the th training input vector.
  • .

To represent the weights:

  • is the th value in the weight vector, to be multiplied by the value of the th input feature.
  • Because , the is effectively a bias that we use instead of the bias constant .

To show the time-dependence of , we use:

  • is the weight at time .

Steps

  1. Initialize the weights. Weights may be initialized to 0 or to a small random value. In the example below, we use 0.
  2. For each example j in our training set D, perform the following steps over the input and desired output :
    1. Calculate the actual output:
    2. Update the weights:
      , for all features , is the learning rate.
  3. For offline learning, the second step may be repeated until the iteration error is less than a user-specified error threshold , or a predetermined number of iterations have been completed, where s is again the size of the sample set.

The algorithm updates the weights after every training sample in step 2b.

Convergence of one perceptron on a linearly separable dataset

A single perceptron is a linear classifier, therefore it can only reach a stable state if all input vectors classified correctly if the training set D is not linearly separable, i.e. if the positive examples cannot be separated from the negative examples by a hyperplane. In this case, the algorithm would not converge since there is no solution. Hence, if linear separability of the training set is not known a priori, one of the training variants below should be used. Detailed analysis and extensions to the convergence theorem are in Chapter 11 of Perceptrons (1969).

Linear separability is testable in time , where is the number of data points, and is the dimension of each point.

If the training set is linearly separable, then the perceptron is guaranteed to converge after making finitely many mistakes. The theorem is proved by Rosenblatt et al.

Perceptron convergence theorem — Given a dataset , such that , and it is linearly separable by some unit vector , with margin :

Then the perceptron 0-1 learning algorithm converges after making at most mistakes, for any learning rate, and any method of sampling from the dataset.

The following simple proof is due to Novikoff (1962). The idea of the proof is that the weight vector is always adjusted by a bounded amount in a direction with which it has a negative dot product, and thus can be bounded above by O(t), where t is the number of changes to the weight vector. However, it can also be bounded below by O(t) because if there exists an (unknown) satisfactory weight vector, then every change makes progress in this (unknown) direction by a positive amount that depends only on the input vector.

Proof

Suppose at step , the perceptron with weight makes a mistake on data point , then it updates to .

If , the argument is symmetric, so we omit it.

WLOG, , then , , and .

By assumption, we have separation with margins:

Thus,

Also

and since the perceptron made a mistake, , and so

Since we started with , after making mistakes,

but also

Combining the two, we have

Two classes of points, and two of the infinitely many linear boundaries that separate them. Even though the boundaries are at nearly right angles to one another, the perceptron algorithm has no way of choosing between them.

While the perceptron algorithm is guaranteed to converge on some solution in the case of a linearly separable training set, it may still pick any solution and problems may admit many solutions of varying quality. The perceptron of optimal stability, nowadays better known as the linear support-vector machine, was designed to solve this problem (Krauth and Mezard, 1987).

Perceptron cycling theorem

When the dataset is not linearly separable, then there is no way for a single perceptron to converge. However, we still have

Perceptron cycling theorem — If the dataset has only finitely many points, then there exists an upper bound number , such that for any starting weight vector all weight vector has norm bounded by

This is proved first by Bradley Efron.

Learning a Boolean function

Consider a dataset where the are from , that is, the vertices of an n-dimensional hypercube centered at origin, and . That is, all data points with positive have , and vice versa. By the perceptron convergence theorem, a perceptron would converge after making at most mistakes.

If we were to write a logical program to perform the same task, each positive example shows that one of the coordinates is the right one, and each negative example shows that its complement is a positive example. By collecting all the known positive examples, we eventually eliminate all but one coordinate, at which point the dataset is learned.

This bound is asymptotically tight in terms of the worst-case. In the worst-case, the first presented example is entirely new, and gives bits of information, but each subsequent example would differ minimally from previous examples, and gives 1 bit each. After examples, there are bits of information, which is sufficient for the perceptron (with bits of information).

However, it is not tight in terms of expectation if the examples are presented uniformly at random, since the first would give bits, the second bits, and so on, taking examples in total.

Variants

The pocket algorithm with ratchet (Gallant, 1990) solves the stability problem of perceptron learning by keeping the best solution seen so far "in its pocket". The pocket algorithm then returns the solution in the pocket, rather than the last solution. It can be used also for non-separable data sets, where the aim is to find a perceptron with a small number of misclassifications. However, these solutions appear purely stochastically and hence the pocket algorithm neither approaches them gradually in the course of learning, nor are they guaranteed to show up within a given number of learning steps.

The Maxover algorithm (Wendemuth, 1995) is "robust" in the sense that it will converge regardless of (prior) knowledge of linear separability of the data set. In the linearly separable case, it will solve the training problem – if desired, even with optimal stability (maximum margin between the classes). For non-separable data sets, it will return a solution with a small number of misclassifications. In all cases, the algorithm gradually approaches the solution in the course of learning, without memorizing previous states and without stochastic jumps. Convergence is to global optimality for separable data sets and to local optimality for non-separable data sets.

The Voted Perceptron (Freund and Schapire, 1999), is a variant using multiple weighted perceptrons. The algorithm starts a new perceptron every time an example is wrongly classified, initializing the weights vector with the final weights of the last perceptron. Each perceptron will also be given another weight corresponding to how many examples do they correctly classify before wrongly classifying one, and at the end the output will be a weighted vote on all perceptrons.

In separable problems, perceptron training can also aim at finding the largest separating margin between the classes. The so-called perceptron of optimal stability can be determined by means of iterative training and optimization schemes, such as the Min-Over algorithm (Krauth and Mezard, 1987) or the AdaTron (Anlauf and Biehl, 1989)). AdaTron uses the fact that the corresponding quadratic optimization problem is convex. The perceptron of optimal stability, together with the kernel trick, are the conceptual foundations of the support-vector machine.

The -perceptron further used a pre-processing layer of fixed random weights, with thresholded output units. This enabled the perceptron to classify analogue patterns, by projecting them into a binary space. In fact, for a projection space of sufficiently high dimension, patterns can become linearly separable.

Another way to solve nonlinear problems without using multiple layers is to use higher order networks (sigma-pi unit). In this type of network, each element in the input vector is extended with each pairwise combination of multiplied inputs (second order). This can be extended to an n-order network.

It should be kept in mind, however, that the best classifier is not necessarily that which classifies all the training data perfectly. Indeed, if we had the prior constraint that the data come from equi-variant Gaussian distributions, the linear separation in the input space is optimal, and the nonlinear solution is overfitted.

Other linear classification algorithms include Winnow, support-vector machine, and logistic regression.

Multiclass perceptron

Like most other techniques for training linear classifiers, the perceptron generalizes naturally to multiclass classification. Here, the input and the output are drawn from arbitrary sets. A feature representation function maps each possible input/output pair to a finite-dimensional real-valued feature vector. As before, the feature vector is multiplied by a weight vector , but now the resulting score is used to choose among many possible outputs:

Learning again iterates over the examples, predicting an output for each, leaving the weights unchanged when the predicted output matches the target, and changing them when it does not. The update becomes:

This multiclass feedback formulation reduces to the original perceptron when is a real-valued vector, is chosen from , and .

For certain problems, input/output representations and features can be chosen so that can be found efficiently even though is chosen from a very large or even infinite set.

Since 2002, perceptron training has become popular in the field of natural language processing for such tasks as part-of-speech tagging and syntactic parsing (Collins, 2002). It has also been applied to large-scale machine learning problems in a distributed computing setting.

Wednesday, November 1, 2023

Fuzzy logic

From Wikipedia, the free encyclopedia

Fuzzy logic is a form of many-valued logic in which the truth value of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1.

The term fuzzy logic was introduced with the 1965 proposal of fuzzy set theory by Azerbaijani mathematician Lotfi Zadeh. Fuzzy logic had, however, been studied since the 1920s, as infinite-valued logic—notably by Łukasiewicz and Tarski.

Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or fuzzy sets are mathematical means of representing vagueness and imprecise information (hence the term fuzzy). These models have the capability of recognising, representing, manipulating, interpreting, and using data and information that are vague and lack certainty.

Fuzzy logic has been applied to many fields, from control theory to artificial intelligence.

Overview

Classical logic only permits conclusions that are either true or false. However, there are also propositions with variable answers, such as one might find when asking a group of people to identify a color. In such instances, the truth appears as the result of reasoning from inexact or partial knowledge in which the sampled answers are mapped on a spectrum.

Both degrees of truth and probabilities range between 0 and 1 and hence may seem similar at first, but fuzzy logic uses degrees of truth as a mathematical model of vagueness, while probability is a mathematical model of ignorance.

Applying truth values

A basic application might characterize various sub-ranges of a continuous variable. For instance, a temperature measurement for anti-lock brakes might have several separate membership functions defining particular temperature ranges needed to control the brakes properly. Each function maps the same temperature value to a truth value in the 0 to 1 range. These truth values can then be used to determine how the brakes should be controlled. Fuzzy set theory provides a means for representing uncertainty.

Linguistic variables

In fuzzy logic applications, non-numeric values are often used to facilitate the expression of rules and facts.

A linguistic variable such as age may accept values such as young and its antonym old. Because natural languages do not always contain enough value terms to express a fuzzy value scale, it is common practice to modify linguistic values with adjectives or adverbs. For example, we can use the hedges rather and somewhat to construct the additional values rather old or somewhat young.

Fuzzy systems

Mamdani

The most well-known system is the Mamdani rule-based one. It uses the following rules:

  1. Fuzzify all input values into fuzzy membership functions.
  2. Execute all applicable rules in the rulebase to compute the fuzzy output functions.
  3. De-fuzzify the fuzzy output functions to get "crisp" output values.

Fuzzification

Fuzzification is the process of assigning the numerical input of a system to fuzzy sets with some degree of membership. This degree of membership may be anywhere within the interval [0,1]. If it is 0 then the value does not belong to the given fuzzy set, and if it is 1 then the value completely belongs within the fuzzy set. Any value between 0 and 1 represents the degree of uncertainty that the value belongs in the set. These fuzzy sets are typically described by words, and so by assigning the system input to fuzzy sets, we can reason with it in a linguistically natural manner.

For example, in the image below the meanings of the expressions cold, warm, and hot are represented by functions mapping a temperature scale. A point on that scale has three "truth values"—one for each of the three functions. The vertical line in the image represents a particular temperature that the three arrows (truth values) gauge. Since the red arrow points to zero, this temperature may be interpreted as "not hot"; i.e. this temperature has zero membership in the fuzzy set "hot". The orange arrow (pointing at 0.2) may describe it as "slightly warm" and the blue arrow (pointing at 0.8) "fairly cold". Therefore, this temperature has 0.2 membership in the fuzzy set "warm" and 0.8 membership in the fuzzy set "cold". The degree of membership assigned for each fuzzy set is the result of fuzzification.

Fuzzy logic temperature

Fuzzy sets are often defined as triangle or trapezoid-shaped curves, as each value will have a slope where the value is increasing, a peak where the value is equal to 1 (which can have a length of 0 or greater) and a slope where the value is decreasing. They can also be defined using a sigmoid function. One common case is the standard logistic function defined as

,

which has the following symmetry property

From this it follows that

Fuzzy logic operators

Fuzzy logic works with membership values in a way that mimics Boolean logic. To this end, replacements for basic operators AND, OR, NOT must be available. There are several ways to this. A common replacement is called the Zadeh operators:

Boolean Fuzzy
AND(x,y) MIN(x,y)
OR(x,y) MAX(x,y)
NOT(x) 1 – x

For TRUE/1 and FALSE/0, the fuzzy expressions produce the same result as the Boolean expressions.

There are also other operators, more linguistic in nature, called hedges that can be applied. These are generally adverbs such as very, or somewhat, which modify the meaning of a set using a mathematical formula.

However, an arbitrary choice table does not always define a fuzzy logic function. In the paper (Zaitsev, et al), a criterion has been formulated to recognize whether a given choice table defines a fuzzy logic function and a simple algorithm of fuzzy logic function synthesis has been proposed based on introduced concepts of constituents of minimum and maximum. A fuzzy logic function represents a disjunction of constituents of minimum, where a constituent of minimum is a conjunction of variables of the current area greater than or equal to the function value in this area (to the right of the function value in the inequality, including the function value).

Another set of AND/OR operators is based on multiplication, where

x AND y = x*y
NOT x = 1 - x

Hence, 
x OR y = NOT( AND( NOT(x), NOT(y) ) )
x OR y = NOT( AND(1-x, 1-y) )
x OR y = NOT( (1-x)*(1-y) )
x OR y = 1-(1-x)*(1-y)
x OR y = x+y-xy

Given any two of AND/OR/NOT, it is possible to derive the third. The generalization of AND is an instance of a t-norm.

IF-THEN rules

IF-THEN rules map input or computed truth values to desired output truth values. Example:

IF temperature IS very cold THEN fan_speed is stopped
IF temperature IS cold THEN fan_speed is slow
IF temperature IS warm THEN fan_speed is moderate
IF temperature IS hot THEN fan_speed is high

Given a certain temperature, the fuzzy variable hot has a certain truth value, which is copied to the high variable.

Should an output variable occur in several THEN parts, then the values from the respective IF parts are combined using the OR operator.

Defuzzification

The goal is to get a continuous variable from fuzzy truth values.

This would be easy if the output truth values were exactly those obtained from fuzzification of a given number. Since, however, all output truth values are computed independently, in most cases they do not represent such a set of numbers. One has then to decide for a number that matches best the "intention" encoded in the truth value. For example, for several truth values of fan_speed, an actual speed must be found that best fits the computed truth values of the variables 'slow', 'moderate' and so on.

There is no single algorithm for this purpose.

A common algorithm is

  1. For each truth value, cut the membership function at this value
  2. Combine the resulting curves using the OR operator
  3. Find the center-of-weight of the area under the curve
  4. The x position of this center is then the final output.

Takagi–Sugeno–Kang (TSK)

The TSK system is similar to Mamdani, but the defuzzification process is included in the execution of the fuzzy rules. These are also adapted, so that instead the consequent of the rule is represented through a polynomial function (usually constant or linear). An example of a rule with a constant output would be:

IF temperature IS very cold = 2

In this case, the output will be equal to the constant of the consequent (e.g. 2). In most scenarios we would have an entire rule base, with 2 or more rules. If this is the case, the output of the entire rule base will be the average of the consequent of each rule i (Yi), weighted according to the membership value of its antecedent (hi):

An example of a rule with a linear output would be instead:

IF temperature IS very cold AND humidity IS high = 2 * temperature +  1 * humidity

In this case, the output of the rule will be the result of function in the consequent. The variables within the function represent the membership values after fuzzification, not the crisp values. Same as before, in case we have an entire rule base with 2 or more rules, the total output will be the weighted average between the output of each rule.

The main advantage of using TSK over Mamdani is that it is computationally efficient and works well within other algorithms, such as PID control and with optimization algorithms. It can also guarantee the continuity of the output surface. However, Mamdani is more intuitive and easier to work with by people. Hence, TSK is usually used within other complex methods, such as in adaptive neuro fuzzy inference systems.

Forming a consensus of inputs and fuzzy rules

Since the fuzzy system output is a consensus of all of the inputs and all of the rules, fuzzy logic systems can be well behaved when input values are not available or are not trustworthy. Weightings can be optionally added to each rule in the rulebase and weightings can be used to regulate the degree to which a rule affects the output values. These rule weightings can be based upon the priority, reliability or consistency of each rule. These rule weightings may be static or can be changed dynamically, even based upon the output from other rules.

Applications

Fuzzy logic is used in control systems to allow experts to contribute vague rules such as "if you are close to the destination station and moving fast, increase the train's brake pressure"; these vague rules can then be numerically refined within the system.

Many of the early successful applications of fuzzy logic were implemented in Japan. A first notable application was on the Sendai Subway 1000 series, in which fuzzy logic was able to improve the economy, comfort, and precision of the ride. It has also been used for handwriting recognition in Sony pocket computers, helicopter flight aids, subway system controls, improving automobile fuel efficiency, single-button washing machine controls, automatic power controls in vacuum cleaners, and early recognition of earthquakes through the Institute of Seismology Bureau of Meteorology, Japan.

Artificial intelligence

Neural network-based artificial intelligence and fuzzy logic, when analyzed, are the same thing—the underlying logic of neural networks is fuzzy. A neural network will take a variety of valued inputs, give them different weights in relation to each other, and arrive at a decision, which normally also has a value. Nowhere in that process is there anything like the sequences of either-or decisions which characterize non-fuzzy mathematics, almost all of computer programming, and digital electronics. In the 1980s, researchers were divided about the most effective approach to machine learning: deductive models or neural networks. The former approach requires large decision trees and uses binary logic, matching the hardware on which it runs. The physical devices might be limited to binary logic, but AI can use software for its calculations. Neural networks take this approach, which results in more accurate models of complex situations. Neural networks soon found their way onto a multitude of electronic devices.

Medical decision making

Fuzzy logic is an important concept in medical decision making. Since medical and healthcare data can be subjective or fuzzy, applications in this domain have a great potential to benefit a lot by using fuzzy-logic-based approaches.

Fuzzy logic can be used in many different aspects within the medical decision making framework. Such aspects include in medical image analysis, biomedical signal analysis, segmentation of images or signals, and feature extraction / selection of images or signals.

The biggest question in this application area is how much useful information can be derived when using fuzzy logic. A major challenge is how to derive the required fuzzy data. This is even more challenging when one has to elicit such data from humans (usually, patients). As has been said

"The envelope of what can be achieved and what cannot be achieved in medical diagnosis, ironically, is itself a fuzzy one"

— Seven Challenges, 2019.

How to elicit fuzzy data, and how to validate the accuracy of the data is still an ongoing effort, strongly related to the application of fuzzy logic. The problem of assessing the quality of fuzzy data is a difficult one. This is why fuzzy logic is a highly promising possibility within the medical decision making application area but still requires more research to achieve its full potential. Although the concept of using fuzzy logic in medical decision making is exciting, there are still several challenges that fuzzy approaches face within the medical decision making framework.

Image-based computer-aided diagnosis

One of the common application areas of fuzzy logic is image-based computer-aided diagnosis in medicine. Computer-aided diagnosis is a computerized set of inter-related tools that can be used to aid physicians in their diagnostic decision-making. For example, when a physician finds a lesion that is abnormal but still at a very early stage of development he/she may use computer-aided diagnosis to characterize the lesion and diagnose its nature. Fuzzy logic can be highly appropriate to describe key characteristics of this lesion.

Fuzzy databases

Once fuzzy relations are defined, it is possible to develop fuzzy relational databases. The first fuzzy relational database, FRDB, appeared in Maria Zemankova's dissertation (1983). Later, some other models arose like the Buckles-Petry model, the Prade-Testemale Model, the Umano-Fukami model or the GEFRED model by J. M. Medina, M. A. Vila et al.

Fuzzy querying languages have been defined, such as the SQLf by P. Bosc et al. and the FSQL by J. Galindo et al. These languages define some structures in order to include fuzzy aspects in the SQL statements, like fuzzy conditions, fuzzy comparators, fuzzy constants, fuzzy constraints, fuzzy thresholds, linguistic labels etc.

Logical analysis

In mathematical logic, there are several formal systems of "fuzzy logic", most of which are in the family of t-norm fuzzy logics.

Propositional fuzzy logics

The most important propositional fuzzy logics are:

  • Monoidal t-norm-based propositional fuzzy logic MTL is an axiomatization of logic where conjunction is defined by a left continuous t-norm and implication is defined as the residuum of the t-norm. Its models correspond to MTL-algebras that are pre-linear commutative bounded integral residuated lattices.
  • Basic propositional fuzzy logic BL is an extension of MTL logic where conjunction is defined by a continuous t-norm, and implication is also defined as the residuum of the t-norm. Its models correspond to BL-algebras.
  • Łukasiewicz fuzzy logic is the extension of basic fuzzy logic BL where standard conjunction is the Łukasiewicz t-norm. It has the axioms of basic fuzzy logic plus an axiom of double negation, and its models correspond to MV-algebras.
  • Gödel fuzzy logic is the extension of basic fuzzy logic BL where conjunction is the Gödel t-norm (that is, minimum). It has the axioms of BL plus an axiom of idempotence of conjunction, and its models are called G-algebras.
  • Product fuzzy logic is the extension of basic fuzzy logic BL where conjunction is the product t-norm. It has the axioms of BL plus another axiom for cancellativity of conjunction, and its models are called product algebras.
  • Fuzzy logic with evaluated syntax (sometimes also called Pavelka's logic), denoted by EVŁ, is a further generalization of mathematical fuzzy logic. While the above kinds of fuzzy logic have traditional syntax and many-valued semantics, in EVŁ syntax is also evaluated. This means that each formula has an evaluation. Axiomatization of EVŁ stems from Łukasziewicz fuzzy logic. A generalization of the classical Gödel completeness theorem is provable in EVŁ.

Predicate fuzzy logics

Similar to the way predicate logic is created from propositional logic, predicate fuzzy logics extend fuzzy systems by universal and existential quantifiers. The semantics of the universal quantifier in t-norm fuzzy logics is the infimum of the truth degrees of the instances of the quantified subformula, while the semantics of the existential quantifier is the supremum of the same.

Decidability Issues

The notions of a "decidable subset" and "recursively enumerable subset" are basic ones for classical mathematics and classical logic. Thus the question of a suitable extension of them to fuzzy set theory is a crucial one. The first proposal in such a direction was made by E. S. Santos by the notions of fuzzy Turing machine, Markov normal fuzzy algorithm and fuzzy program (see Santos 1970). Successively, L. Biacino and G. Gerla argued that the proposed definitions are rather questionable. For example, in  one shows that the fuzzy Turing machines are not adequate for fuzzy language theory since there are natural fuzzy languages intuitively computable that cannot be recognized by a fuzzy Turing Machine. Then they proposed the following definitions. Denote by Ü the set of rational numbers in [0,1]. Then a fuzzy subset s : S  [0,1] of a set S is recursively enumerable if a recursive map h : S×N Ü exists such that, for every x in S, the function h(x,n) is increasing with respect to n and s(x) = lim h(x,n). We say that s is decidable if both s and its complement –s are recursively enumerable. An extension of such a theory to the general case of the L-subsets is possible (see Gerla 2006). The proposed definitions are well related to fuzzy logic. Indeed, the following theorem holds true (provided that the deduction apparatus of the considered fuzzy logic satisfies some obvious effectiveness property).

Any "axiomatizable" fuzzy theory is recursively enumerable. In particular, the fuzzy set of logically true formulas is recursively enumerable in spite of the fact that the crisp set of valid formulas is not recursively enumerable, in general. Moreover, any axiomatizable and complete theory is decidable.

It is an open question to give support for a "Church thesis" for fuzzy mathematics, the proposed notion of recursive enumerability for fuzzy subsets is the adequate one. In order to solve this, an extension of the notions of fuzzy grammar and fuzzy Turing machine are necessary. Another open question is to start from this notion to find an extension of Gödel's theorems to fuzzy logic.

Compared to other logics

Probability

Fuzzy logic and probability address different forms of uncertainty. While both fuzzy logic and probability theory can represent degrees of certain kinds of subjective belief, fuzzy set theory uses the concept of fuzzy set membership, i.e., how much an observation is within a vaguely defined set, and probability theory uses the concept of subjective probability, i.e., frequency of occurrence or likelihood of some event or condition. The concept of fuzzy sets was developed in the mid-twentieth century at Berkeley as a response to the lack of a probability theory for jointly modelling uncertainty and vagueness.

Bart Kosko claims in Fuzziness vs. Probability that probability theory is a subtheory of fuzzy logic, as questions of degrees of belief in mutually-exclusive set membership in probability theory can be represented as certain cases of non-mutually-exclusive graded membership in fuzzy theory. In that context, he also derives Bayes' theorem from the concept of fuzzy subsethood. Lotfi A. Zadeh argues that fuzzy logic is different in character from probability, and is not a replacement for it. He fuzzified probability to fuzzy probability and also generalized it to possibility theory.

More generally, fuzzy logic is one of many different extensions to classical logic intended to deal with issues of uncertainty outside of the scope of classical logic, the inapplicability of probability theory in many domains, and the paradoxes of Dempster–Shafer theory.

Ecorithms

Computational theorist Leslie Valiant uses the term ecorithms to describe how many less exact systems and techniques like fuzzy logic (and "less robust" logic) can be applied to learning algorithms. Valiant essentially redefines machine learning as evolutionary. In general use, ecorithms are algorithms that learn from their more complex environments (hence eco-) to generalize, approximate and simplify solution logic. Like fuzzy logic, they are methods used to overcome continuous variables or systems too complex to completely enumerate or understand discretely or exactly. Ecorithms and fuzzy logic also have the common property of dealing with possibilities more than probabilities, although feedback and feed forward, basically stochastic weights, are a feature of both when dealing with, for example, dynamical systems.

Gödel G logic

Another logical system where truth values are real numbers between 0 and 1 and where AND & OR operators are replaced with MIN and MAX is Gödel's G logic. This logic has many similarities with fuzzy logic but defines negation differently and has an internal implication. Negation and implication are defined as follows:

which turns the resulting logical system into a model for intuitionistic logic, making it particularly well-behaved among all possible choices of logical systems with real numbers between 0 and 1 as truth values. In this case, implication may be interpreted as "x is less true than y" and negation as "x is less true than 0" or "x is strictly false", and for any and , we have that . In particular, in Gödel logic negation is no longer an involution and double negation maps any nonzero value to 1.

Compensatory fuzzy logic

Compensatory fuzzy logic (CFL) is a branch of fuzzy logic with modified rules for conjunction and disjunction. When the truth value of one component of a conjunction or disjunction is increased or decreased, the other component is decreased or increased to compensate. This increase or decrease in truth value may be offset by the increase or decrease in another component. An offset may be blocked when certain thresholds are met. Proponents claim that CFL allows for better computational semantic behaviors and mimic natural language.

According to Jesús Cejas Montero (2011) The Compensatory fuzzy logic consists of four continuous operators: conjunction (c); disjunction (d); fuzzy strict order (or); and negation (n). The conjunction is the geometric mean and its dual as conjunctive and disjunctive operators.

Markup language standardization

The IEEE 1855, the IEEE STANDARD 1855–2016, is about a specification language named Fuzzy Markup Language (FML) developed by the IEEE Standards Association. FML allows modelling a fuzzy logic system in a human-readable and hardware independent way. FML is based on eXtensible Markup Language (XML). The designers of fuzzy systems with FML have a unified and high-level methodology for describing interoperable fuzzy systems. IEEE STANDARD 1855–2016 uses the W3C XML Schema definition language to define the syntax and semantics of the FML programs.

Prior to the introduction of FML, fuzzy logic practitioners could exchange information about their fuzzy algorithms by adding to their software functions the ability to read, correctly parse, and store the result of their work in a form compatible with the Fuzzy Control Language (FCL) described and specified by Part 7 of IEC 61131.

Intelligent agent

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Intelligent_agent

In artificial intelligence, an intelligent agent (IA) is an agent acting in an intelligent manner; It perceives its environment, takes actions autonomously in order to achieve goals, and may improve its performance with learning or acquiring knowledge. An intelligent agent may be simple or complex: A thermostat or other control system is considered an example of an intelligent agent, as is a human being, as is any system that meets the definition, such as a firm, a state, or a biome.

Simple reflex agent diagram

Leading AI textbooks define "artificial intelligence" as the "study and design of intelligent agents", a definition that considers goal-directed behavior to be the essence of intelligence. Goal-directed agents are also described using a term borrowed from economics, "rational agent".

An agent has an "objective function" that encapsulates all the IA's goals. Such an agent is designed to create and execute whatever plan will, upon completion, maximize the expected value of the objective function. For example, a reinforcement learning agent has a "reward function" that allows the programmers to shape the IA's desired behavior, and an evolutionary algorithm's behavior is shaped by a "fitness function".

Intelligent agents in artificial intelligence are closely related to agents in economics, and versions of the intelligent agent paradigm are studied in cognitive science, ethics, the philosophy of practical reason, as well as in many interdisciplinary socio-cognitive modeling and computer social simulations.

Intelligent agents are often described schematically as an abstract functional system similar to a computer program. Abstract descriptions of intelligent agents are called abstract intelligent agents (AIA) to distinguish them from their real-world implementations. An autonomous intelligent agent is designed to function in the absence of human intervention. Intelligent agents are also closely related to software agents (an autonomous computer program that carries out tasks on behalf of users).

As a definition of artificial intelligence

Artificial Intelligence: A Modern Approach defines an "agent" as

"Anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators"

It defines a "rational agent" as:

"An agent that acts so as to maximize the expected value of a performance measure based on past experience and knowledge."

It also defines the field of "artificial intelligence research" as:

"The study and design of rational agents"

Padgham & Winikoff (2005) agree that an intelligent agent is situated in an environment and responds in a timely (though not necessarily real-time) manner to changes in the environment. However, intelligent agents must also proactively pursue goals in a flexible and robust way. Optional desiderata include that the agent be rational, and that the agent be capable of belief-desire-intention analysis.

Kaplan and Haenlein define artificial intelligence as "A system's ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation." This definition is closely related to that of an intelligent agent.

Advantages

Philosophically, this definition of artificial intelligence avoids several lines of criticism. Unlike the Turing test, it does not refer to human intelligence in any way. Thus, there is no need to discuss if it is "real" vs "simulated" intelligence (i.e., "synthetic" vs "artificial" intelligence) and does not indicate that such a machine has a mind, consciousness or true understanding (i.e., it does not imply John Searle's "strong AI hypothesis"). It also doesn't attempt to draw a sharp dividing line between behaviors that are "intelligent" and behaviors that are "unintelligent"—programs need only be measured in terms of their objective function.

More importantly, it has a number of practical advantages that have helped move AI research forward. It provides a reliable and scientific way to test programs; researchers can directly compare or even combine different approaches to isolated problems, by asking which agent is best at maximizing a given "goal function". It also gives them a common language to communicate with other fields—such as mathematical optimization (which is defined in terms of "goals") or economics (which uses the same definition of a "rational agent").

Objective function

An agent that is assigned an explicit "goal function" is considered more intelligent if it consistently takes actions that successfully maximize its programmed goal function. The goal can be simple ("1 if the IA wins a game of Go, 0 otherwise") or complex ("Perform actions mathematically similar to ones that succeeded in the past"). The "goal function" encapsulates all of the goals the agent is driven to act on; in the case of rational agents, the function also encapsulates the acceptable trade-offs between accomplishing conflicting goals. (Terminology varies; for example, some agents seek to maximize or minimize a "utility function", "objective function", or "loss function".)

Goals can be explicitly defined or induced. If the AI is programmed for "reinforcement learning", it has a "reward function" that encourages some types of behavior and punishes others. Alternatively, an evolutionary system can induce goals by using a "fitness function" to mutate and preferentially replicate high-scoring AI systems, similar to how animals evolved to innately desire certain goals such as finding food. Some AI systems, such as nearest-neighbor, instead of reason by analogy, these systems are not generally given goals, except to the degree that goals are implicit in their training data. Such systems can still be benchmarked if the non-goal system is framed as a system whose "goal" is to accomplish its narrow classification task.

Systems that are not traditionally considered agents, such as knowledge-representation systems, are sometimes subsumed into the paradigm by framing them as agents that have a goal of (for example) answering questions as accurately as possible; the concept of an "action" is here extended to encompass the "act" of giving an answer to a question. As an additional extension, mimicry-driven systems can be framed as agents who are optimizing a "goal function" based on how closely the IA succeeds in mimicking the desired behavior. In the generative adversarial networks of the 2010s, an "encoder"/"generator" component attempts to mimic and improvise human text composition. The generator is attempting to maximize a function encapsulating how well it can fool an antagonistic "predictor"/"discriminator" component.

While symbolic AI systems often accept an explicit goal function, the paradigm can also be applied to neural networks and to evolutionary computing. Reinforcement learning can generate intelligent agents that appear to act in ways intended to maximize a "reward function". Sometimes, rather than setting the reward function to be directly equal to the desired benchmark evaluation function, machine learning programmers will use reward shaping to initially give the machine rewards for incremental progress in learning. Yann LeCun stated in 2018 that "Most of the learning algorithms that people have come up with essentially consist of minimizing some objective function." AlphaZero chess had a simple objective function; each win counted as +1 point, and each loss counted as -1 point. An objective function for a self-driving car would have to be more complicated. Evolutionary computing can evolve intelligent agents that appear to act in ways intended to maximize a "fitness function" that influences how many descendants each agent is allowed to leave.

The theoretical and uncomputable AIXI design is a maximally intelligent agent in this paradigm; however, in the real world, the IA is constrained by finite time and hardware resources, and scientists compete to produce algorithms that can achieve progressively higher scores on benchmark tests with real-world hardware.

Classes of intelligent agents

Russel and Norvig's classification

Russell & Norvig (2003) group agents into five classes based on their degree of perceived intelligence and capability:

Simple reflex agents

Simple reflex agent

Simple reflex agents act only on the basis of the current percept, ignoring the rest of the percept history. The agent function is based on the condition-action rule: "if condition, then action".

This agent function only succeeds when the environment is fully observable. Some reflex agents can also contain information on their current state which allows them to disregard conditions whose actuators are already triggered.

Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments. If the agent can randomize its actions, it may be possible to escape from infinite loops.

Model-based reflex agents

Model-based reflex agent

A model-based agent can handle partially observable environments. Its current state is stored inside the agent maintaining some kind of structure that describes the part of the world which cannot be seen. This knowledge about "how the world works" is called a model of the world, hence the name "model-based agent".

A model-based reflex agent should maintain some sort of internal model that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current state. Percept history and impact of action on the environment can be determined by using the internal model. It then chooses an action in the same way as reflex agent.

An agent may also use models to describe and predict the behaviors of other agents in the environment.

Goal-based agents

Model-based, goal-based agent

Goal-based agents further expand on the capabilities of the model-based agents, by using "goal" information. Goal information describes situations that are desirable. This provides the agent a way to choose among multiple possibilities, selecting the one which reaches a goal state. Search and planning are the subfields of artificial intelligence devoted to finding action sequences that achieve the agent's goals.

Utility-based agents

Model-based, utility-based agent

Goal-based agents only distinguish between goal states and non-goal states. It is also possible to define a measure of how desirable a particular state is. This measure can be obtained through the use of a utility function which maps a state to a measure of the utility of the state. A more general performance measure should allow a comparison of different world states according to how well they satisfied the agent's goals. The term utility can be used to describe how "happy" the agent is.


A rational utility-based agent chooses the action that maximizes the expected utility of the action outcomes - that is, what the agent expects to derive, on average, given the probabilities and utilities of each outcome. A utility-based agent has to model and keep track of its environment, tasks that have involved a great deal of research on perception, representation, reasoning, and learning.

Learning agents

A general learning agent

Learning has the advantage that it allows the agents to initially operate in unknown environments and to become more competent than its initial knowledge alone might allow. The most important distinction is between the "learning element", which is responsible for making improvements, and the "performance element", which is responsible for selecting external actions.

The learning element uses feedback from the "critic" on how the agent is doing and determines how the performance element, or "actor", should be modified to do better in the future. The performance element is what we have previously considered to be the entire agent: it takes in percepts and decides on actions.

The last component of the learning agent is the "problem generator". It is responsible for suggesting actions that will lead to new and informative experiences.

Weiss's classification

Weiss (2013) defines four classes of agents:

  • Logic-based agents – in which the decision about what action to perform is made via logical deduction.
  • Reactive agents – in which decision making is implemented in some form of direct mapping from situation to action.
  • Belief-desire-intention agents – in which decision making depends upon the manipulation of data structures representing the beliefs, desires, and intentions of the agent; and finally,
  • Layered architectures – in which decision making is realized via various software layers, each of which is more or less explicitly reasoning about the environment at different levels of abstraction.

Other

In 2013, Alexander Wissner-Gross published a theory pertaining to Freedom and Intelligence for intelligent agents.

Hierarchies of agents

To actively perform their functions, Intelligent Agents today are normally gathered in a hierarchical structure containing many “sub-agents”. Intelligent sub-agents process and perform lower-level functions. Taken together, the intelligent agent and sub-agents create a complete system that can accomplish difficult tasks or goals with behaviors and responses that display a form of intelligence.

Generally, an agent can be constructed by separating the body into the sensors and actuators, and so that it operates with a complex perception system that takes the description of the world as input for a controller and outputs commands to the actuator. However, a hierarchy of controller layers is often necessary to balance the immediate reaction desired for low-level tasks and the slow reasoning about complex, high-level goals.

Agent function

A simple agent program can be defined mathematically as a function f (called the "agent function") which maps every possible percepts sequence to a possible action the agent can perform or to a coefficient, feedback element, function or constant that affects eventual actions:

Agent function is an abstract concept as it could incorporate various principles of decision making like calculation of utility of individual options, deduction over logic rules, fuzzy logic, etc.

The program agent, instead, maps every possible percept to an action.

We use the term percept to refer to the agent's perceptional inputs at any given instant. In the following figures, an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.

Applications

Hallerbach et al. discussed the application of agent-based approaches for the development and validation of automated driving systems via a digital twin of the vehicle-under-test and microscopic traffic simulation based on independent agents. Waymo has created a multi-agent simulation environment Carcraft to test algorithms for self-driving cars. It simulates traffic interactions between human drivers, pedestrians and automated vehicles. People's behavior is imitated by artificial agents based on data of real human behavior. The basic idea of using agent-based modeling to understand self-driving cars was discussed as early as 2003.

Alternative definitions and uses

"Intelligent agent" is also often used as a vague marketing term, sometimes synonymous with "virtual personal assistant". Some 20th-century definitions characterize an agent as a program that aids a user or that acts on behalf of a user. These examples are known as software agents, and sometimes an "intelligent software agent" (that is, a software agent with intelligence) is referred to as an "intelligent agent".

According to Nikola Kasabov, IA systems should exhibit the following characteristics:

  • Accommodate new problem solving rules incrementally
  • Adapt online and in real time
  • Are able to analyze themselves in terms of behavior, error and success.
  • Learn and improve through interaction with the environment (embodiment)
  • Learn quickly from large amounts of data
  • Have memory-based exemplar storage and retrieval capacities
  • Have parameters to represent short- and long-term memory, age, forgetting, etc.

Bayesian inference

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Bayesian_inference Bayesian inference ( / ...