Monday, January 29, 2024

Propositional calculus

From Wikipedia, the free encyclopedia
 
Propositional calculus is a branch of logic. It is also called propositional logic, statement logic, sentential calculus, sentential logic, or sometimes zeroth-order logic. It deals with propositions (which can be true or false) and relations between propositions, including the construction of arguments based on them. Compound propositions are formed by connecting propositions by logical connectives. Propositions that contain no logical connectives are called atomic propositions.

Unlike first-order logic, propositional logic does not deal with non-logical objects, predicates about them, or quantifiers. However, all the machinery of propositional logic is included in first-order logic and higher-order logics. In this sense, propositional logic is the foundation of first-order logic and higher-order logic.

Explanation

Logical connectives are found in natural languages. In English, some examples are "and" (conjunction), "or" (disjunction), "not" (negation) and "if" (but only when used to denote material conditional).

The following is an example of a very simple inference within the scope of propositional logic:

Premise 1: If it's raining then it's cloudy.
Premise 2: It's raining.
Conclusion: It's cloudy.

Both premises and the conclusion are propositions. The premises are taken for granted, and with the application of modus ponens (an inference rule), the conclusion follows.

As propositional logic is not concerned with the structure of propositions beyond the point where they cannot be decomposed any more by logical connectives, this inference can be restated replacing those atomic statements with statement letters, which are interpreted as variables representing statements:

Premise 1: P → Q
Premise 2: P
Conclusion: Q

The same can be stated succinctly in the following way:

P → Q, P ⊢ Q

When P is interpreted as "It's raining" and Q as "it's cloudy" the above symbolic expressions can be seen to correspond exactly with the original expression in natural language. Not only that, but they will also correspond with any other inference of this form, which will be valid on the same basis this inference is.

Propositional logic may be studied through a formal system in which formulas of a formal language may be interpreted to represent propositions. A system of axioms and inference rules allows certain formulas to be derived. These derived formulas are called theorems and may be interpreted to be true propositions. A constructed sequence of such formulas is known as a derivation or proof and the last formula of the sequence is the theorem. The derivation may be interpreted as proof of the proposition represented by the theorem.

When a formal system is used to represent formal logic, only statement letters (usually capital roman letters such as P, Q and R) are represented directly. The natural language propositions that arise when they're interpreted are outside the scope of the system, and the relation between the formal system and its interpretation is likewise outside the formal system itself.

In classical truth-functional propositional logic, formulas are interpreted as having precisely one of two possible truth values, the truth value of true or the truth value of false.[1] The principle of bivalence and the law of excluded middle are upheld. Truth-functional propositional logic defined as such and systems isomorphic to it are considered to be zeroth-order logic. However, alternative propositional logics are also possible. For more, see Other logical calculi below.

History

Although propositional logic (which is interchangeable with propositional calculus) had been hinted at by earlier philosophers, it was developed into a formal logic (Stoic logic) by Chrysippus in the 3rd century BC and expanded by his successor Stoics. The logic was focused on propositions. This advancement was different from the traditional syllogistic logic, which was focused on terms. However, most of the original writings were lost and the propositional logic developed by the Stoics was no longer understood later in antiquity. Consequently, the system was essentially reinvented by Peter Abelard in the 12th century.

Propositional logic was eventually refined using symbolic logic. The 17th/18th-century mathematician Gottfried Leibniz has been credited with being the founder of symbolic logic for his work with the calculus ratiocinator. Although his work was the first of its kind, it was unknown to the larger logical community. Consequently, many of the advances achieved by Leibniz were recreated by logicians like George Boole and Augustus De Morgan—completely independent of Leibniz.

Just as propositional logic can be considered an advancement from the earlier syllogistic logic, Gottlob Frege's predicate logic can also be considered an advancement from the earlier propositional logic. One author describes predicate logic as combining "the distinctive features of syllogistic logic and propositional logic." Consequently, predicate logic ushered in a new era in logic's history; however, advances in propositional logic were still made after Frege, including natural deduction, truth trees and truth tables. Natural deduction was invented by Gerhard Gentzen and Stanisław Jaśkowski. Truth trees were invented by Evert Willem Beth. The invention of truth tables, however, is of uncertain attribution.

Works by Frege and Bertrand Russell contain ideas influential to the invention of truth tables. The actual tabular structure (being formatted as a table) is generally credited to either Ludwig Wittgenstein or Emil Post (or both, independently). Besides Frege and Russell, others credited with having ideas preceding truth tables include Philo, Boole, Charles Sanders Peirce, and Ernst Schröder. Others credited with the tabular structure include Jan Łukasiewicz, Alfred North Whitehead, William Stanley Jevons, John Venn, and Clarence Irving Lewis. Ultimately, some have concluded, like John Shosky, that "It is far from clear that any one person should be given the title of 'inventor' of truth-tables."

Terminology

In general terms, a calculus is a formal system that consists of a set of syntactic expressions (well-formed formulas), a distinguished subset of these expressions (axioms), plus a set of formal rules that define a specific binary relation, intended to be interpreted as logical equivalence, on the space of expressions.

When the formal system is intended to be a logical system, the expressions are meant to be interpreted as statements, and the rules, known as inference rules, are typically intended to be truth-preserving. In this setting, the rules, which may include axioms, can then be used to derive ("infer") formulas representing true statements from given formulas representing true statements.

The set of axioms may be empty, a nonempty finite set, or a countably infinite set (see axiom schema). A formal grammar recursively defines the expressions and well-formed formulas of the language. In addition a semantics may be given which defines truth and valuations (or interpretations).

The language of a propositional calculus consists of:

  1. a set of primitive symbols, variously referred to as atomic formulas, placeholders, proposition letters, or variables, and
  2. a set of operator symbols, variously interpreted as logical operators or logical connectives.

A well-formed formula is any atomic formula, or any formula that can be built up from atomic formulas by means of operator symbols according to the rules of the grammar.

Mathematicians sometimes distinguish between propositional constants, propositional variables, and schemata. Propositional constants represent some particular proposition, while propositional variables range over the set of all atomic propositions. Schemata, however, range over all propositions. It is common to represent propositional constants by A, B, and C, propositional variables by P, Q, and R, and schematic letters are often Greek letters, most often φ, ψ, and χ.

Basic concepts

The following outlines a standard propositional calculus. Many different formulations exist which are all more or less equivalent, but differ in the details of:

  1. their language (i.e., the particular collection of primitive symbols and operator symbols),
  2. the set of axioms, or distinguished formulas, and
  3. the set of inference rules.

Any given proposition may be represented with a letter called a 'propositional constant', analogous to representing a number by a letter in mathematics (e.g., a = 5). All propositions require exactly one of two truth-values: true or false. For example, let P be the proposition that it is raining outside. This will be true (P) if it is raining outside, and false otherwise (¬P).

  • We then define truth-functional operators, beginning with negation. ¬P represents the negation of P, which can be thought of as the denial of P. In the example above, ¬P expresses that it is not raining outside, or by a more standard reading: "It is not the case that it is raining outside." When P is true, ¬P is false; and when P is false, ¬P is true. As a result, ¬ ¬P always has the same truth-value as P.
  • Conjunction is a truth-functional connective which forms a proposition out of two simpler propositions, for example, P and Q. The conjunction of P and Q is written P ∧ Q, and expresses that both are true. We read P ∧ Q as "P and Q". For any two propositions, there are four possible assignments of truth values:
    1. P is true and Q is true
    2. P is true and Q is false
    3. P is false and Q is true
    4. P is false and Q is false
The conjunction of P and Q is true in case 1, and is false otherwise. Where P is the proposition that it is raining outside and Q is the proposition that a cold-front is over Kansas, P ∧ Q is true when it is raining outside and there is a cold-front over Kansas. If it is not raining outside, then P ∧ Q is false; and if there is no cold-front over Kansas, then P ∧ Q is also false.
  • Disjunction resembles conjunction in that it forms a proposition out of two simpler propositions. We write it P ∨ Q, and it is read "P or Q". It expresses that either P or Q is true. Thus, in the cases listed above, the disjunction of P with Q is true in all cases except case 4. Using the example above, the disjunction expresses that it is either raining outside, or there is a cold front over Kansas. (Note, this use of disjunction is supposed to resemble the use of the English word "or". However, it is most like the English inclusive "or", which can be used to express the truth of at least one of two propositions. It is not like the English exclusive "or", which expresses the truth of exactly one of two propositions. In other words, the exclusive "or" is false when both P and Q are true (case 1), and similarly is false when both P and Q are false (case 4). An example of the exclusive or is: You may keep a cake (for later) or you may eat it all now, but you cannot both eat it all now and keep it for later. Often in natural language, given the appropriate context, the addendum "but not both" is omitted but implied. In mathematics, however, "or" is always inclusive or; if exclusive or is meant it will be specified, possibly by "xor".)
  • Material conditional also joins two simpler propositions, and we write P → Q, which is read "if P then Q". The proposition to the left of the arrow is called the antecedent, and the proposition to the right is called the consequent. (There is no such designation for conjunction or disjunction, since they are commutative operations.) It expresses that Q is true whenever P is true. Thus P → Q is true in every case above except case 2, because this is the only case when P is true but Q is not. Using the example, if P then Q expresses that if it is raining outside, then there is a cold-front over Kansas. The material conditional is often confused with physical causation. The material conditional, however, only relates two propositions by their truth-values, which is not the relation of cause and effect. It is contentious in the literature whether the material implication represents logical causation.
  • Biconditional joins two simpler propositions, and we write P ↔ Q, which is read "P if and only if Q". It expresses that P and Q have the same truth-value: in cases 1 and 4, "P if and only if Q" is true, and it is false otherwise.

It is very helpful to look at the truth tables for these different operators, as well as the method of analytic tableaux.
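As a minimal sketch of those truth tables, the binary connectives can be modelled as functions on Python booleans and evaluated over the four cases listed above. The names `CONNECTIVES` and `truth_table` are illustrative choices, not part of the article.

```python
from itertools import product

# Each binary connective is modelled as a function on Python booleans.
CONNECTIVES = {
    "P ∧ Q": lambda p, q: p and q,
    "P ∨ Q": lambda p, q: p or q,        # inclusive or
    "P → Q": lambda p, q: (not p) or q,  # material conditional
    "P ↔ Q": lambda p, q: p == q,
}

def truth_table():
    """Return one row per case, in the order of cases 1 to 4 above."""
    rows = []
    for p, q in product([True, False], repeat=2):
        values = {name: f(p, q) for name, f in CONNECTIVES.items()}
        rows.append((p, q, values))
    return rows

if __name__ == "__main__":
    for p, q, values in truth_table():
        print(p, q, values)
```

Only case 2 falsifies the conditional and only case 4 falsifies the disjunction, matching the descriptions in the list.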

Closure under operations

Propositional logic is closed under truth-functional connectives. That is to say, for any proposition φ, ¬φ is also a proposition. Likewise, for any propositions φ and ψ, φ ∧ ψ is a proposition, and similarly for disjunction, conditional, and biconditional. This implies that, for instance, φ ∧ ψ is a proposition, and so it can be conjoined with another proposition. In order to represent this, we need to use parentheses to indicate which proposition is conjoined with which. For instance, P ∧ Q ∧ R is not a well-formed formula, because we do not know if we are conjoining P ∧ Q with R or if we are conjoining P with Q ∧ R. Thus we must write either (P ∧ Q) ∧ R to represent the former, or P ∧ (Q ∧ R) to represent the latter. By evaluating the truth conditions, we see that both expressions have the same truth conditions (will be true in the same cases), and moreover that any proposition formed by arbitrary conjunctions will have the same truth conditions, regardless of the location of the parentheses. This means that conjunction is associative; however, one should not assume that parentheses never serve a purpose. For instance, the sentence P ∧ (Q ∨ R) does not have the same truth conditions as (P ∧ Q) ∨ R, so they are different sentences distinguished only by the parentheses. One can verify this by the truth-table method referenced above.
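The two claims about parentheses can be checked mechanically. The sketch below compares truth conditions over all eight assignments; `same_truth_conditions` is a hypothetical helper name introduced for illustration.

```python
from itertools import product

def same_truth_conditions(f, g, arity):
    """True if f and g agree on every assignment of truth values."""
    return all(f(*vals) == g(*vals)
               for vals in product([True, False], repeat=arity))

# Two bracketings of a pure conjunction.
left_assoc = lambda p, q, r: (p and q) and r
right_assoc = lambda p, q, r: p and (q and r)

# Two bracketings that mix conjunction and disjunction.
and_outer = lambda p, q, r: p and (q or r)
or_outer = lambda p, q, r: (p and q) or r

conjunction_is_associative = same_truth_conditions(left_assoc, right_assoc, 3)
mixed_bracketings_agree = same_truth_conditions(and_outer, or_outer, 3)
```

The first comparison succeeds and the second fails, for example when P and Q are false and R is true.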

Note: For any arbitrary number of propositional constants, we can form a finite number of cases which list their possible truth-values. A simple way to generate this is by truth-tables, in which one writes P, Q, ..., Z, for any list of k propositional constants; that is to say, any list of propositional constants with k entries. Below this list, one writes 2^k rows, and below P one fills in the first half of the rows with true (or T) and the second half with false (or F). Below Q one fills in one-quarter of the rows with T, then one-quarter with F, then one-quarter with T and the last quarter with F. The next column alternates between true and false for each eighth of the rows, then sixteenths, and so on, until the last propositional constant varies between T and F for each row. This will give a complete listing of cases or truth-value assignments possible for those propositional constants.
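The row-filling procedure described in this note coincides with the order in which `itertools.product` enumerates tuples, so a short sketch suffices. The function name `truth_assignments` is an assumption made for this example.

```python
from itertools import product

def truth_assignments(names):
    """List all 2**k assignments for k names.

    The first name changes slowest (half T, then half F) and the last
    name alternates on every row, as in the note above.
    """
    return [dict(zip(names, values))
            for values in product([True, False], repeat=len(names))]

rows = truth_assignments(["P", "Q", "R"])
```

For three constants this yields eight rows, with the P column split into halves, the Q column into quarters, and the R column alternating.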

Argument

The propositional calculus then defines an argument to be a list of propositions. A valid argument is a list of propositions, the last of which follows from, or is implied by, the rest. All other arguments are invalid. The simplest valid argument is modus ponens, one instance of which is the following list of propositions:

  1. P → Q
  2. P
  ∴ Q

This is a list of three propositions: each line is a proposition, and the last follows from the rest. The first two lines are called premises, and the last line the conclusion. We say that any proposition C follows from any set of propositions (P₁, ..., Pₙ), if C must be true whenever every member of the set (P₁, ..., Pₙ) is true. In the argument above, for any P and Q, whenever P → Q and P are true, necessarily Q is true. Notice that, when P is true, we cannot consider cases 3 and 4 (from the truth table). When P → Q is true, we cannot consider case 2. This leaves only case 1, in which Q is also true. Thus Q is implied by the premises.
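The case elimination in this paragraph can be traced step by step. The sketch below represents each case as a pair (P, Q) and filters the four cases by each premise in turn; the variable names are illustrative.

```python
from itertools import product

# Cases 1 to 4 as (P, Q) pairs, in the order used earlier.
cases = list(product([True, False], repeat=2))

# Premise "P" rules out cases 3 and 4.
after_p = [(p, q) for p, q in cases if p]

# Premise "if P then Q" (true unless P is true and Q is false) rules out case 2.
after_both = [(p, q) for p, q in after_p if (not p) or q]

# Q holds in every case that survives both premises.
conclusion_holds = all(q for _, q in after_both)
```

Only case 1 remains, and Q is true there, which is what validity of modus ponens amounts to.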

This generalizes schematically. Thus, where φ and ψ may be any propositions at all,

  1. φ → ψ
  2. φ
  ∴ ψ

Other argument forms are convenient, but not necessary. Given a complete set of axioms (see below for one such set), modus ponens is sufficient to prove all other argument forms in propositional logic, thus they may be considered to be a derivative. Note, this is not true of the extension of propositional logic to other logics like first-order logic. First-order logic requires at least one additional rule of inference in order to obtain completeness.

The significance of argument in formal logic is that one may obtain new truths from established truths. In the first example above, given the two premises, the truth of Q is not yet known or stated. After the argument is made, Q is deduced. In this way, we define a deduction system to be a set of all propositions that may be deduced from another set of propositions. For instance, given the set of propositions A = {P ∨ Q, ¬Q ∧ R, (P ∨ Q) → R}, we can define a deduction system, Γ, which is the set of all propositions which follow from A. Reiteration is always assumed, so P ∨ Q, ¬Q ∧ R, (P ∨ Q) → R ∈ Γ. Also, from the first element of A, the last element, and modus ponens, R is a consequence, and so R ∈ Γ. Because we have not included sufficiently complete axioms, though, nothing else may be deduced. Thus, even though most deduction systems studied in propositional logic are able to deduce (P ∨ Q) ↔ (¬P → Q), this one is too weak to prove such a proposition.

Generic description of a propositional calculus

A propositional calculus is a formal system ℒ = ℒ(Α, Ω, Ζ, Ι), where:

  • The alpha set Α is a countably infinite set of elements called proposition symbols or propositional variables. Syntactically speaking, these are the most basic elements of the formal language ℒ, otherwise referred to as atomic formulas or terminal elements. In the examples to follow, the elements of Α are typically the letters p, q, r, and so on.
  • The omega set Ω is a finite set of elements called operator symbols or logical connectives. The set Ω is partitioned into disjoint subsets as follows:

    Ω = Ω₀ ∪ Ω₁ ∪ ... ∪ Ωⱼ ∪ ... ∪ Ωₘ.

    In this partition, Ωⱼ is the set of operator symbols of arity j.

    In the more familiar propositional calculi, Ω is typically partitioned as follows:

    Ω₁ = {¬}, Ω₂ ⊆ {∧, ∨, →, ↔}.

    A frequently adopted convention treats the constant logical values as operators of arity zero, thus:

    Ω₀ = {⊥, ⊤}.

    Some writers use the tilde (~), or N, instead of ¬; and some use v instead of ∨ as well as the ampersand (&), the prefixed K, or · instead of ∧. Notation varies even more for the set of logical values, with symbols like {false, true}, {F, T}, or {0, 1} all being seen in various contexts instead of {⊥, ⊤}.
  • The zeta set Ζ is a finite set of transformation rules that are called inference rules when they acquire logical applications.
  • The iota set Ι is a countable set of initial points that are called axioms when they receive logical interpretations.

The language of ℒ, also known as its set of formulas or well-formed formulas, is inductively defined by the following rules:

  1. Base: Any element of the alpha set Α is a formula of ℒ.
  2. If p₁, p₂, ..., pⱼ are formulas and f is in Ωⱼ, then (f(p₁, p₂, ..., pⱼ)) is a formula.
  3. Closed: Nothing else is a formula of ℒ.

Repeated applications of these rules permit the construction of complex formulas. For example:

  • By rule 1, p is a formula.
  • By rule 2, ¬p is a formula.
  • By rule 1, q is a formula.
  • By rule 2, (¬p ∨ q) is a formula.
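The inductive definition translates directly into a recursive check. The following is a minimal sketch, assuming the familiar operator set with ¬ of arity 1 and ∧, ∨, →, ↔ of arity 2; the names `Atom`, `Compound`, `ARITY` and `is_formula` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass

# Assumed operator table: symbol -> arity.
ARITY = {"¬": 1, "∧": 2, "∨": 2, "→": 2, "↔": 2}

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Compound:
    op: str
    args: tuple

def is_formula(x):
    """Rule 1: atoms are formulas.
    Rule 2: an operator of arity j applied to j formulas is a formula.
    Rule 3: nothing else is a formula."""
    if isinstance(x, Atom):
        return True
    if isinstance(x, Compound):
        return (ARITY.get(x.op) == len(x.args)
                and all(is_formula(a) for a in x.args))
    return False

# The four construction steps from the example, in order.
p = Atom("p")                            # rule 1
not_p = Compound("¬", (p,))              # rule 2
q = Atom("q")                            # rule 1
not_p_or_q = Compound("∨", (not_p, q))   # rule 2
```

Anything that cannot be built this way, such as a binary operator applied to a single argument, is rejected by the closure rule.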

Example 1. Simple axiom system

Let ℒ₁ = ℒ(Α, Ω, Ζ, Ι), where Α, Ω, Ζ, Ι are defined as follows:

  • The set Α, the countably infinite set of symbols that serve to represent logical propositions:

    Α = {p, q, r, s, t, u, p₂, ...}.

  • The functionally complete set Ω of logical operators (logical connectives and negation) is as follows. Of the three connectives for conjunction, disjunction, and implication (∧, ∨, and →), one can be taken as primitive and the other two can be defined in terms of it and negation (¬). Alternatively, all of the logical operators may be defined in terms of a sole sufficient operator, such as the Sheffer stroke (nand). The biconditional (↔) can of course be defined in terms of conjunction and implication as a ↔ b := (a → b) ∧ (b → a).
    Adopting negation and implication as the two primitive operations of a propositional calculus is tantamount to having the omega set Ω partition as follows:

    Ω = Ω₁ ∪ Ω₂, where Ω₁ = {¬} and Ω₂ = {→}.

    Then a ∨ b is defined as ¬a → b, and a ∧ b is defined as ¬(a → ¬b).

  • The set Ι (the set of initial points of logical deduction, i.e., logical axioms) is the axiom system proposed by Jan Łukasiewicz, and used as the propositional-calculus part of a Hilbert system. The axioms are all substitution instances of:
      • p → (q → p)
      • (p → (q → r)) → ((p → q) → (p → r))
      • (¬p → ¬q) → (q → p)
  • The set Ζ of transformation rules (rules of inference) is the sole rule modus ponens (i.e., from any formulas of the form φ and (φ → ψ), infer ψ).

This system is used in the Metamath set.mm formal proof database.
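That negation and implication suffice as primitives can be checked by truth table. The sketch below assumes the usual definitions (a ∨ b as ¬a → b, a ∧ b as ¬(a → ¬b)) and also shows one way to express implication with the Sheffer stroke; all function names are illustrative.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def nand(a, b):
    """Sheffer stroke."""
    return not (a and b)

def equivalent(f, g):
    """True if two binary truth functions agree in all four cases."""
    return all(f(a, b) == g(a, b)
               for a, b in product([True, False], repeat=2))

# Disjunction and conjunction recovered from negation and implication.
or_from_imp = equivalent(lambda a, b: a or b,
                         lambda a, b: implies(not a, b))
and_from_imp = equivalent(lambda a, b: a and b,
                          lambda a, b: not implies(a, not b))

# Implication recovered from nand alone: a → b is a nand (b nand b).
imp_from_nand = equivalent(implies, lambda a, b: nand(a, nand(b, b)))
```

All three comparisons hold, which is why a calculus may list only ¬ and → among its operator symbols without losing expressive power.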

Example 2. Natural deduction system

Let ℒ₂ = ℒ(Α, Ω, Ζ, Ι), where Α, Ω, Ζ, Ι are defined as follows:

  • The alpha set Α is a countably infinite set of symbols, for example: Α = {p, q, r, s, t, u, p₂, ...}.
  • The omega set Ω = Ω₁ ∪ Ω₂ partitions as follows: Ω₁ = {¬}, Ω₂ = {∧, ∨, →, ↔}.

In the following example of a propositional calculus, the transformation rules are intended to be interpreted as the inference rules of a so-called natural deduction system. The particular system presented here has no initial points, which means that its interpretation for logical applications derives its theorems from an empty axiom set.

  • The set of initial points is empty, that is, Ι = ∅.
  • The set of transformation rules, Ζ, is described as follows:

Our propositional calculus has eleven inference rules. These rules allow us to derive other true formulas given a set of formulas that are assumed to be true. The first ten simply state that we can infer certain well-formed formulas from other well-formed formulas. The last rule however uses hypothetical reasoning in the sense that in the premise of the rule we temporarily assume an (unproven) hypothesis to be part of the set of inferred formulas to see if we can infer a certain other formula. Since the first ten rules do not do this they are usually described as non-hypothetical rules, and the last one as a hypothetical rule.

In describing the transformation rules, we may introduce a metalanguage symbol ⊢. It is basically a convenient shorthand for saying "infer that". The format is Γ ⊢ ψ, in which Γ is a (possibly empty) set of formulas called premises, and ψ is a formula called conclusion. The transformation rule Γ ⊢ ψ means that if every proposition in Γ is a theorem (or has the same truth value as the axioms), then ψ is also a theorem. Considering the following rule Conjunction introduction, we will know whenever Γ has more than one formula, we can always safely reduce it into one formula using conjunction. So for short, from that time on we may represent Γ as one formula instead of a set. Another omission for convenience is when Γ is an empty set, in which case Γ may not appear.

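Negation introduction
From (p → q) and (p → ¬q), infer ¬p.
That is, {(p → q), (p → ¬q)} ⊢ ¬p.
Negation elimination
From ¬p, infer (p → r).
That is, {¬p} ⊢ (p → r).
Double negation elimination
From ¬¬p, infer p.
That is, ¬¬p ⊢ p.
Conjunction introduction
From p and q, infer (p ∧ q).
That is, {p, q} ⊢ (p ∧ q).
Conjunction elimination
From (p ∧ q), infer p.
From (p ∧ q), infer q.
That is, (p ∧ q) ⊢ p and (p ∧ q) ⊢ q.
Disjunction introduction
From p, infer (p ∨ q).
From q, infer (p ∨ q).
That is, p ⊢ (p ∨ q) and q ⊢ (p ∨ q).
Disjunction elimination
From (p ∨ q) and (p → r) and (q → r), infer r.
That is, {p ∨ q, p → r, q → r} ⊢ r.
Biconditional introduction
From (p → q) and (q → p), infer (p ↔ q).
That is, {p → q, q → p} ⊢ (p ↔ q).
Biconditional elimination
From (p ↔ q), infer (p → q).
From (p ↔ q), infer (q → p).
That is, (p ↔ q) ⊢ (p → q) and (p ↔ q) ⊢ (q → p).
Modus ponens (conditional elimination)
From p and (p → q), infer q.
That is, {p, p → q} ⊢ q.
Conditional proof (conditional introduction)
From [accepting p allows a proof of q], infer (p → q).
That is, (p ⊢ q) ⊢ (p → q).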

Basic and derived argument forms

Name | Sequent | Description
Modus Ponens | ((p → q) ∧ p) ⊢ q | If p then q; p; therefore q
Modus Tollens | ((p → q) ∧ ¬q) ⊢ ¬p | If p then q; not q; therefore not p
Hypothetical Syllogism | ((p → q) ∧ (q → r)) ⊢ (p → r) | If p then q; if q then r; therefore, if p then r
Disjunctive Syllogism | ((p ∨ q) ∧ ¬p) ⊢ q | Either p or q, or both; not p; therefore, q
Constructive Dilemma | ((p → q) ∧ (r → s) ∧ (p ∨ r)) ⊢ (q ∨ s) | If p then q; and if r then s; but p or r; therefore q or s
Destructive Dilemma | ((p → q) ∧ (r → s) ∧ (¬q ∨ ¬s)) ⊢ (¬p ∨ ¬r) | If p then q; and if r then s; but not q or not s; therefore not p or not r
Bidirectional Dilemma | ((p → q) ∧ (r → s) ∧ (p ∨ ¬s)) ⊢ (q ∨ ¬r) | If p then q; and if r then s; but p or not s; therefore q or not r
Simplification | (p ∧ q) ⊢ p | p and q are true; therefore p is true
Conjunction | p, q ⊢ (p ∧ q) | p and q are true separately; therefore they are true conjointly
Addition | p ⊢ (p ∨ q) | p is true; therefore the disjunction (p or q) is true
Composition | ((p → q) ∧ (p → r)) ⊢ (p → (q ∧ r)) | If p then q; and if p then r; therefore if p is true then q and r are true
De Morgan's Theorem (1) | ¬(p ∧ q) ⊢ (¬p ∨ ¬q) | The negation of (p and q) is equiv. to (not p or not q)
De Morgan's Theorem (2) | ¬(p ∨ q) ⊢ (¬p ∧ ¬q) | The negation of (p or q) is equiv. to (not p and not q)
Commutation (1) | (p ∨ q) ⊢ (q ∨ p) | (p or q) is equiv. to (q or p)
Commutation (2) | (p ∧ q) ⊢ (q ∧ p) | (p and q) is equiv. to (q and p)
Commutation (3) | (p ↔ q) ⊢ (q ↔ p) | (p iff q) is equiv. to (q iff p)
Association (1) | (p ∨ (q ∨ r)) ⊢ ((p ∨ q) ∨ r) | p or (q or r) is equiv. to (p or q) or r
Association (2) | (p ∧ (q ∧ r)) ⊢ ((p ∧ q) ∧ r) | p and (q and r) is equiv. to (p and q) and r
Distribution (1) | (p ∧ (q ∨ r)) ⊢ ((p ∧ q) ∨ (p ∧ r)) | p and (q or r) is equiv. to (p and q) or (p and r)
Distribution (2) | (p ∨ (q ∧ r)) ⊢ ((p ∨ q) ∧ (p ∨ r)) | p or (q and r) is equiv. to (p or q) and (p or r)
Double Negation | p ⊢ ¬¬p | p is equivalent to the negation of not p
Transposition | (p → q) ⊢ (¬q → ¬p) | If p then q is equiv. to if not q then not p
Material Implication | (p → q) ⊢ (¬p ∨ q) | If p then q is equiv. to not p or q
Material Equivalence (1) | (p ↔ q) ⊢ ((p → q) ∧ (q → p)) | (p iff q) is equiv. to (if p is true then q is true) and (if q is true then p is true)
Material Equivalence (2) | (p ↔ q) ⊢ ((p ∧ q) ∨ (¬p ∧ ¬q)) | (p iff q) is equiv. to either (p and q are true) or (both p and q are false)
Material Equivalence (3) | (p ↔ q) ⊢ ((p ∨ ¬q) ∧ (¬p ∨ q)) | (p iff q) is equiv. to both (p or not q is true) and (not p or q is true)
Exportation[13] | ((p ∧ q) → r) ⊢ (p → (q → r)) | from (if p and q are true then r is true) we can prove (if q is true then r is true, if p is true)
Importation | (p → (q → r)) ⊢ ((p ∧ q) → r) | If p then (if q then r) is equivalent to if p and q then r
Tautology (1) | p ⊢ (p ∨ p) | p is true is equiv. to p is true or p is true
Tautology (2) | p ⊢ (p ∧ p) | p is true is equiv. to p is true and p is true
Tertium non datur (Law of Excluded Middle) | ⊢ (p ∨ ¬p) | p or not p is true
Law of Non-Contradiction | ⊢ ¬(p ∧ ¬p) | p and not p is false, is a true statement
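Any of the argument forms in the table can be confirmed semantically by checking that the corresponding conditional or biconditional is true under every assignment. The sketch below does this for a handful of them; the dictionary `FORMS` and the helper `tautology` are illustrative names, and the encoding of each form as a Python function is an assumption of this example.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

def tautology(f, arity):
    """True if f evaluates to True under every truth assignment."""
    return all(f(*v) for v in product([True, False], repeat=arity))

# (truth function, number of variables) for a few rows of the table.
FORMS = {
    "Modus Tollens": (
        lambda p, q: implies(implies(p, q) and not q, not p), 2),
    "Hypothetical Syllogism": (
        lambda p, q, r: implies(implies(p, q) and implies(q, r),
                                implies(p, r)), 3),
    "De Morgan's Theorem (1)": (
        lambda p, q: (not (p and q)) == ((not p) or (not q)), 2),
    "Exportation": (
        lambda p, q, r: implies(p and q, r) == implies(p, implies(q, r)), 3),
    "Destructive Dilemma": (
        lambda p, q, r, s: implies(
            implies(p, q) and implies(r, s) and ((not q) or (not s)),
            (not p) or (not r)), 4),
}

results = {name: tautology(f, n) for name, (f, n) in FORMS.items()}
```

Every entry in `results` comes out true. A truth-table check of this kind establishes semantic validity only; it does not by itself show how each form is derived from the inference rules.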

Proofs in propositional calculus

One of the main uses of a propositional calculus, when interpreted for logical applications, is to determine relations of logical equivalence between propositional formulas. These relationships are determined by means of the available transformation rules, sequences of which are called derivations or proofs.

In the discussion to follow, a proof is presented as a sequence of numbered lines, with each line consisting of a single formula followed by a reason or justification for introducing that formula. Each premise of the argument, that is, an assumption introduced as a hypothesis of the argument, is listed at the beginning of the sequence and is marked as a "premise" in lieu of other justification. The conclusion is listed on the last line. A proof is complete if every line follows from the previous ones by the correct application of a transformation rule. (For a contrasting approach, see proof-trees).

Example of a proof in natural deduction system

  • To be shown that A → A.
  • One possible proof of this (which, though valid, happens to contain more steps than are necessary) may be arranged as follows:
Example of a proof
Number | Formula | Reason
1 | A | premise
2 | A ∨ A | From (1) by disjunction introduction
3 | (A ∨ A) ∧ A | From (1) and (2) by conjunction introduction
4 | A | From (3) by conjunction elimination
5 | A ⊢ A | Summary of (1) through (4)
6 | ⊢ A → A | From (5) by conditional proof

Interpret A ⊢ A as "Assuming A, infer A". Read ⊢ A → A as "Assuming nothing, infer that A implies A", or "It is a tautology that A implies A", or "It is always true that A implies A".

Example of a proof in a classical propositional calculus system

We now prove the same theorem in the axiomatic system by Jan Łukasiewicz described above, which is an example of a Hilbert-style deductive system for the classical propositional calculus.

The axioms are:

(A1) p → (q → p)
(A2) (p → (q → r)) → ((p → q) → (p → r))
(A3) (¬p → ¬q) → (q → p)

And the proof is as follows:

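  1. A → ((B → A) → A)       (instance of (A1))
  2. (A → ((B → A) → A)) → ((A → (B → A)) → (A → A))       (instance of (A2))
  3. (A → (B → A)) → (A → A)       (from (1) and (2) by modus ponens)
  4. A → (B → A)       (instance of (A1))
  5. A → A       (from (4) and (3) by modus ponens)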
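A five-step Hilbert-style derivation of A → A can be checked mechanically. The following is a minimal sketch of such a checker, assuming the three Łukasiewicz schemas p → (q → p), (p → (q → r)) → ((p → q) → (p → r)) and (¬p → ¬q) → (q → p) for (A1) to (A3). Formulas are nested tuples, and the helper names `imp`, `neg`, `match`, `is_instance` and `check` are hypothetical.

```python
def imp(a, b):
    return ("→", a, b)

def neg(a):
    return ("¬", a)

# In the schemas, the strings "p", "q", "r" act as schematic letters.
AXIOMS = {
    "A1": imp("p", imp("q", "p")),
    "A2": imp(imp("p", imp("q", "r")), imp(imp("p", "q"), imp("p", "r"))),
    "A3": imp(imp(neg("p"), neg("q")), imp("q", "p")),
}

def match(schema, formula, binding):
    """Extend `binding` so that substituting into `schema` gives `formula`."""
    if isinstance(schema, str):          # schematic letter
        if schema in binding:
            return binding[schema] == formula
        binding[schema] = formula
        return True
    return (isinstance(formula, tuple)
            and len(formula) == len(schema)
            and formula[0] == schema[0]
            and all(match(s, f, binding)
                    for s, f in zip(schema[1:], formula[1:])))

def is_instance(name, formula):
    return match(AXIOMS[name], formula, {})

def check(proof):
    """Each step is (formula, reason). A reason is an axiom name, or
    ("MP", i, j): step i is X and step j is X → formula (1-based)."""
    for k, (formula, why) in enumerate(proof):
        if isinstance(why, str):
            ok = is_instance(why, formula)
        else:
            _, i, j = why
            ok = (i <= k and j <= k
                  and proof[j - 1][0] == imp(proof[i - 1][0], formula))
        if not ok:
            return False
    return True

A, B = "A", "B"
BA = imp(B, A)
proof = [
    (imp(A, imp(BA, A)), "A1"),
    (imp(imp(A, imp(BA, A)), imp(imp(A, BA), imp(A, A))), "A2"),
    (imp(imp(A, BA), imp(A, A)), ("MP", 1, 2)),
    (imp(A, BA), "A1"),
    (imp(A, A), ("MP", 4, 3)),
]
proof_is_valid = check(proof)
```

The checker only verifies that each line is an axiom instance or follows by modus ponens from earlier lines; it does not search for proofs.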

Soundness and completeness of the rules

The crucial properties of this set of rules are that they are sound and complete. Informally this means that the rules are correct and that no other rules are required. These claims can be made more formal as follows. The proofs for the soundness and completeness of the propositional logic are not themselves proofs in propositional logic; these are theorems in ZFC used as a metatheory to prove properties of propositional logic.

We define a truth assignment as a function that maps propositional variables to true or false. Informally such a truth assignment can be understood as the description of a possible state of affairs (or possible world) where certain statements are true and others are not. The semantics of formulas can then be formalized by defining for which "state of affairs" they are considered to be true, which is what is done by the following definition.

We define when such a truth assignment A satisfies a certain well-formed formula with the following rules:

  • A satisfies the propositional variable P if and only if A(P) = true
  • A satisfies ¬φ if and only if A does not satisfy φ
  • A satisfies (φ ∧ ψ) if and only if A satisfies both φ and ψ
  • A satisfies (φ ∨ ψ) if and only if A satisfies at least one of either φ or ψ
  • A satisfies (φ → ψ) if and only if it is not the case that A satisfies φ but not ψ
  • A satisfies (φ ↔ ψ) if and only if A satisfies both φ and ψ or satisfies neither one of them

With this definition we can now formalize what it means for a formula φ to be implied by a certain set S of formulas. Informally this is true if in all worlds that are possible given the set of formulas S the formula φ also holds. This leads to the following formal definition: We say that a set S of well-formed formulas semantically entails (or implies) a certain well-formed formula φ if all truth assignments that satisfy all the formulas in S also satisfy φ.
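The satisfaction rules and the definition of semantic entailment can be sketched as two short functions. This is an illustration under stated assumptions: formulas are nested tuples such as ("→", "P", "Q"), atoms are strings, a truth assignment is a dictionary, and S is taken to be a finite list so that all assignments can be enumerated. The names `satisfies`, `atoms` and `entails` are hypothetical.

```python
from itertools import product

def satisfies(a, f):
    """Does truth assignment `a` satisfy formula `f`?"""
    if isinstance(f, str):               # propositional variable
        return a[f]
    op, *args = f
    vals = [satisfies(a, x) for x in args]
    if op == "¬":
        return not vals[0]
    if op == "∧":
        return vals[0] and vals[1]
    if op == "∨":
        return vals[0] or vals[1]
    if op == "→":
        return not (vals[0] and not vals[1])
    if op == "↔":
        return vals[0] == vals[1]
    raise ValueError(f"unknown connective: {op!r}")

def atoms(f):
    """Set of propositional variables occurring in `f`."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(x) for x in f[1:]))

def entails(S, phi):
    """S semantically entails phi if every assignment that satisfies
    all formulas in S also satisfies phi."""
    names = sorted(set().union(atoms(phi), *(atoms(f) for f in S)))
    for values in product([True, False], repeat=len(names)):
        a = dict(zip(names, values))
        if all(satisfies(a, f) for f in S) and not satisfies(a, phi):
            return False
    return True
```

For instance, `entails([("→", "P", "Q"), "P"], "Q")` holds, while `entails([("∨", "P", "Q")], "P")` does not, because the assignment with P false and Q true satisfies the premise but not the conclusion.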

Finally we define syntactical entailment such that φ is syntactically entailed by S if and only if we can derive it with the inference rules that were presented above in a finite number of steps. This allows us to formulate exactly what it means for the set of inference rules to be sound and complete:

Soundness: If the set of well-formed formulas S syntactically entails the well-formed formula φ then S semantically entails φ.

Completeness: If the set of well-formed formulas S semantically entails the well-formed formula φ then S syntactically entails φ.

For the above set of rules this is indeed the case.

Sketch of a soundness proof

(For most logical systems, this is the comparatively "simple" direction of proof)

Notational conventions: Let G be a variable ranging over sets of sentences. Let A, B and C range over sentences. For "G syntactically entails A" we write "G proves A". For "G semantically entails A" we write "G implies A".

We want to show: (A)(G) (if G proves A, then G implies A).

We note that "G proves A" has an inductive definition, and that gives us the immediate resources for demonstrating claims of the form "If G proves A, then ...". So our proof proceeds by induction.

  I. Basis. Show: If A is a member of G, then G implies A.
  II. Basis. Show: If A is an axiom, then G implies A.
  III. Inductive step (induction on n, the length of the proof):
    a. Assume for arbitrary G and A that if G proves A in n or fewer steps, then G implies A.
    b. For each possible application of a rule of inference at step n + 1, leading to a new theorem B, show that G implies B.

Notice that Basis Step II can be omitted for natural deduction systems because they have no axioms. When used, Step II involves showing that each of the axioms is a (semantic) logical truth.

The Basis steps demonstrate that the simplest provable sentences from G are also implied by G, for any G. (The proof is simple, since the semantic fact that a set implies any of its members is also trivial.) The Inductive step will systematically cover all the further sentences that might be provable, by considering each case where we might reach a logical conclusion using an inference rule, and shows that if a new sentence is provable, it is also logically implied. (For example, we might have a rule telling us that from "A" we can derive "A or B". In III.a we assume that if A is provable it is implied. We also know that if A is provable then "A or B" is provable. We have to show that then "A or B" too is implied. We do so by appeal to the semantic definition and the assumption we just made. A is provable from G, we assume. So it is also implied by G. So any semantic valuation making all of G true makes A true. But any valuation making A true makes "A or B" true, by the defined semantics for "or". So any valuation which makes all of G true makes "A or B" true. So "A or B" is implied.) Generally, the Inductive step will consist of a lengthy but simple case-by-case analysis of all the rules of inference, showing that each "preserves" semantic implication.

By the definition of provability, there are no sentences provable other than by being a member of G, an axiom, or following by a rule; so if all of those are semantically implied, the deduction calculus is sound.
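The per-rule check in the inductive step can be spot-checked by brute force. The following is a minimal sketch, not part of the original proof: the tuple encoding of formulas and the helper names `evaluate`, `variables` and `implies_semantically` are assumptions made for illustration. It enumerates every valuation and confirms that a rule's premises semantically imply its conclusion.

```python
from itertools import product

# Hypothetical encoding: a formula is a variable name (str) or a tuple
# ("not", f), ("or", f, g), ("imp", f, g).
def evaluate(formula, valuation):
    if isinstance(formula, str):
        return valuation[formula]
    op = formula[0]
    if op == "not":
        return not evaluate(formula[1], valuation)
    if op == "or":
        return evaluate(formula[1], valuation) or evaluate(formula[2], valuation)
    if op == "imp":
        return (not evaluate(formula[1], valuation)) or evaluate(formula[2], valuation)
    raise ValueError(f"unknown connective: {op}")

def variables(formula):
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(sub) for sub in formula[1:]))

def implies_semantically(premises, conclusion):
    """True if every valuation making all premises true makes the conclusion true."""
    names = sorted(set().union(variables(conclusion),
                               *(variables(p) for p in premises)))
    for values in product([True, False], repeat=len(names)):
        valuation = dict(zip(names, values))
        if all(evaluate(p, valuation) for p in premises) and not evaluate(conclusion, valuation):
            return False
    return True

# The rule from the example: from "A" derive "A or B".
rule_or_intro = implies_semantically(["A"], ("or", "A", "B"))
# Modus ponens: from "A" and "A implies B" derive "B".
rule_modus_ponens = implies_semantically(["A", ("imp", "A", "B")], "B")
```

Both checks succeed, while a non-rule such as deriving "A" from "A or B" fails the same test, which is the sense in which a sound rule "preserves" semantic implication.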

Sketch of completeness proof

(This is usually the much harder direction of proof.)

We adopt the same notational conventions as above.

We want to show: If G implies A, then G proves A. We proceed by contraposition: We show instead that if G does not prove A then G does not imply A. If we show that there is a model where A does not hold despite G being true, then obviously G does not imply A. The idea is to build such a model out of our very assumption that G does not prove A.

  1. G does not prove A. (Assumption)
  2. If G does not prove A, then we can construct an (infinite) Maximal Set, G∗, which is a superset of G and which also does not prove A.
    1. Place an ordering (with order type ω) on all the sentences in the language (e.g., shortest first, and equally long ones in extended alphabetical ordering), and number them (E₁, E₂, ...).
    2. Define a series Gₙ of sets (G₀, G₁, ...) inductively:
      1. G₀ = G
      2. If Gₖ ∪ {Eₖ₊₁} proves A, then Gₖ₊₁ = Gₖ
      3. If Gₖ ∪ {Eₖ₊₁} does not prove A, then Gₖ₊₁ = Gₖ ∪ {Eₖ₊₁}
    3. Define G∗ as the union of all the Gₙ. (That is, G∗ is the set of all the sentences that are in any Gₙ.)
    4. It can be easily shown that
      1. G∗ contains (is a superset of) G (by (b.i));
      2. G∗ does not prove A (because the proof would contain only finitely many sentences and when the last of them is introduced in some Gₙ, that Gₙ would prove A contrary to the definition of Gₙ); and
      3. G∗ is a Maximal Set with respect to A: If any more sentences whatever were added to G∗, it would prove A. (Because if it were possible to add any more sentences, they should have been added when they were encountered during the construction of the Gₙ, again by definition.)
  3. If G∗ is a Maximal Set with respect to A, then it is truth-like. This means that it contains C if and only if it does not contain ¬C; if it contains C and contains "If C then B" then it also contains B; and so forth. In order to show this, one has to show the axiomatic system is strong enough for the following:
    • For any formulas C and D, if it proves both C and ¬C, then it proves D. From this it follows that a Maximal Set with respect to A cannot prove both C and ¬C, as otherwise it would prove A.
    • For any formulas C and D, if it proves both C → D and ¬C → D, then it proves D. This is used, together with the deduction theorem, to show that for any formula, either it or its negation is in G∗: Let B be a formula not in G∗; then G∗ with the addition of B proves A. Thus from the deduction theorem it follows that G∗ proves B → A. But suppose ¬B were also not in G∗, then by the same logic G∗ also proves ¬B → A; but then G∗ proves A, which we have already shown to be false.
    • For any formulas C and D, if it proves C and D, then it proves C → D.
    • For any formulas C and D, if it proves C and ¬D, then it proves ¬(C → D).
    • For any formulas C and D, if it proves ¬C, then it proves C → D.
    If additional logical operations (such as conjunction and/or disjunction) are part of the vocabulary as well, then there are additional requirements on the axiomatic system (e.g. that if it proves C and D, it would also prove their conjunction).
  4. If G∗ is truth-like there is a G∗-Canonical valuation of the language: one that makes every sentence in G∗ true and everything outside G∗ false while still obeying the laws of semantic composition in the language. The requirement that it is truth-like is needed to guarantee that the laws of semantic composition in the language will be satisfied by this truth assignment.
  5. A G∗-canonical valuation will make our original set G all true, and make A false.
  6. If there is a valuation on which G are true and A is false, then G does not (semantically) imply A.
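
The step-by-step extension of G can be illustrated on a toy scale. The sketch below makes two loud simplifications that are not in the original argument: it works over a short, finite, hand-picked enumeration of sentences rather than all sentences of the language, and it uses truth-table entailment (`entails`) as a stand-in for syntactic provability. All names (`SYMBOLS`, `entails`, `maximal_extension`) are hypothetical.

```python
from itertools import product

SYMBOLS = ["p", "q"]

def entails(sentences, goal):
    """Stand-in for 'proves': no valuation makes all sentences true and goal false."""
    for values in product([True, False], repeat=len(SYMBOLS)):
        v = dict(zip(SYMBOLS, values))
        if all(s(v) for _, s in sentences) and not goal(v):
            return False
    return True

def maximal_extension(G, A, enumeration):
    """Start from G; add each enumerated sentence only if A still does not follow."""
    current = list(G)
    for sentence in enumeration:
        if not entails(current + [sentence], A):
            current.append(sentence)
    return current

# Sentences are (label, truth function) pairs; this finite enumeration is an assumption.
E = [("p", lambda v: v["p"]),
     ("not p", lambda v: not v["p"]),
     ("q", lambda v: v["q"]),
     ("not q", lambda v: not v["q"]),
     ("p -> q", lambda v: (not v["p"]) or v["q"])]
A = lambda v: v["q"]

G_star = maximal_extension([], A, E)
labels = [label for label, _ in G_star]
print(labels)  # ['p', 'not q']
```

In this toy run the resulting set contains exactly one of each sentence and its negation, and the valuation it describes (p true, q false) makes A false, mirroring steps 3 to 6.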

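Thus every system that has modus ponens as an inference rule, and proves the following theorems (including substitutions thereof) is complete:

  1. p → (¬p → q)
  2. (p → q) → ((¬p → q) → q)
  3. p → (q → (p → q))
  4. p → (¬q → ¬(p → q))
  5. ¬p → (p → q)
  6. p → p
  7. p → (q → p)
  8. (p → (q → r)) → ((p → q) → (p → r))

The first five are used for the satisfaction of the five conditions in stage III above, and the last three for proving the deduction theorem.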

Example

As an example, it can be shown that, like any other tautology, the three axioms of the classical propositional calculus system described earlier can be proven in any system that satisfies the above, namely one that has modus ponens as an inference rule and proves the above eight theorems (including substitutions thereof). Out of the eight theorems, the last two are two of the three axioms; the third axiom, (¬q → ¬p) → (p → q), can be proven as well, as we now show.

For the proof we may use the hypothetical syllogism theorem (in the form relevant for this axiomatic system), since it only relies on the two axioms that are already in the above set of eight theorems. The proof then is as follows:

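  1. q → (p → q)      (instance of the 7th theorem)
  2. (q → (p → q)) → ((¬q → ¬p) → (q → (p → q)))      (instance of the 7th theorem)
  3. (¬q → ¬p) → (q → (p → q))      (from (1) and (2) by modus ponens)
  4. (¬p → (p → q)) → ((¬q → ¬p) → (¬q → (p → q)))      (instance of the hypothetical syllogism theorem)
  5. ¬p → (p → q)      (instance of the 5th theorem)
  6. (¬q → ¬p) → (¬q → (p → q))      (from (5) and (4) by modus ponens)
  7. (q → (p → q)) → ((¬q → (p → q)) → (p → q))      (instance of the 2nd theorem)
  8. ((q → (p → q)) → ((¬q → (p → q)) → (p → q))) → ((¬q → ¬p) → ((q → (p → q)) → ((¬q → (p → q)) → (p → q))))      (instance of the 7th theorem)
  9. (¬q → ¬p) → ((q → (p → q)) → ((¬q → (p → q)) → (p → q)))      (from (7) and (8) by modus ponens)
  10. ((¬q → ¬p) → ((q → (p → q)) → ((¬q → (p → q)) → (p → q)))) → (((¬q → ¬p) → (q → (p → q))) → ((¬q → ¬p) → ((¬q → (p → q)) → (p → q))))      (instance of the 8th theorem)
  11. ((¬q → ¬p) → (q → (p → q))) → ((¬q → ¬p) → ((¬q → (p → q)) → (p → q)))      (from (9) and (10) by modus ponens)
  12. (¬q → ¬p) → ((¬q → (p → q)) → (p → q))      (from (3) and (11) by modus ponens)
  13. ((¬q → ¬p) → ((¬q → (p → q)) → (p → q))) → (((¬q → ¬p) → (¬q → (p → q))) → ((¬q → ¬p) → (p → q)))      (instance of the 8th theorem)
  14. ((¬q → ¬p) → (¬q → (p → q))) → ((¬q → ¬p) → (p → q))      (from (12) and (13) by modus ponens)
  15. (¬q → ¬p) → (p → q)      (from (6) and (14) by modus ponens)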

Verifying completeness for the classical propositional calculus system

We now verify that the classical propositional calculus system described earlier can indeed prove the required eight theorems mentioned above. We use several lemmas proven here:

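(DN1) ¬¬p → p - Double negation (one direction)
(DN2) p → ¬¬p - Double negation (another direction)
(HS1) (q → r) → ((p → q) → (p → r)) - one form of Hypothetical syllogism
(HS2) (p → q) → ((q → r) → (p → r)) - another form of Hypothetical syllogism
(TR1) (p → q) → (¬q → ¬p) - Transposition
(TR2) (¬p → q) → (¬q → p) - another form of transposition.
(L1) p → ((p → q) → q)
(L3) (¬p → p) → p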

We also use the method of the hypothetical syllogism metatheorem as a shorthand for several proof steps.

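  • p → (¬p → q) - proof:
    1. p → (¬q → p)      (instance of (A1))
    2. (¬q → p) → (¬p → ¬¬q)      (instance of (TR1))
    3. p → (¬p → ¬¬q)      (from (1) and (2) using the hypothetical syllogism metatheorem)
    4. ¬¬q → q      (instance of (DN1))
    5. (¬¬q → q) → ((¬p → ¬¬q) → (¬p → q))      (instance of (HS1))
    6. (¬p → ¬¬q) → (¬p → q)      (from (4) and (5) using modus ponens)
    7. p → (¬p → q)      (from (3) and (6) using the hypothetical syllogism metatheorem)
  • (p → q) → ((¬p → q) → q) - proof:
    1. (p → q) → ((¬q → p) → (¬q → q))      (instance of (HS1))
    2. (¬q → q) → q      (instance of (L3))
    3. ((¬q → q) → q) → (((¬q → p) → (¬q → q)) → ((¬q → p) → q))      (instance of (HS1))
    4. ((¬q → p) → (¬q → q)) → ((¬q → p) → q)      (from (2) and (3) by modus ponens)
    5. (p → q) → ((¬q → p) → q)      (from (1) and (4) using the hypothetical syllogism metatheorem)
    6. (¬p → q) → (¬q → p)      (instance of (TR2))
    7. ((¬p → q) → (¬q → p)) → (((¬q → p) → q) → ((¬p → q) → q))      (instance of (HS2))
    8. ((¬q → p) → q) → ((¬p → q) → q)      (from (6) and (7) using modus ponens)
    9. (p → q) → ((¬p → q) → q)      (from (5) and (8) using the hypothetical syllogism metatheorem)
  • p → (q → (p → q)) - proof:
    1. q → (p → q)      (instance of (A1))
    2. (q → (p → q)) → (p → (q → (p → q)))      (instance of (A1))
    3. p → (q → (p → q))      (from (1) and (2) using modus ponens)
  • p → (¬q → ¬(p → q)) - proof:
    1. p → ((p → q) → q)      (instance of (L1))
    2. ((p → q) → q) → (¬q → ¬(p → q))      (instance of (TR1))
    3. p → (¬q → ¬(p → q))      (from (1) and (2) using the hypothetical syllogism metatheorem)
  • ¬p → (p → q) - proof:
    1. ¬p → (¬q → ¬p)      (instance of (A1))
    2. (¬q → ¬p) → (p → q)      (instance of (A3))
    3. ¬p → (p → q)      (from (1) and (2) using the hypothetical syllogism metatheorem)
  • p → p - proof given in the proof example above
  • p → (q → p) - axiom (A1)
  • (p → (q → r)) → ((p → q) → (p → r)) - axiom (A2)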

Another outline for a completeness proof

If a formula is a tautology, then there is a truth table for it which shows that each valuation yields the value true for the formula. Consider such a valuation. By mathematical induction on the length of the subformulas, show that the truth or falsity of the subformula follows from the truth or falsity (as appropriate for the valuation) of each propositional variable in the subformula. Then combine the lines of the truth table together two at a time by using "(P is true implies S) implies ((P is false implies S) implies S)". Keep repeating this until all dependencies on propositional variables have been eliminated. The result is that we have proved the given tautology. Since every tautology is provable, the logic is complete.

Interpretation of a truth-functional propositional calculus

An interpretation of a truth-functional propositional calculus 𝒫 is an assignment to each propositional symbol of 𝒫 of one or the other (but not both) of the truth values truth (T) and falsity (F), and an assignment to the connective symbols of 𝒫 of their usual truth-functional meanings. An interpretation of a truth-functional propositional calculus may also be expressed in terms of truth tables.

For n distinct propositional symbols there are 2ⁿ distinct possible interpretations. For any particular symbol a, for example, there are 2¹ = 2 possible interpretations:

  1. a is assigned T, or
  2. a is assigned F.

For the pair a, b there are 2² = 4 possible interpretations:

  1. both are assigned T,
  2. both are assigned F,
  3. a is assigned T and b is assigned F, or
  4. a is assigned F and b is assigned T.[14]

Since 𝒫 has ℵ₀, that is, denumerably many propositional symbols, there are 2^ℵ₀ = 𝔠, and therefore uncountably many distinct possible interpretations of 𝒫.
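
The counting argument for finitely many symbols can be reproduced directly. This is a small sketch in which the function name `interpretations` and the use of `True`/`False` for T/F are assumptions made for illustration.

```python
from itertools import product

def interpretations(symbols):
    """Yield every assignment of T/F (True/False) to the given symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))

one = list(interpretations(["a"]))        # 2 interpretations
two = list(interpretations(["a", "b"]))   # 4 interpretations
print(len(one), len(two))  # 2 4
```

Each additional symbol doubles the count, which is why n symbols give 2ⁿ interpretations.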

Interpretation of a sentence of truth-functional propositional logic

If φ and ψ are formulas of 𝒫 and ℐ is an interpretation of 𝒫 then the following definitions apply:

  • A sentence of propositional logic is true under an interpretation ℐ if ℐ assigns the truth value T to that sentence. If a sentence is true under an interpretation, then that interpretation is called a model of that sentence.
  • φ is false under an interpretation ℐ if φ is not true under ℐ.
  • A sentence of propositional logic is logically valid if it is true under every interpretation.
    ⊨ φ means that φ is logically valid.
  • A sentence ψ of propositional logic is a semantic consequence of a sentence φ if there is no interpretation under which φ is true and ψ is false.
  • A sentence of propositional logic is consistent if it is true under at least one interpretation. It is inconsistent if it is not consistent.

Some consequences of these definitions:

  • For any given interpretation a given formula is either true or false.
  • No formula is both true and false under the same interpretation.
  • φ is false for a given interpretation iff ¬φ is true for that interpretation; and φ is true under an interpretation iff ¬φ is false under that interpretation.
  • If φ and (φ → ψ) are both true under a given interpretation, then ψ is true under that interpretation.
  • If ⊨ φ and ⊨ (φ → ψ), then ⊨ ψ.
  • ¬φ is true under ℐ iff φ is not true under ℐ.
  • (φ → ψ) is true under ℐ iff either φ is not true under ℐ or ψ is true under ℐ.
  • A sentence ψ of propositional logic is a semantic consequence of a sentence φ iff (φ → ψ) is logically valid, that is, iff ⊨ (φ → ψ).
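
These definitions translate almost word for word into exhaustive checks over interpretations. The sketch below assumes, for illustration only, that a sentence is represented as a Python function from an interpretation (a dict of truth values) to a bool; the helper names are hypothetical.

```python
from itertools import product

def all_interpretations(symbols):
    return [dict(zip(symbols, values))
            for values in product([True, False], repeat=len(symbols))]

def is_valid(sentence, symbols):
    # True under every interpretation.
    return all(sentence(i) for i in all_interpretations(symbols))

def is_consistent(sentence, symbols):
    # True under at least one interpretation.
    return any(sentence(i) for i in all_interpretations(symbols))

def is_consequence(phi, psi, symbols):
    # No interpretation makes phi true and psi false.
    return not any(phi(i) and not psi(i) for i in all_interpretations(symbols))

def implies(x, y):
    # Truth-functional meaning of the conditional.
    return (not x) or y

phi = lambda i: i["p"] and i["q"]   # the sentence "p and q"
psi = lambda i: i["p"]              # the sentence "p"
symbols = ["p", "q"]

consequence = is_consequence(phi, psi, symbols)
conditional_valid = is_valid(lambda i: implies(phi(i), psi(i)), symbols)
```

The last two values agree, illustrating the final consequence in the list: ψ is a semantic consequence of φ exactly when (φ → ψ) is logically valid.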

Alternative calculus

It is possible to define another version of propositional calculus, which defines most of the syntax of the logical operators by means of axioms, and which uses only one inference rule.

Axioms

Let φ, χ, and ψ stand for well-formed formulas. (The well-formed formulas themselves would not contain any Greek letters, but only capital Roman letters, connective operators, and parentheses.) Then the axioms are as follows:

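Axioms
Name     Axiom Schema                               Description
THEN-1   φ → (χ → φ)                                Add hypothesis χ, implication introduction
THEN-2   (φ → (χ → ψ)) → ((φ → χ) → (φ → ψ))        Distribute hypothesis φ over implication
AND-1    φ ∧ χ → φ                                  Eliminate conjunction
AND-2    φ ∧ χ → χ
AND-3    φ → (χ → (φ ∧ χ))                          Introduce conjunction
OR-1     φ → φ ∨ χ                                  Introduce disjunction
OR-2     χ → φ ∨ χ
OR-3     (φ → ψ) → ((χ → ψ) → (φ ∨ χ → ψ))          Eliminate disjunction
NOT-1    (φ → χ) → ((φ → ¬χ) → ¬φ)                  Introduce negation
NOT-2    φ → (¬φ → χ)                               Eliminate negation
NOT-3    φ ∨ ¬φ                                     Excluded middle, classical logic
IFF-1    (φ ↔ χ) → (φ → χ)                          Eliminate equivalence
IFF-2    (φ ↔ χ) → (χ → φ)
IFF-3    (φ → χ) → ((χ → φ) → (φ ↔ χ))              Introduce equivalence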
  • Axiom THEN-2 may be considered to be a "distributive property of implication with respect to implication."
  • Axioms AND-1 and AND-2 correspond to "conjunction elimination". The relation between AND-1 and AND-2 reflects the commutativity of the conjunction operator.
  • Axiom AND-3 corresponds to "conjunction introduction."
  • Axioms OR-1 and OR-2 correspond to "disjunction introduction." The relation between OR-1 and OR-2 reflects the commutativity of the disjunction operator.
  • Axiom NOT-1 corresponds to "reductio ad absurdum."
  • Axiom NOT-2 says that "anything can be deduced from a contradiction."
  • Axiom NOT-3 is called "tertium non datur" (Latin: "a third is not given") and reflects the semantic valuation of propositional formulas: a formula can have a truth-value of either true or false. There is no third truth-value, at least not in classical logic. Intuitionistic logicians do not accept the axiom NOT-3.

Inference rule

The inference rule is modus ponens:

φ, φ → χ ⊢ χ.

Meta-inference rule

Let a demonstration be represented by a sequence, with hypotheses to the left of the turnstile and the conclusion to the right of the turnstile. Then the deduction theorem can be stated as follows:

If the sequence
φ1, φ2, ..., φn, χ ⊢ ψ
has been demonstrated, then it is also possible to demonstrate the sequence
φ1, φ2, ..., φn ⊢ χ → ψ.

This deduction theorem (DT) is not itself formulated with propositional calculus: it is not a theorem of propositional calculus, but a theorem about propositional calculus. In this sense, it is a meta-theorem, comparable to theorems about the soundness or completeness of propositional calculus.

On the other hand, DT is so useful for simplifying the syntactical proof process that it can be considered and used as another inference rule, accompanying modus ponens. In this sense, DT corresponds to the natural conditional proof inference rule which is part of the first version of propositional calculus introduced in this article.

The converse of DT is also valid:

If the sequence
φ1, φ2, ..., φn ⊢ χ → ψ
has been demonstrated, then it is also possible to demonstrate the sequence
φ1, φ2, ..., φn, χ ⊢ ψ

In fact, the validity of the converse of DT is almost trivial compared to that of DT:

If
φ1, ..., φn ⊢ χ → ψ
then
1: φ1, ..., φn, χ ⊢ χ → ψ
2: φ1, ..., φn, χ ⊢ χ
and from (1) and (2) can be deduced
3: φ1, ..., φn, χ ⊢ ψ
by means of modus ponens, Q.E.D.

The converse of DT has powerful implications: it can be used to convert an axiom into an inference rule. For example, by axiom AND-1 we have

⊢ φ ∧ χ → φ,

which can be transformed by means of the converse of the deduction theorem into

φ ∧ χ ⊢ φ,

which tells us that the inference rule

from φ ∧ χ, infer φ

is admissible. This inference rule is conjunction elimination, one of the ten inference rules used in the first version (in this article) of the propositional calculus.

Example of a proof

The following is an example of a (syntactical) demonstration, involving only axioms THEN-1 and THEN-2:

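Prove: A → A (Reflexivity of implication).

Proof:

  1. (A → ((B → A) → A)) → ((A → (B → A)) → (A → A))      Axiom THEN-2 with φ = A, χ = B → A, ψ = A
  2. A → ((B → A) → A)      Axiom THEN-1 with φ = A, χ = B → A
  3. (A → (B → A)) → (A → A)      From (1) and (2) by modus ponens.
  4. A → (B → A)      Axiom THEN-1 with φ = A, χ = B
  5. A → A      From (3) and (4) by modus ponens.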
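
A demonstration of this kind is purely mechanical, so it can be replayed by a few lines of code. The sketch below assumes the schemas THEN-1: φ → (χ → φ) and THEN-2: (φ → (χ → ψ)) → ((φ → χ) → (φ → ψ)); the tuple encoding of formulas and the helper names `imp`, `then_1`, `then_2` and `modus_ponens` are hypothetical.

```python
def imp(a, b):
    """Build the formula a -> b as a tuple."""
    return ("->", a, b)

def then_1(phi, chi):
    # THEN-1 instance: phi -> (chi -> phi)
    return imp(phi, imp(chi, phi))

def then_2(phi, chi, psi):
    # THEN-2 instance: (phi -> (chi -> psi)) -> ((phi -> chi) -> (phi -> psi))
    return imp(imp(phi, imp(chi, psi)), imp(imp(phi, chi), imp(phi, psi)))

def modus_ponens(implication, antecedent):
    """From (antecedent -> c) and antecedent, return c; otherwise raise."""
    if not (isinstance(implication, tuple) and len(implication) == 3
            and implication[0] == "->" and implication[1] == antecedent):
        raise ValueError("modus ponens does not apply")
    return implication[2]

A, B = "A", "B"
step1 = then_2(A, imp(B, A), A)       # THEN-2 with phi = A, chi = B -> A, psi = A
step2 = then_1(A, imp(B, A))          # THEN-1 with phi = A, chi = B -> A
step3 = modus_ponens(step1, step2)    # (A -> (B -> A)) -> (A -> A)
step4 = then_1(A, B)                  # THEN-1 with phi = A, chi = B
step5 = modus_ponens(step3, step4)    # A -> A
```

Because `modus_ponens` raises whenever the antecedent does not match exactly, reaching `step5` without an error is itself a check that every step is licensed.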

Equivalence to equational logics

The preceding alternative calculus is an example of a Hilbert-style deduction system. In the case of propositional systems the axioms are terms built with logical connectives and the only inference rule is modus ponens. Equational logic as standardly used informally in high school algebra is a different kind of calculus from Hilbert systems. Its theorems are equations and its inference rules express the properties of equality, namely that it is a congruence on terms that admits substitution.

Classical propositional calculus as described above is equivalent to Boolean algebra, while intuitionistic propositional calculus is equivalent to Heyting algebra. The equivalence is shown by translation in each direction of the theorems of the respective systems. Theorems φ of classical or intuitionistic propositional calculus are translated as equations φ = 1 of Boolean or Heyting algebra respectively. Conversely theorems x = y of Boolean or Heyting algebra are translated as theorems (x → y) ∧ (y → x) of classical or intuitionistic calculus respectively, for which x ≡ y is a standard abbreviation. In the case of Boolean algebra x = y can also be translated as (x ∧ y) ∨ (¬x ∧ ¬y), but this translation is incorrect intuitionistically.

In both Boolean and Heyting algebra, inequality x ≤ y can be used in place of equality. The equality x = y is expressible as a pair of inequalities x ≤ y and y ≤ x. Conversely the inequality x ≤ y is expressible as the equality x ∧ y = x, or as x ∨ y = y. The significance of inequality for Hilbert-style systems is that it corresponds to the latter's deduction or entailment symbol ⊢. An entailment

φ1, φ2, ..., φn ⊢ ψ

is translated in the inequality version of the algebraic framework as

φ1 ∧ φ2 ∧ ... ∧ φn ≤ ψ

Conversely the algebraic inequality x ≤ y is translated as the entailment

x ⊢ y.

The difference between implication x → y and inequality or entailment x ≤ y or x ⊢ y is that the former is internal to the logic while the latter is external. Internal implication between two terms is another term of the same kind. Entailment as external implication between two terms expresses a metatruth outside the language of the logic, and is considered part of the metalanguage. Even when the logic under study is intuitionistic, entailment is ordinarily understood classically as two-valued: either the left side entails, or is less-or-equal to, the right side, or it is not.
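
The relationship between the order, the two equational forms and internal implication can be checked exhaustively in the simplest case. The sketch below covers only the two-element Boolean algebra {False, True}, which is an assumption made to keep the check finite; it says nothing about Heyting algebras, and the function name is hypothetical.

```python
from itertools import product

def order_identities_hold():
    """On {False, True}: x <= y, (x and y) == x, (x or y) == y,
    and 'x -> y equals the top element' should all agree."""
    for x, y in product([False, True], repeat=2):
        leq = x <= y
        meet_form = (x and y) == x
        join_form = (x or y) == y
        implication_is_top = ((not x) or y) is True
        if not (leq == meet_form == join_form == implication_is_top):
            return False
    return True

result = order_identities_hold()
```

The internal implication here is a value of the algebra, while the comparison `x <= y` is a statement about two values, which is the internal versus external distinction drawn above.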

Similar but more complex translations to and from algebraic logics are possible for natural deduction systems as described above and for the sequent calculus. The entailments of the latter can be interpreted as two-valued, but a more insightful interpretation is as a set, the elements of which can be understood as abstract proofs organized as the morphisms of a category. In this interpretation the cut rule of the sequent calculus corresponds to composition in the category. Boolean and Heyting algebras enter this picture as special categories having at most one morphism per homset, i.e., one proof per entailment, corresponding to the idea that existence of proofs is all that matters: any proof will do and there is no point in distinguishing them.

Graphical calculi

It is possible to generalize the definition of a formal language from a set of finite sequences over a finite basis to include many other sets of mathematical structures, so long as they are built up by finitary means from finite materials. What's more, many of these families of formal structures are especially well-suited for use in logic.

For example, there are many families of graphs that are close enough analogues of formal languages that the concept of a calculus is quite easily and naturally extended to them. Many species of graphs arise as parse graphs in the syntactic analysis of the corresponding families of text structures. The exigencies of practical computation on formal languages frequently demand that text strings be converted into pointer structure renditions of parse graphs, simply as a matter of checking whether strings are well-formed formulas or not. Once this is done, there are many advantages to be gained from developing the graphical analogue of the calculus on strings. The mapping from strings to parse graphs is called parsing and the inverse mapping from parse graphs to strings is achieved by an operation that is called traversing the graph.

Other logical calculi

Propositional calculus is about the simplest kind of logical calculus in current use. It can be extended in several ways. (Aristotelian "syllogistic" calculus, which is largely supplanted in modern logic, is in some ways simpler – but in other ways more complex – than propositional calculus.) The most immediate way to develop a more complex logical calculus is to introduce rules that are sensitive to more fine-grained details of the sentences being used.

First-order logic (a.k.a. first-order predicate logic) results when the "atomic sentences" of propositional logic are broken up into terms, variables, predicates, and quantifiers, all keeping the rules of propositional logic with some new ones introduced. (For example, from "All dogs are mammals" we may infer "If Rover is a dog then Rover is a mammal".) With the tools of first-order logic it is possible to formulate a number of theories, either with explicit axioms or by rules of inference, that can themselves be treated as logical calculi. Arithmetic is the best known of these; others include set theory and mereology. Second-order logic and other higher-order logics are formal extensions of first-order logic. Thus, it makes sense to refer to propositional logic as "zeroth-order logic", when comparing it with these logics.

Modal logic also offers a variety of inferences that cannot be captured in propositional calculus. For example, from "Necessarily p" we may infer that p. From p we may infer "It is possible that p". The translation between modal logics and algebraic logics concerns classical and intuitionistic logics but with the introduction of a unary operator on Boolean or Heyting algebras, different from the Boolean operations, interpreting the possibility modality, and in the case of Heyting algebra a second operator interpreting necessity (for Boolean algebra this is redundant since necessity is the De Morgan dual of possibility). The first operator preserves 0 and disjunction while the second preserves 1 and conjunction.

Many-valued logics are those allowing sentences to have values other than true and false. (For example, neither and both are standard "extra values"; "continuum logic" allows each sentence to have any of an infinite number of "degrees of truth" between true and false.) These logics often require calculational devices quite distinct from propositional calculus. When the values form a Boolean algebra (which may have more than two or even infinitely many values), many-valued logic reduces to classical logic; many-valued logics are therefore only of independent interest when the values form an algebra that is not Boolean.

Solvers

One notable difference between propositional calculus and predicate calculus is that satisfiability of a propositional formula is decidable. Deciding satisfiability of propositional logic formulas is an NP-complete problem. However, practical methods exist (e.g., DPLL algorithm, 1962; Chaff algorithm, 2001) that are very fast for many useful cases. Recent work has extended the SAT solver algorithms to work with propositions containing arithmetic expressions; these are the SMT solvers.
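
To make the idea of a satisfiability procedure concrete, here is a minimal sketch in the spirit of DPLL: unit propagation followed by branching. It is an illustration under stated assumptions, not a faithful reproduction of the 1962 algorithm or of any production solver; it omits pure-literal elimination and all modern heuristics, and the clause format (lists of nonzero integers, with a negative number meaning a negated variable) and function names are choices made here.

```python
def simplify(clauses, literal):
    """Assume `literal` is true: drop satisfied clauses, shrink the others."""
    result = []
    for clause in clauses:
        if literal in clause:
            continue  # clause already satisfied
        result.append([l for l in clause if l != -literal])
    return result

def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict: variable -> bool) or None."""
    assignment = dict(assignment or {})
    # Unit propagation: a one-literal clause forces that literal.
    while True:
        if any(len(c) == 0 for c in clauses):
            return None  # empty clause: conflict
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        literal = units[0]
        assignment[abs(literal)] = literal > 0
        clauses = simplify(clauses, literal)
    if not clauses:
        return assignment  # every clause satisfied
    # Branch on the first literal of the first remaining clause.
    literal = clauses[0][0]
    for choice in (literal, -literal):
        result = dpll(simplify(clauses, choice),
                      {**assignment, abs(choice): choice > 0})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
formula = [[1, 2], [-1, 3], [-2, -3]]
model = dpll(formula)
unsat = dpll([[1, 2], [1, -2], [-1, 2], [-1, -2]])
```

Variables that never appear in the returned dictionary are unconstrained and may take either value. In the worst case the procedure still explores exponentially many branches, consistent with the problem being NP-complete.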

Mathematical optimization

From Wikipedia, the free encyclopedia
https://en.wikipedia.org/wiki/Mathematical_optimization
Graph of a surface given by z = f(x, y) = −(x² + y²) + 4. The global maximum at (x, y, z) = (0, 0, 4) is indicated by a blue dot.
Nelder-Mead minimum search of Simionescu's function. Simplex vertices are ordered by their values, with 1 having the lowest (best) value.

Mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

In the more general approach, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics. More generally, optimization includes finding "best available" values of some objective function given a defined domain (or input), including a variety of different types of objective functions and different types of domains.

Optimization problems

Optimization problems can be divided into two categories, depending on whether the variables are continuous (continuous optimization) or discrete (discrete optimization).

An optimization problem can be represented in the following way:

Given: a function f : A → ℝ from some set A to the real numbers
Sought: an element x0 ∈ A such that f(x0) ≤ f(x) for all x ∈ A ("minimization") or such that f(x0) ≥ f(x) for all x ∈ A ("maximization").

Such a formulation is called an optimization problem or a mathematical programming problem (a term not directly related to computer programming, but still in use for example in linear programming – see History below). Many real-world and theoretical problems may be modeled in this general framework.

Since the following is valid

f(x0) ≥ f(x) ⇔ −f(x0) ≤ −f(x),

it suffices to solve only minimization problems. However, the opposite perspective of considering only maximization problems would be valid, too.
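
The reduction of maximization to minimization can be seen on a small example. The sketch below is illustrative only: the objective `f` and the finite candidate set are assumptions chosen so that the answer is easy to verify by hand.

```python
def f(x):
    # A concave objective with its maximum at x = 1.
    return -(x - 1) ** 2 + 4

# A finite candidate set standing in for the set A.
candidates = [i / 10 for i in range(-30, 31)]

x_max_direct = max(candidates, key=f)
# Maximizing f is the same as minimizing -f.
x_max_via_min = min(candidates, key=lambda x: -f(x))
```

Both searches return the same element, so a solver that only minimizes can be reused for maximization by negating the objective.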

Problems formulated using this technique in the fields of physics may refer to the technique as energy minimization, speaking of the value of the function f as representing the energy of the system being modeled. In machine learning, it is always necessary to continuously evaluate the quality of a data model by using a cost function where a minimum implies a set of possibly optimal parameters with an optimal (lowest) error.

Typically, A is some subset of the Euclidean space ℝⁿ, often specified by a set of constraints, equalities or inequalities that the members of A have to satisfy. The domain A of f is called the search space or the choice set, while the elements of A are called candidate solutions or feasible solutions.

The function f is variously called an objective function, criterion function, loss function, cost function (minimization), utility function or fitness function (maximization), or, in certain fields, an energy function or energy functional. A feasible solution that minimizes (or maximizes, if that is the goal) the objective function is called an optimal solution.

In mathematics, conventional optimization problems are usually stated in terms of minimization.

A local minimum x* is defined as an element for which there exists some δ > 0 such that

for all x ∈ A with ‖x − x*‖ ≤ δ,

the expression f(x*) ≤ f(x) holds;

that is to say, on some region around x* all of the function values are greater than or equal to the value at that element. Local maxima are defined similarly.

While a local minimum is at least as good as any nearby elements, a global minimum is at least as good as every feasible element. Generally, unless the objective function is convex in a minimization problem, there may be several local minima. In a convex problem, if there is a local minimum that is interior (not on the edge of the set of feasible elements), it is also the global minimum, but a nonconvex problem may have more than one local minimum not all of which need be global minima.
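
The distinction can be made concrete with a one-dimensional nonconvex function. The sketch below is a discretized illustration, not a solver: the function f(x) = x⁴ − 3x² + x, the grid on [−2, 2] and the neighbour test (a grid version of the δ-neighbourhood in the definition above) are all assumptions chosen for this example.

```python
def f(x):
    # Nonconvex: two separate valleys of different depth.
    return x**4 - 3 * x**2 + x

# Grid on [-2, 2] with step 0.01.
xs = [(i - 200) / 100 for i in range(401)]

def grid_local_minima(f, xs):
    """Interior grid points that are no worse than both neighbours."""
    return [xs[i] for i in range(1, len(xs) - 1)
            if f(xs[i]) <= f(xs[i - 1]) and f(xs[i]) <= f(xs[i + 1])]

local_minima = grid_local_minima(f, xs)
global_minimum = min(xs, key=f)
```

The grid search finds two local minima, one in each valley, and only the one at negative x is also the global minimum. A method that only inspects nearby points and starts in the right-hand valley would stop at the other one.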

A large number of algorithms proposed for solving the nonconvex problems – including the majority of commercially available solvers – are not capable of making a distinction between locally optimal solutions and globally optimal solutions, and will treat the former as actual solutions to the original problem. Global optimization is the branch of applied mathematics and numerical analysis that is concerned with the development of deterministic algorithms that are capable of guaranteeing convergence in finite time to the actual optimal solution of a nonconvex problem.

Notation

Optimization problems are often expressed with special notation. Here are some examples:

Minimum and maximum value of a function

Consider the following notation:

min_{x ∈ ℝ} (x² + 1)

This denotes the minimum value of the objective function x² + 1, when choosing x from the set of real numbers ℝ. The minimum value in this case is 1, occurring at x = 0.

Similarly, the notation

max_{x ∈ ℝ} 2x

asks for the maximum value of the objective function 2x, where x may be any real number. In this case, there is no such maximum as the objective function is unbounded, so the answer is "infinity" or "undefined".

Optimal input arguments

Consider the following notation:

arg min_{x ∈ (−∞,−1]} (x² + 1),

or equivalently

arg min_x (x² + 1), subject to: x ∈ (−∞,−1].

This represents the value (or values) of the argument x in the interval (−∞,−1] that minimizes (or minimize) the objective function x² + 1 (the actual minimum value of that function is not what the problem asks for). In this case, the answer is x = −1, since x = 0 is infeasible, that is, it does not belong to the feasible set.

Similarly,

arg max_{x ∈ [−5,5], y ∈ ℝ} x cos y,

or equivalently

arg max_{x, y} x cos y, subject to: x ∈ [−5,5], y ∈ ℝ,

represents the {x, y} pair (or pairs) that maximizes (or maximize) the value of the objective function x cos y, with the added constraint that x lie in the interval [−5,5] (again, the actual maximum value of the expression does not matter). In this case, the solutions are the pairs of the form {5, 2kπ} and {−5, (2k + 1)π}, where k ranges over all integers.

Operators arg min and arg max are sometimes also written as argmin and argmax, and stand for argument of the minimum and argument of the maximum.
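
The difference between a minimum value and an arg min, and the effect of the feasible set, can be checked numerically. The sketch below is only an approximation by sampling: the truncation of (−∞, −1] to [−10, −1], the grid spacings and the restriction of y to a few multiples of π/4 are assumptions made so that the search is finite.

```python
import math

# Feasible set (-inf, -1] truncated to [-10, -1] and sampled; ends exactly at -1.0.
feasible = [-10 + i / 100 for i in range(901)]
objective = lambda x: x * x + 1

x_star = min(feasible, key=objective)   # the arg min
min_value = objective(x_star)           # the minimum value itself

# x cos y with x in [-5, 5] and y sampled at multiples of pi/4 in [-2*pi, 2*pi].
xs = [-5 + i / 10 for i in range(101)]
ys = [k * math.pi / 4 for k in range(-8, 9)]
best = max(x * math.cos(y) for x in xs for y in ys)
maximizers = [(x, y) for x in xs for y in ys
              if math.isclose(x * math.cos(y), best)]
```

Here `x_star` is the feasible point −1 even though the unconstrained minimizer is 0, and `maximizers` contains several pairs, including (5, 0), which illustrates that an arg max can be a set of points rather than a single one.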

History

Fermat and Lagrange found calculus-based formulae for identifying optima, while Newton and Gauss proposed iterative methods for moving towards an optimum.

The term "linear programming" for certain optimization cases was due to George B. Dantzig, although much of the theory had been introduced by Leonid Kantorovich in 1939. (Programming in this context does not refer to computer programming, but comes from the use of program by the United States military to refer to proposed training and logistics schedules, which were the problems Dantzig studied at that time.) Dantzig published the Simplex algorithm in 1947, and John von Neumann and other researchers worked on the theoretical aspects of linear programming (like the theory of duality) around the same time.

Other notable researchers in mathematical optimization include the following:

Major subfields

  • Convex programming studies the case when the objective function is convex (minimization) or concave (maximization) and the constraint set is convex. This can be viewed as a particular case of nonlinear programming or as a generalization of linear or convex quadratic programming.
    • Linear programming (LP), a type of convex programming, studies the case in which the objective function f is linear and the constraints are specified using only linear equalities and inequalities. Such a constraint set is called a polyhedron or a polytope if it is bounded.
    • Second-order cone programming (SOCP) is a convex program, and includes certain types of quadratic programs.
    • Semidefinite programming (SDP) is a subfield of convex optimization where the underlying variables are semidefinite matrices. It is a generalization of linear and convex quadratic programming.
    • Conic programming is a general form of convex programming. LP, SOCP and SDP can all be viewed as conic programs with the appropriate type of cone.
    • Geometric programming is a technique whereby objective and inequality constraints expressed as posynomials and equality constraints as monomials can be transformed into a convex program.
  • Integer programming studies linear programs in which some or all variables are constrained to take on integer values. This is not convex, and in general much more difficult than regular linear programming.
  • Quadratic programming allows the objective function to have quadratic terms, while the feasible set must be specified with linear equalities and inequalities. For specific forms of the quadratic term, this is a type of convex programming.
  • Fractional programming studies optimization of ratios of two nonlinear functions. The special class of concave fractional programs can be transformed to a convex optimization problem.
  • Nonlinear programming studies the general case in which the objective function or the constraints or both contain nonlinear parts. This may or may not be a convex program. In general, whether the program is convex affects the difficulty of solving it.
  • Stochastic programming studies the case in which some of the constraints or parameters depend on random variables.
  • Robust optimization is, like stochastic programming, an attempt to capture uncertainty in the data underlying the optimization problem. Robust optimization aims to find solutions that are valid under all possible realizations of the uncertainties defined by an uncertainty set.
  • Combinatorial optimization is concerned with problems where the set of feasible solutions is discrete or can be reduced to a discrete one.
  • Stochastic optimization is used with random (noisy) function measurements or random inputs in the search process.
  • Infinite-dimensional optimization studies the case when the set of feasible solutions is a subset of an infinite-dimensional space, such as a space of functions.
  • Heuristics and metaheuristics make few or no assumptions about the problem being optimized. Usually, heuristics do not guarantee that any optimal solution need be found. On the other hand, heuristics are used to find approximate solutions for many complicated optimization problems.
  • Constraint satisfaction studies the case in which the objective function f is constant (this is used in artificial intelligence, particularly in automated reasoning).
    • Constraint programming is a programming paradigm wherein relations between variables are stated in the form of constraints.
  • Disjunctive programming is used where at least one constraint must be satisfied but not all. It is of particular use in scheduling.
  • Space mapping is a concept for modeling and optimization of an engineering system to high-fidelity (fine) model accuracy exploiting a suitable physically meaningful coarse or surrogate model.

In a number of subfields, the techniques are designed primarily for optimization in dynamic contexts (that is, decision making over time).

Multi-objective optimization

Adding more than one objective to an optimization problem adds complexity. For example, to optimize a structural design, one would desire a design that is both light and rigid. When two objectives conflict, a trade-off must be created. There may be one lightest design, one stiffest design, and an infinite number of designs that are some compromise of weight and rigidity. The set of trade-off designs that improve upon one criterion at the expense of another is known as the Pareto set. The curve created plotting weight against stiffness of the best designs is known as the Pareto frontier.

A design is judged to be "Pareto optimal" (equivalently, "Pareto efficient" or in the Pareto set) if it is not dominated by any other design: If it is worse than another design in some respects and no better in any respect, then it is dominated and is not Pareto optimal.
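The dominance test can be sketched in a few lines of Python. This is a minimal illustration, assuming every objective is to be minimized and each design is a tuple of objective values; the names `dominates` and `pareto_set` and the (weight, compliance) numbers are hypothetical and not taken from the text.

```python
def dominates(a, b):
    """Return True if design a dominates design b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_set(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Hypothetical (weight, compliance) pairs; lower is better for both.
designs = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 2.0), (5.0, 2.5)]
front = pareto_set(designs)
print(front)  # [(1.0, 9.0), (2.0, 5.0), (4.0, 2.0)]
```

Here (3.0, 6.0) is dominated by (2.0, 5.0) and (5.0, 2.5) by (4.0, 2.0), so three trade-off designs remain.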

The choice among "Pareto optimal" solutions to determine the "favorite solution" is delegated to the decision maker. In other words, defining the problem as multi-objective optimization signals that some information is missing: desirable objectives are given but combinations of them are not rated relative to each other. In some cases, the missing information can be derived by interactive sessions with the decision maker.

Multi-objective optimization problems have been generalized further into vector optimization problems where the (partial) ordering is no longer given by the Pareto ordering.

Multi-modal or global optimization

Optimization problems are often multi-modal; that is, they possess multiple good solutions. They could all be globally good (same cost function value) or there could be a mix of globally good and locally good solutions. Obtaining all (or at least some of) the multiple solutions is the goal of a multi-modal optimizer.

Classical optimization techniques, because of their iterative approach, do not perform satisfactorily when they are used to obtain multiple solutions, since it is not guaranteed that different solutions will be obtained even with different starting points in multiple runs of the algorithm.

Common approaches to global optimization problems, where multiple local extrema may be present, include evolutionary algorithms, Bayesian optimization and simulated annealing.
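As one illustration of these approaches, the following is a minimal sketch of simulated annealing on a one-dimensional multi-modal function. The test function, the geometric cooling schedule, the step size and all parameter values are assumptions chosen for illustration, not a recommended configuration.

```python
import math
import random


def cost(x):
    # Hypothetical multi-modal test function (1-D Rastrigin form);
    # its global minimum is cost(0) = 0, with local minima near each integer.
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def simulated_annealing(x0, temp=2.0, cooling=0.995, steps=5000, seed=0):
    """Return the best point seen during a simulated-annealing run."""
    rng = random.Random(seed)
    x, best = x0, x0
    for _ in range(steps):
        candidate = x + rng.uniform(-0.5, 0.5)
        delta = cost(candidate) - cost(x)
        # Always accept improvements; accept uphill moves with
        # probability exp(-delta / temp) (Metropolis criterion).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            x = candidate
        if cost(x) < cost(best):
            best = x
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best


best = simulated_annealing(3.0)
print(best, cost(best))
```

Accepting occasional uphill moves while the temperature is high is what lets the search leave a local minimum; as the temperature falls, the method behaves more and more like a pure descent.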

Classification of critical points and extrema

Feasibility problem

The satisfiability problem, also called the feasibility problem, is just the problem of finding any feasible solution at all without regard to objective value. This can be regarded as the special case of mathematical optimization where the objective value is the same for every solution, and thus any solution is optimal.

Many optimization algorithms need to start from a feasible point. One way to obtain such a point is to relax the feasibility conditions using a slack variable; with enough slack, any starting point is feasible. Then, minimize that slack variable until the slack is null or negative.
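As a sketch of this idea, assume the feasibility conditions are written as inequalities g_i(x) ≤ 0 for i = 1, ..., m (a notational assumption; the text above does not fix a form). The auxiliary problem with a single slack variable s is then:

```latex
\begin{aligned}
\min_{x,\,s} \quad & s \\
\text{subject to} \quad & g_i(x) \le s, \qquad i = 1, \dots, m .
\end{aligned}
```

Any starting point x_0 is feasible for this auxiliary problem with s_0 = max_i g_i(x_0). If the minimization reaches s ≤ 0, the corresponding x satisfies the original constraints.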

Existence

The extreme value theorem of Karl Weierstrass states that a continuous real-valued function on a compact set attains its maximum and minimum value. More generally, a lower semi-continuous function on a compact set attains its minimum; an upper semi-continuous function on a compact set attains its maximum.

Necessary conditions for optimality

One of Fermat's theorems states that optima of unconstrained problems are found at stationary points, where the first derivative or the gradient of the objective function is zero (see first derivative test). More generally, they may be found at critical points, where the first derivative or gradient of the objective function is zero or is undefined, or on the boundary of the choice set. An equation (or set of equations) stating that the first derivative(s) equal(s) zero at an interior optimum is called a 'first-order condition' or a set of first-order conditions.

Optima of equality-constrained problems can be found by the Lagrange multiplier method. The optima of problems with equality and/or inequality constraints can be found using the 'Karush–Kuhn–Tucker conditions'.
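A small worked example may help make the Lagrange multiplier method concrete. The problem below (minimizing x² + y² on the line x + y = 1) is chosen purely for illustration:

```latex
\begin{aligned}
&\text{minimize } f(x, y) = x^2 + y^2 \quad \text{subject to } x + y = 1, \\
&\mathcal{L}(x, y, \lambda) = x^2 + y^2 + \lambda\,(1 - x - y), \\
&\frac{\partial \mathcal{L}}{\partial x} = 2x - \lambda = 0, \quad
 \frac{\partial \mathcal{L}}{\partial y} = 2y - \lambda = 0, \quad
 \frac{\partial \mathcal{L}}{\partial \lambda} = 1 - x - y = 0, \\
&\Rightarrow \; x = y = \tfrac{1}{2}, \quad \lambda = 1 .
\end{aligned}
```

For an inequality constraint g(x) ≤ 0, the Karush–Kuhn–Tucker conditions additionally require a multiplier μ ≥ 0 with complementary slackness μ g(x) = 0.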

Sufficient conditions for optimality

While the first derivative test identifies points that might be extrema, this test does not distinguish a point that is a minimum from one that is a maximum or one that is neither. When the objective function is twice differentiable, these cases can be distinguished by checking the second derivative or the matrix of second derivatives (called the Hessian matrix) in unconstrained problems, or the matrix of second derivatives of the objective function and the constraints called the bordered Hessian in constrained problems. The conditions that distinguish maxima, or minima, from other stationary points are called 'second-order conditions' (see 'Second derivative test'). If a candidate solution satisfies the first-order conditions, then the satisfaction of the second-order conditions as well is sufficient to establish at least local optimality.

Sensitivity and continuity of optima

The envelope theorem describes how the value of an optimal solution changes when an underlying parameter changes. The process of computing this change is called comparative statics.
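As an illustrative case (the function f below is an assumption chosen so that everything can be computed by hand), consider maximizing over x with a parameter a:

```latex
\begin{aligned}
V(a) &= \max_x f(x, a), \qquad f(x, a) = -x^2 + a x, \\
x^*(a) &= \tfrac{a}{2}, \qquad V(a) = \tfrac{a^2}{4}, \\
\frac{dV}{da} &= \tfrac{a}{2}
  = \left.\frac{\partial f}{\partial a}\right|_{x = x^*(a)} = x^*(a).
\end{aligned}
```

The derivative of the optimal value equals the partial derivative of the objective with respect to the parameter, evaluated at the optimizer; the indirect effect through x*(a) drops out because the first-order condition holds there.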

The maximum theorem of Claude Berge (1963) describes the continuity of an optimal solution as a function of underlying parameters.

Calculus of optimization

For unconstrained problems with twice-differentiable functions, some critical points can be found by finding the points where the gradient of the objective function is zero (that is, the stationary points). More generally, a zero subgradient certifies that a local minimum has been found for minimization problems with convex functions and other locally Lipschitz functions.

Further, critical points can be classified using the definiteness of the Hessian matrix: If the Hessian is positive definite at a critical point, then the point is a local minimum; if the Hessian matrix is negative definite, then the point is a local maximum; finally, if indefinite, then the point is some kind of saddle point.
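For a function of two variables, this classification can be sketched directly from the eigenvalues of the 2×2 Hessian. The function name `classify_critical_point` and the tolerance are assumptions for illustration; the semidefinite case is reported as inconclusive because the second-order test does not decide it.

```python
import math


def classify_critical_point(hxx, hxy, hyy, tol=1e-12):
    """Classify a critical point of f(x, y) from its symmetric 2x2 Hessian."""
    # Eigenvalues of [[hxx, hxy], [hxy, hyy]] in closed form.
    mean = (hxx + hyy) / 2.0
    radius = math.hypot((hxx - hyy) / 2.0, hxy)
    lo, hi = mean - radius, mean + radius
    if lo > tol:
        return "local minimum"      # positive definite
    if hi < -tol:
        return "local maximum"      # negative definite
    if lo < -tol and hi > tol:
        return "saddle point"       # indefinite
    return "inconclusive"           # semidefinite: test does not decide


print(classify_critical_point(2, 0, 2))    # f = x^2 + y^2 at the origin
print(classify_critical_point(-2, 0, -2))  # f = -x^2 - y^2 at the origin
print(classify_critical_point(2, 0, -2))   # f = x^2 - y^2 at the origin
print(classify_critical_point(0, 0, 0))    # f = x^4 + y^4 at the origin
```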

Constrained problems can often be transformed into unconstrained problems with the help of Lagrange multipliers. Lagrangian relaxation can also provide approximate solutions to difficult constrained problems.

When the objective function is a convex function, then any local minimum will also be a global minimum. There exist efficient numerical techniques for minimizing convex functions, such as interior-point methods.

Global convergence

More generally, if the objective function is not a quadratic function, then many optimization methods use other methods to ensure that some subsequence of iterations converges to an optimal solution. The first and still popular method for ensuring convergence relies on line searches, which optimize a function along one dimension. A second and increasingly popular method for ensuring convergence uses trust regions. Both line searches and trust regions are used in modern methods of non-differentiable optimization. Usually, a global optimizer is much slower than advanced local optimizers (such as BFGS), so often an efficient global optimizer can be constructed by starting the local optimizer from different starting points.
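The multi-start idea in the last sentence can be sketched as follows. Plain fixed-step gradient descent stands in for the local optimizer (a simplification; the text mentions BFGS), and the test function, step size and starting points are assumptions chosen for illustration.

```python
def f(x):
    # Hypothetical objective with two local minima (near -1.30 and 1.13).
    return x**4 - 3 * x**2 + x


def df(x):
    return 4 * x**3 - 6 * x + 1


def local_descent(x, step=0.01, iters=5000):
    """Fixed-step gradient descent; converges to a nearby local minimum."""
    for _ in range(iters):
        x -= step * df(x)
    return x


# Run the local optimizer from several starting points and keep the best result.
starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
candidates = [local_descent(x0) for x0 in starts]
best = min(candidates, key=f)
print(best, f(best))
```

Starting points to the right of the local maximum near 0.17 end in the shallower minimum near 1.13, while the others reach the deeper minimum near -1.30. This gives no guarantee of global optimality; it only improves the odds as more starting points are used.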

Computational optimization techniques

To solve problems, researchers may use algorithms that terminate in a finite number of steps, or iterative methods that converge to a solution (on some specified class of problems), or heuristics that may provide approximate solutions to some problems (although their iterates need not converge).

Optimization algorithms

Iterative methods

The iterative methods used to solve problems of nonlinear programming differ according to whether they evaluate Hessians, gradients, or only function values. While evaluating Hessians (H) and gradients (G) improves the rate of convergence, for functions for which these quantities exist and vary sufficiently smoothly, such evaluations increase the computational complexity (or computational cost) of each iteration. In some cases, the computational complexity may be excessively high.

One major criterion for optimizers is the number of required function evaluations, as this is often already a large computational effort, usually much more than the effort within the optimizer itself, which mainly has to operate over the N variables. The derivatives provide detailed information for such optimizers but are even more costly to calculate: for example, approximating the gradient by finite differences takes at least N+1 function evaluations. For approximations of the second derivatives (collected in the Hessian matrix), the number of function evaluations is of the order of N². Newton's method requires the second-order derivatives, so for each iteration the number of function calls is of the order of N², whereas for a simpler pure gradient optimizer it is only of the order of N. However, gradient optimizers usually need more iterations than Newton's algorithm. Which one is best with respect to the number of function calls depends on the problem itself.
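The N+1 count for a gradient approximation can be seen in a minimal forward-difference sketch. The function names, the step size `h` and the test function are assumptions for illustration.

```python
def forward_difference_gradient(func, x, h=1e-6):
    """Approximate the gradient of func at x; return (gradient, evaluations)."""
    f0 = func(x)                 # one evaluation at the base point
    evaluations = 1
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h          # perturb one coordinate at a time
        grad.append((func(shifted) - f0) / h)
        evaluations += 1         # one extra evaluation per variable
    return grad, evaluations


def sphere(x):
    # Hypothetical test function; its exact gradient is 2 * x.
    return sum(v * v for v in x)


grad, n_evals = forward_difference_gradient(sphere, [1.0, -2.0, 3.0])
print(grad, n_evals)
```

With N = 3 variables the routine calls the function 4 times. A finite-difference Hessian would perturb pairs of coordinates, which is where the N² growth comes from.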

  • Methods that evaluate Hessians (or approximate Hessians, using finite differences):
    • Newton's method
    • Sequential quadratic programming: A Newton-based method for small-medium scale constrained problems. Some versions can handle large-dimensional problems.
    • Interior point methods: This is a large class of methods for constrained optimization, some of which use only (sub)gradient information and others of which require the evaluation of Hessians.
  • Methods that evaluate gradients, or approximate gradients in some way (or even subgradients):
    • Coordinate descent methods: Algorithms which update a single coordinate in each iteration
    • Conjugate gradient methods: Iterative methods for large problems. (In theory, these methods terminate in a finite number of steps with quadratic objective functions, but this finite termination is not observed in practice on finite-precision computers.)
    • Gradient descent (alternatively, "steepest descent" or "steepest ascent"): A (slow) method of historical and theoretical interest, which has had renewed interest for finding approximate solutions of enormous problems.
    • Subgradient methods: An iterative method for large locally Lipschitz functions using generalized gradients. Following Boris T. Polyak, subgradient-projection methods are similar to conjugate-gradient methods.
    • Bundle method of descent: An iterative method for small- to medium-sized problems with locally Lipschitz functions, particularly for convex minimization problems (similar to conjugate gradient methods).
    • Ellipsoid method: An iterative method for small problems with quasiconvex objective functions and of great theoretical interest, particularly in establishing the polynomial time complexity of some combinatorial optimization problems. It has similarities with Quasi-Newton methods.
    • Conditional gradient method (Frank–Wolfe) for approximate minimization of specially structured problems with linear constraints, especially with traffic networks. For general unconstrained problems, this method reduces to the gradient method, which is regarded as obsolete (for almost all problems).
    • Quasi-Newton methods: Iterative methods for medium-large problems (e.g. N<1000).
    • Simultaneous perturbation stochastic approximation (SPSA) method for stochastic optimization; uses random (efficient) gradient approximation.
  • Methods that evaluate only function values: If a problem is continuously differentiable, then gradients can be approximated using finite differences, in which case a gradient-based method can be used.
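The trade-off between Hessian-based and gradient-based methods can be illustrated on a one-dimensional problem. This is a minimal sketch, assuming the objective f(x) = eˣ − 2x (minimum at x = ln 2), a fixed gradient-descent step of 0.1 and a stopping rule based on the gradient magnitude; none of these choices come from the text.

```python
import math


def grad(x):
    # First derivative of the assumed objective f(x) = exp(x) - 2x.
    return math.exp(x) - 2.0


def hess(x):
    # Second derivative of the same objective.
    return math.exp(x)


def minimize(x, update, tol=1e-10, max_iter=10_000):
    """Iterate x <- update(x) until |grad| < tol; return (x, iterations)."""
    for k in range(max_iter):
        if abs(grad(x)) < tol:
            return x, k
        x = update(x)
    return x, max_iter


# Newton's method uses second-derivative information in every step.
newton_x, newton_iters = minimize(0.0, lambda x: x - grad(x) / hess(x))
# Gradient descent uses only the first derivative and a fixed step size.
gd_x, gd_iters = minimize(0.0, lambda x: x - 0.1 * grad(x))

print(newton_iters, gd_iters)
```

Both runs reach the same minimizer, but Newton's method needs far fewer iterations, while each of its iterations requires second-derivative information.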

Heuristics

Besides (finitely terminating) algorithms and (convergent) iterative methods, there are heuristics. A heuristic is any algorithm which is not guaranteed (mathematically) to find the solution, but which is nevertheless useful in certain practical situations.

Applications

Mechanics

Problems in rigid body dynamics (in particular articulated rigid body dynamics) often require mathematical programming techniques, since rigid body dynamics can be viewed as attempting to solve an ordinary differential equation on a constraint manifold; the constraints are various nonlinear geometric constraints such as "these two points must always coincide", "this surface must not penetrate any other", or "this point must always lie somewhere on this curve". Also, contact forces can be computed by solving a linear complementarity problem, which can also be viewed as a QP (quadratic programming) problem.

Many design problems can also be expressed as optimization programs. This application is called design optimization. One subset is engineering optimization, and another recent and growing subset of this field is multidisciplinary design optimization, which, while useful in many problems, has in particular been applied to aerospace engineering problems.

This approach may be applied in cosmology and astrophysics.

Economics and finance

Economics is closely enough linked to optimization of agents that an influential definition relatedly describes economics qua science as the "study of human behavior as a relationship between ends and scarce means" with alternative uses. Modern optimization theory includes traditional optimization theory but also overlaps with game theory and the study of economic equilibria. The Journal of Economic Literature codes classify mathematical programming, optimization techniques, and related topics under JEL:C61-C63.

In microeconomics, the utility maximization problem and its dual problem, the expenditure minimization problem, are economic optimization problems. Insofar as they behave consistently, consumers are assumed to maximize their utility, while firms are usually assumed to maximize their profit. Also, agents are often modeled as being risk-averse, thereby preferring to avoid risk. Asset prices are also modeled using optimization theory, though the underlying mathematics relies on optimizing stochastic processes rather than on static optimization. International trade theory also uses optimization to explain trade patterns between nations. The optimization of portfolios is an example of multi-objective optimization in economics.

Since the 1970s, economists have modeled dynamic decisions over time using control theory. For example, dynamic search models are used to study labor-market behavior. A crucial distinction is between deterministic and stochastic models. Macroeconomists build dynamic stochastic general equilibrium (DSGE) models that describe the dynamics of the whole economy as the result of the interdependent optimizing decisions of workers, consumers, investors, and governments.

Electrical engineering

Some common applications of optimization techniques in electrical engineering include active filter design, stray field reduction in superconducting magnetic energy storage systems, space mapping design of microwave structures, handset antennas, and electromagnetics-based design. Electromagnetically validated design optimization of microwave components and antennas has made extensive use of an appropriate physics-based or empirical surrogate model and space mapping methodologies since the discovery of space mapping in 1993. Optimization techniques are also used in power-flow analysis.

Civil engineering

Optimization has been widely used in civil engineering. Construction management and transportation engineering are among the main branches of civil engineering that heavily rely on optimization. The most common civil engineering problems that are solved by optimization are cut and fill of roads, life-cycle analysis of structures and infrastructures, resource leveling, water resource allocation, traffic management and schedule optimization.

Operations research

Another field that uses optimization techniques extensively is operations research. Operations research also uses stochastic modeling and simulation to support improved decision-making. Increasingly, operations research uses stochastic programming to model dynamic decisions that adapt to events; such problems can be solved with large-scale optimization and stochastic optimization methods.

Control engineering

Mathematical optimization is used in much modern controller design. High-level controllers such as model predictive control (MPC) or real-time optimization (RTO) employ mathematical optimization. These algorithms run online and repeatedly determine values for decision variables, such as choke openings in a process plant, by iteratively solving a mathematical optimization problem including constraints and a model of the system to be controlled.

Geophysics

Optimization techniques are regularly used in geophysical parameter estimation problems. Given a set of geophysical measurements, e.g. seismic recordings, it is common to solve for the physical properties and geometrical shapes of the underlying rocks and fluids. The majority of problems in geophysics are nonlinear with both deterministic and stochastic methods being widely used.

Molecular modeling

Nonlinear optimization methods are widely used in conformational analysis.

Computational systems biology

Optimization techniques are used in many facets of computational systems biology such as model building, optimal experimental design, metabolic engineering, and synthetic biology. Linear programming has been applied to calculate the maximal possible yields of fermentation products, and to infer gene regulatory networks from multiple microarray datasets as well as transcriptional regulatory networks from high-throughput data. Nonlinear programming has been used to analyze energy metabolism and has been applied to metabolic engineering and parameter estimation in biochemical pathways.

Politics of Europe

From Wikipedia, the free encyclopedia ...