Operant conditioning (overview of procedures)
- Extinction
- Reinforcement: increases behaviour
  - Positive reinforcement: add an appetitive stimulus following correct behaviour
  - Negative reinforcement
    - Escape: remove a noxious stimulus following correct behaviour
    - Active avoidance: behaviour avoids a noxious stimulus
- Punishment: decreases behaviour
  - Positive punishment: add a noxious stimulus following behaviour
  - Negative punishment: remove an appetitive stimulus following behaviour
Operant conditioning (also called instrumental conditioning) is a learning process through which the strength of a behavior is modified by reinforcement or punishment. It is also a procedure that is used to bring about such learning.
Although operant and classical conditioning both involve behaviors controlled by environmental stimuli, they differ in nature. In operant conditioning, stimuli present when a behavior is rewarded or punished come to control that behavior. For example, a child may learn to open a box to get the sweets inside, or learn to avoid touching a hot stove; in operant terms, the box and the stove are "discriminative stimuli". Operant behavior is said to be "voluntary": for example, the child may face a choice between opening the box and petting a puppy.
In contrast, classical conditioning involves involuntary behavior based on the pairing of stimuli with biologically significant events. For example, the sight of sweets may cause a child to salivate, or the sound of a door slam may signal an angry parent, causing a child to tremble. Salivation and trembling are not operants; they are not reinforced by their consequences, and they are not voluntarily "chosen".
The study of animal learning in the 20th century was dominated by the analysis of these two sorts of learning, and they are still at the core of behavior analysis.
Historical note
Thorndike's law of effect
Operant conditioning, sometimes called instrumental learning, was first extensively studied by Edward L. Thorndike (1874–1949), who observed the behavior of cats trying to escape from home-made puzzle boxes.
A cat could escape from the box by a simple response such as pulling a cord or pushing a pole, but when first constrained, the cats took a long time to get out. With repeated trials, ineffective responses occurred less frequently and successful responses occurred more frequently, so the cats escaped more and more quickly. Thorndike generalized this finding in his law of effect, which states that behaviors followed by satisfying consequences tend to be repeated and those that produce unpleasant consequences are less likely to be repeated. In short, some consequences strengthen behavior and some consequences weaken behavior. By plotting escape time against trial number, Thorndike produced the first known animal learning curves.
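Thorndike's procedure can be mimicked with a toy model of the law of effect. The sketch below is an illustration under stated assumptions, not Thorndike's own analysis: it assumes a cat samples among a few candidate responses in proportion to their strength, that only the response that opens the box (index 0) is strengthened, by a fixed increment after each escape, and that the expected number of attempts per trial stands in for escape time. The function name and its parameters are hypothetical.

```python
def learning_curve(n_responses: int = 5, increment: float = 1.0, trials: int = 10) -> list[float]:
    """Expected attempts per trial under a toy law-of-effect model (hypothetical parameters)."""
    # All candidate responses start equally strong; index 0 is the one that opens the box.
    strengths = [1.0] * n_responses
    curve = []
    for _ in range(trials):
        p_success = strengths[0] / sum(strengths)
        # If attempts are drawn independently, attempts-until-success is geometric,
        # so its expected value is 1 / p_success.
        curve.append(1.0 / p_success)
        # Law of effect: the response followed by a satisfying consequence is strengthened.
        strengths[0] += increment
    return curve

for trial, attempts in enumerate(learning_curve(), start=1):
    print(f"trial {trial:2d}: {attempts:.2f} expected attempts")
```

Plotting the returned values against trial number gives a falling curve of the general kind Thorndike reported; the linear increment is a modelling convenience, not an empirical claim.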
Humans appear to learn many simple behaviors through the sort of
process studied by Thorndike, now called operant conditioning. That is,
responses are retained when they lead to a successful outcome and
discarded when they do not, or when they produce aversive effects. This
usually happens without being planned by any "teacher", but operant
conditioning has been used by parents in teaching their children for
thousands of years.
B. F. Skinner
B.F. Skinner (1904–1990) is often referred to as the father of operant conditioning, and his work is frequently cited in connection with this topic. His 1938 book "The Behavior of Organisms: An Experimental Analysis" initiated his lifelong study of operant conditioning and its application to human and animal behavior. Following the ideas of Ernst Mach, Skinner rejected Thorndike's reference to unobservable mental states such as satisfaction, building his analysis on observable behavior and its equally observable consequences.
Skinner believed that classical conditioning was too simplistic
to be used to describe something as complex as human behavior. Operant
conditioning, in his opinion, better described human behavior as it
examined causes and effects of intentional behavior.
To implement his empirical approach, Skinner invented the operant conditioning chamber, or "Skinner Box",
in which subjects such as pigeons and rats were isolated and could be
exposed to carefully controlled stimuli. Unlike Thorndike's puzzle box,
this arrangement allowed the subject to make one or two simple,
repeatable responses, and the rate of such responses became Skinner's
primary behavioral measure.
Another invention, the cumulative recorder, produced a graphical
record from which these response rates could be estimated. These records
were the primary data that Skinner and his colleagues used to explore
the effects on response rate of various reinforcement schedules.
A reinforcement schedule may be defined as "any procedure that delivers
reinforcement to an organism according to some well-defined rule".
The effects of schedules became, in turn, the basic findings from which
Skinner developed his account of operant conditioning. He also drew on
many less formal observations of human and animal behavior.
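A cumulative record plots the running total of responses against time, so its local slope is the response rate. The following minimal sketch computes both from a list of response timestamps; the function names and the timestamp representation are assumptions for illustration, not a description of Skinner's apparatus.

```python
from bisect import bisect_right

def cumulative_record(response_times: list[float], sample_times: list[float]) -> list[int]:
    """Height of the cumulative record (responses so far) at each sample time."""
    ordered = sorted(response_times)
    # bisect_right counts the responses made at or before t.
    return [bisect_right(ordered, t) for t in sample_times]

def response_rate(response_times: list[float], start: float, end: float) -> float:
    """Mean rate in [start, end): the slope of the cumulative record over that window."""
    count = sum(1 for t in response_times if start <= t < end)
    return count / (end - start)

# Usage: slow responding followed by a faster burst.
responses = [1, 2, 3, 4, 10, 10.5, 11, 11.5, 12, 12.5]
print(cumulative_record(responses, [0, 5, 10, 13]))
print(response_rate(responses, 0, 5), response_rate(responses, 10, 13))
```

A steeper segment of the record corresponds to a higher rate, which is how schedule effects were read off the original paper records.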
Many of Skinner's writings are devoted to the application of operant conditioning to human behavior. In 1948 he published Walden Two, a fictional account of a peaceful, happy, productive community organized around his conditioning principles. In 1957, Skinner published Verbal Behavior,
which extended the principles of operant conditioning to language, a
form of human behavior that had previously been analyzed quite
differently by linguists and others. Skinner defined new functional
relationships such as "mands" and "tacts" to capture some essentials of
language, but he introduced no new principles, treating verbal behavior
like any other behavior controlled by its consequences, which included
the reactions of the speaker's audience.
Concepts and procedures
Origins of operant behavior: operant variability
Operant
behavior is said to be "emitted"; that is, initially it is not elicited
by any particular stimulus. Thus one may ask why it happens in the
first place. The answer to this question is like Darwin's answer to the
question of the origin of a "new" bodily structure, namely, variation
and selection. Similarly, the behavior of an individual varies from
moment to moment, in such aspects as the specific motions involved, the
amount of force applied, or the timing of the response. Variations that
lead to reinforcement are strengthened, and if reinforcement is
consistent, the behavior tends to remain stable. However, behavioral
variability can itself be altered through the manipulation of certain
variables.
Modifying operant behavior: reinforcement and punishment
Reinforcement and punishment are the core tools through which operant
behavior is modified. These terms are defined by their effect on
behavior. Either may be positive or negative.
- Positive reinforcement and negative reinforcement increase the probability of a behavior that they follow, while positive punishment and negative punishment reduce the probability of behaviour that they follow.
Another procedure is called "extinction".
- Extinction occurs when a previously reinforced behavior is no longer reinforced with either positive or negative reinforcement. During extinction the behavior becomes less probable. Behavior that was reinforced only occasionally usually takes longer to extinguish than behavior that was reinforced at every opportunity, because the organism has learned that many unreinforced responses may precede a reinforcement.
There are a total of five consequences.
- Positive reinforcement occurs when a behavior (response) is rewarding or the behavior is followed by another stimulus that is rewarding, increasing the frequency of that behavior. For example, if a rat in a Skinner box gets food when it presses a lever, its rate of pressing will go up. This procedure is usually called simply reinforcement.
- Negative reinforcement (a.k.a. escape) occurs when a behavior (response) is followed by the removal of an aversive stimulus, thereby increasing the original behavior's frequency. In the Skinner Box experiment, the aversive stimulus might be a loud noise sounding continuously inside the box; negative reinforcement would occur when the rat presses a lever to turn off the noise.
- Positive punishment (also referred to as "punishment by contingent stimulation") occurs when a behavior (response) is followed by an aversive stimulus. Example: pain from a spanking, which would often result in a decrease in that behavior. Positive punishment is a confusing term, so the procedure is usually referred to as "punishment".
- Negative punishment (penalty) (also called "punishment by contingent withdrawal") occurs when a behavior (response) is followed by the removal of a stimulus. Example: taking away a child's toy following an undesired behavior, which would result in a decrease in that behavior.
- Extinction occurs when a behavior (response) that had previously been reinforced no longer produces its reinforcing consequence. Example: a rat is first given food many times for pressing a lever, until the experimenter stops giving out food as a reward. The rat would typically press the lever less often and then stop. The lever pressing would then be said to be "extinguished."
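The four consequences that involve a stimulus form a two-by-two grid (stimulus added or removed, appetitive or aversive), with extinction as the case where the former consequence is simply withheld. The lookup below is a hypothetical helper that encodes that grid. Because these terms are defined by their effect on behavior, it only names the procedure one would usually expect; whether a consequence actually reinforced or punished has to be confirmed by observing the change in response rate.

```python
from enum import Enum
from typing import Optional

class Change(Enum):
    ADDED = "added"
    REMOVED = "removed"
    WITHHELD = "withheld"  # the consequence that used to follow the response no longer does

class Valence(Enum):
    APPETITIVE = "appetitive"
    AVERSIVE = "aversive"

# The two-by-two grid of stimulus operation x stimulus valence.
PROCEDURES = {
    (Change.ADDED, Valence.APPETITIVE): "positive reinforcement",
    (Change.REMOVED, Valence.AVERSIVE): "negative reinforcement",
    (Change.ADDED, Valence.AVERSIVE): "positive punishment",
    (Change.REMOVED, Valence.APPETITIVE): "negative punishment",
}

def classify(change: Change, valence: Optional[Valence] = None) -> str:
    """Name the procedure usually associated with a consequence (hypothetical helper)."""
    if change is Change.WITHHELD:
        return "extinction"
    if valence is None:
        raise ValueError("adding or removing a stimulus requires its valence")
    return PROCEDURES[(change, valence)]

print(classify(Change.REMOVED, Valence.AVERSIVE))  # the rat's lever press turns off the noise
```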
It is important to note that actors (e.g. a rat) are not spoken of as being reinforced, punished, or extinguished; it is the actions
that are reinforced, punished, or extinguished. Reinforcement,
punishment, and extinction are not terms whose use is restricted to the
laboratory. Naturally occurring consequences can also reinforce, punish,
or extinguish behavior and are not always planned or delivered on
purpose.
Schedules of reinforcement
Schedules
of reinforcement are rules that control the delivery of reinforcement.
The rules specify either the time that reinforcement is to be made
available, or the number of responses to be made, or both. Many rules
are possible, but the following are the most basic and commonly used:
- Fixed interval schedule: Reinforcement occurs following the first response after a fixed time has elapsed after the previous reinforcement. This schedule yields a "break-run" pattern of response; that is, after training on this schedule, the organism typically pauses after reinforcement, and then begins to respond rapidly as the time for the next reinforcement approaches.
- Variable interval schedule: Reinforcement occurs following the first response after a variable time has elapsed from the previous reinforcement. This schedule typically yields a relatively steady rate of response that varies with the average time between reinforcements.
- Fixed ratio schedule: Reinforcement occurs after a fixed number of responses have been emitted since the previous reinforcement. An organism trained on this schedule typically pauses for a while after a reinforcement and then responds at a high rate. If the response requirement is low there may be no pause; if the response requirement is high the organism may quit responding altogether.
- Variable ratio schedule: Reinforcement occurs after a variable number of responses have been emitted since the previous reinforcement. This schedule typically yields a very high, persistent rate of response.
- Continuous reinforcement: Reinforcement occurs after each response. Organisms typically respond as rapidly as they can, given the time taken to obtain and consume reinforcement, until they are satiated.
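Each schedule above is a rule deciding whether a given response is reinforced, so it can be sketched as a small object with a `respond(t)` method. The class names, the uniform distributions used for the variable schedules, and the assumption that timing starts at t = 0 are illustrative choices; laboratory implementations may draw ratios and intervals differently. Continuous reinforcement is simply FR 1.

```python
import random

class FixedRatio:
    """FR n: reinforce the n-th response since the previous reinforcement (FR 1 = continuous)."""
    def __init__(self, n: int):
        self.n, self.count = n, 0

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR m: requirement redrawn after each reinforcement (assumed uniform on 1..2m-1)."""
    def __init__(self, mean: int, rng: random.Random):
        self.mean, self.rng, self.count = mean, rng, 0
        self.required = self._draw()

    def _draw(self) -> int:
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self, t: float) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False

class FixedInterval:
    """FI T: reinforce the first response made at least T after the previous reinforcement."""
    def __init__(self, interval: float):
        self.interval, self.last = interval, 0.0

    def respond(self, t: float) -> bool:
        if t - self.last >= self.interval:
            self.last = t
            return True
        return False

class VariableInterval:
    """VI T: like FI, but each interval is redrawn (assumed uniform on [0, 2T])."""
    def __init__(self, mean: float, rng: random.Random):
        self.mean, self.rng, self.last = mean, rng, 0.0
        self.wait = rng.uniform(0, 2 * mean)

    def respond(self, t: float) -> bool:
        if t - self.last >= self.wait:
            self.last, self.wait = t, self.rng.uniform(0, 2 * self.mean)
            return True
        return False

# Usage: one response every 2 time units, up to t = 40, on FR 5 and FI 5.
for schedule in (FixedRatio(5), FixedInterval(5.0)):
    times = [t for t in range(2, 41, 2) if schedule.respond(float(t))]
    print(type(schedule).__name__, times)
```

With this response pattern, FR 5 reinforces at t = 10, 20, 30 and 40, while FI 5 reinforces at t = 6, 12, 18, 24, 30 and 36: on a ratio schedule the reinforcement rate tracks the response rate, whereas on an interval schedule responding faster than the interval allows earns nothing extra.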
Factors that alter the effectiveness of reinforcement and punishment
The effectiveness of reinforcement and punishment can be changed.
- Satiation/Deprivation: The effectiveness of a positive or "appetitive" stimulus will be reduced if the individual has received enough of that stimulus to satisfy his/her appetite. The opposite effect will occur if the individual becomes deprived of that stimulus: the effectiveness of a consequence will then increase. A subject with a full stomach wouldn't feel as motivated as a hungry one.
- Immediacy: An immediate consequence is more effective than a delayed one. A dog given a treat within five seconds of sitting will learn to sit faster than one that receives the treat thirty seconds later.
- Contingency: To be most effective, reinforcement should occur consistently after responses and not at other times. Learning may be slower if reinforcement is intermittent, that is, following only some instances of the same response. Responses reinforced intermittently are usually slower to extinguish than are responses that have always been reinforced.
- Size: The size, or amount, of a stimulus often affects its potency as a reinforcer. Humans and animals engage in cost-benefit analysis. A small amount of food may not, to a rat, seem a worthwhile reward for an effortful lever press. A pile of quarters from a slot machine may keep a gambler pulling the lever longer than a single quarter.
Most of these factors serve biological functions. For example, the process of satiation helps the organism maintain a stable internal environment (homeostasis). When an organism has been deprived of sugar, the taste of sugar is an effective reinforcer. When the organism's blood sugar reaches or exceeds an optimum level, the taste of sugar becomes less effective or even aversive.
Shaping
Shaping is a conditioning method much used in animal training and in
teaching nonverbal humans. It depends on operant variability and
reinforcement, as described above. The trainer starts by identifying the
desired final (or "target") behavior. Next, the trainer chooses a
behavior that the animal or person already emits with some probability.
The form of this behavior is then gradually changed across successive
trials by reinforcing behaviors that approximate the target behavior
more and more closely. When the target behavior is finally emitted, it
may be strengthened and maintained by the use of a schedule of
reinforcement.
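Shaping by successive approximations can be expressed as a moving reinforcement criterion. The sketch below assumes the behavior can be scored on a single numeric dimension (jump height in centimetres, say) and that the trainer raises the criterion by a fixed step after each reinforced response until it reaches the target; `start`, `target` and `step` are hypothetical parameters, and a fixed step is only the simplest possible rule.

```python
def shape(responses: list[float], start: float, target: float, step: float) -> tuple[list[bool], float]:
    """Reinforce responses that meet a moving criterion; raise it after each success."""
    criterion = start
    decisions = []
    for r in responses:
        reinforced = r >= criterion  # only approximations at least as good as the criterion pay off
        decisions.append(reinforced)
        if reinforced and criterion < target:
            # Successive approximation: demand a little more next time, never beyond the target.
            criterion = min(criterion + step, target)
    return decisions, criterion

heights = [10, 12, 15, 14, 20, 25, 30, 31]  # hypothetical jump heights, one per trial
print(shape(heights, start=10, target=30, step=5))
```

Once the criterion equals the target, every response that meets it is reinforced, which is where a schedule of reinforcement would take over to maintain the behavior.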
Noncontingent reinforcement
Noncontingent
reinforcement is the delivery of reinforcing stimuli regardless of the
organism's behavior. Noncontingent reinforcement may be used in an
attempt to reduce an undesired target behavior by reinforcing multiple
alternative responses while extinguishing the target response.
As no measured behavior is identified as being strengthened, there is
controversy surrounding the use of the term noncontingent
"reinforcement".
Stimulus control of operant behavior
Though initially operant behavior is emitted without an identified
reference to a particular stimulus, during operant conditioning operants
come under the control of stimuli that are present when behavior is
reinforced. Such stimuli are called "discriminative stimuli." A
so-called "three-term contingency"
is the result. That is, discriminative stimuli set the occasion for
responses that produce reward or punishment. Example: a rat may be
trained to press a lever only when a light comes on; a dog rushes to the
kitchen when it hears the rattle of its food bag; a child reaches
for candy when they see it on a table.
Discrimination, generalization & context
Most behavior is under stimulus control. Several aspects of this may be distinguished:
- Discrimination typically occurs when a response is reinforced only in the presence of a specific stimulus. For example, a pigeon might be fed for pecking at a red light and not at a green light; in consequence, it pecks at red and stops pecking at green. Many complex combinations of stimuli and other conditions have been studied; for example an organism might be reinforced on an interval schedule in the presence of one stimulus and on a ratio schedule in the presence of another.
- Generalization is the tendency to respond to stimuli that are similar to a previously trained discriminative stimulus. For example, having been trained to peck at "red" a pigeon might also peck at "pink", though usually less strongly.
- Context refers to stimuli that are continuously present in a situation, like the walls, tables, chairs, etc. in a room, or the interior of an operant conditioning chamber. Context stimuli may come to control behavior as do discriminative stimuli, though usually more weakly. Behaviors learned in one context may be absent, or altered, in another. This may cause difficulties for behavioral therapy, because behaviors learned in the therapeutic setting may fail to occur in other situations.
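Generalization is often summarized as a gradient: responding is strongest at the trained stimulus value and falls off as test stimuli become less similar. The sketch below assumes a Gaussian-shaped gradient along one stimulus dimension (a hypothetical wavelength in nanometres, with 620 nm standing in for "red"); the shape and the `width` parameter are illustrative assumptions, since empirical gradients vary in shape and steepness.

```python
import math

def generalization(stimulus: float, trained: float, peak: float = 1.0, width: float = 20.0) -> float:
    """Relative response strength to `stimulus` after training at `trained` (assumed Gaussian)."""
    return peak * math.exp(-((stimulus - trained) ** 2) / (2 * width ** 2))

for wavelength in (580, 600, 620, 640, 660):
    print(wavelength, round(generalization(wavelength, trained=620), 3))
```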
Behavioral sequences: conditioned reinforcement and chaining
Most
behavior cannot easily be described in terms of individual responses
reinforced one by one. The scope of operant analysis is expanded through
the idea of behavioral chains, which are sequences of responses bound
together by the three-term contingencies defined above. Chaining is
based on the fact, experimentally demonstrated, that a discriminative
stimulus not only sets the occasion for subsequent behavior, but it can
also reinforce a behavior that precedes it. That is, a discriminative
stimulus is also a "conditioned reinforcer". For example, the light that
sets the occasion for lever pressing may be used to reinforce "turning
around" in the presence of a noise. This results in the sequence "noise –
turn-around – light – press lever – food". Much longer chains can be
built by adding more stimuli and responses.
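The chain in the example can be stored as a list of (discriminative stimulus, response, resulting stimulus) links. The hypothetical helpers below walk the chain and pick out the stimuli that do double duty, being produced by one response and setting the occasion for the next, which is the property that lets them act as conditioned reinforcers.

```python
Link = tuple[str, str, str]  # (discriminative stimulus, response, stimulus the response produces)

CHAIN: list[Link] = [
    ("noise", "turn around", "light"),
    ("light", "press lever", "food"),
]

def run_chain(chain: list[Link], start: str) -> list[str]:
    """Follow the chain from `start`, alternating stimuli and responses."""
    links = {stimulus: (response, outcome) for stimulus, response, outcome in chain}
    sequence, stimulus = [start], start
    for _ in range(len(links)):  # guard: an acyclic chain has at most len(links) steps
        if stimulus not in links:
            break
        response, stimulus = links[stimulus]
        sequence += [response, stimulus]
    return sequence

def conditioned_reinforcers(chain: list[Link]) -> set[str]:
    """Stimuli that follow one response and also set the occasion for the next."""
    occasions = {stimulus for stimulus, _, _ in chain}
    return {outcome for _, _, outcome in chain if outcome in occasions}

print(" – ".join(run_chain(CHAIN, "noise")))
print(conditioned_reinforcers(CHAIN))
```

Appending further links lengthens the chain without changing the helpers.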
Escape and avoidance
In
escape learning, a behavior terminates an (aversive) stimulus. For
example, shielding one's eyes from sunlight terminates the (aversive)
stimulation of bright light in one's eyes. (This is an example of
negative reinforcement, defined above.) Behavior that is maintained by
preventing a stimulus is called "avoidance," as, for example, putting
on sunglasses before going outdoors. Avoidance behavior raises the
so-called "avoidance paradox", for, it may be asked, how can the
non-occurrence of a stimulus serve as a reinforcer? This question is
addressed by several theories of avoidance (see below).
Two kinds of experimental settings are commonly used: discriminated and free-operant avoidance learning.
Discriminated avoidance learning
A
discriminated avoidance experiment involves a series of trials in which
a neutral stimulus such as a light is followed by an aversive stimulus
such as a shock. After the neutral stimulus appears an operant response
such as a lever press prevents or terminates the aversive stimulus. In
early trials, the subject does not make the response until the aversive
stimulus has come on, so these early trials are called "escape" trials.
As learning progresses, the subject begins to respond during the neutral
stimulus and thus prevents the aversive stimulus from occurring. Such
trials are called "avoidance trials." This experiment is said to involve
classical conditioning because a neutral CS (conditioned stimulus) is
paired with the aversive US (unconditioned stimulus); this idea
underlies the two-factor theory of avoidance learning described below.
Free-operant avoidance learning
In
free-operant avoidance a subject periodically receives an aversive
stimulus (often an electric shock) unless an operant response is made;
the response delays the onset of the shock. In this situation, unlike
discriminated avoidance, no prior stimulus signals the shock. Two
crucial time intervals determine the rate of avoidance learning. The
first is the S-S (shock-shock) interval. This is the time between successive
shocks in the absence of a response. The second interval is the R-S
(response-shock) interval. This specifies the time by which an operant
response delays the onset of the next shock. Note that each time the
subject performs the operant response, the R-S interval without shock
begins anew.
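The interplay of the two intervals can be made concrete by computing when shocks would be delivered for a given list of response times. This sketch assumes the S-S clock starts at the beginning of the session, that a response made at the exact moment a shock is due does not prevent it, and that times are in seconds; the function name and these conventions are illustrative.

```python
def shock_times(responses: list[float], ss: float, rs: float, session: float) -> list[float]:
    """Shock delivery times in a free-operant avoidance session.

    ss: S-S (shock-shock) interval; rs: R-S (response-shock) interval; times in seconds.
    """
    if ss <= 0 or rs <= 0:
        raise ValueError("intervals must be positive")
    pending = sorted(responses)
    shocks: list[float] = []
    next_shock = ss  # assumption: the S-S clock starts at the beginning of the session
    i = 0
    while True:
        if i < len(pending) and pending[i] < next_shock:
            # A response before the scheduled shock restarts the R-S interval.
            next_shock = pending[i] + rs
            i += 1
        elif next_shock <= session:
            shocks.append(next_shock)
            next_shock += ss  # with no response, shocks recur every S-S seconds
        else:
            return shocks

print(shock_times([], ss=5, rs=20, session=20))
print(shock_times([3, 18, 33, 48], ss=5, rs=20, session=60))
print(shock_times([3], ss=5, rs=20, session=40))
```

With S-S = 5 s and R-S = 20 s, a subject that responds within the first 5 s and then at intervals shorter than 20 s receives no shocks, while a single response at t = 3 s only postpones the first shock to t = 23 s, after which shocks recur every 5 s.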
Two-process theory of avoidance
This
theory was originally proposed in order to explain discriminated
avoidance learning, in which an organism learns to avoid an aversive
stimulus by escaping from a signal for that stimulus. Two processes are
involved: classical conditioning of the signal followed by operant
conditioning of the escape response:
- Classical conditioning of fear. Initially the organism experiences the pairing of a CS with an aversive US. The theory assumes that this pairing creates an association between the CS and the US through classical conditioning and, because of the aversive nature of the US, the CS comes to elicit a conditioned emotional reaction (CER) – "fear."
- Reinforcement of the operant response by fear-reduction. As a result of the first process, the CS now signals fear; this unpleasant emotional reaction serves to motivate operant responses, and responses that terminate the CS are reinforced by fear termination. Note that the theory does not say that the organism "avoids" the US in the sense of anticipating it, but rather that the organism "escapes" an aversive internal state that is caused by the CS.
Several experimental findings seem to run counter to two-factor theory. For example, avoidance behavior often extinguishes very slowly even when the CS is no longer paired with the US, although the fear response would then be expected to extinguish. Further, animals that have learned to avoid often show little evidence of fear, suggesting that escape from fear is not necessary to maintain avoidance behavior.
Operant or "one-factor" theory
Some
theorists suggest that avoidance behavior may simply be a special case
of operant behavior maintained by its consequences. In this view the
idea of "consequences" is expanded to include sensitivity to a pattern
of events. Thus, in avoidance, the consequence of a response is a
reduction in the rate of aversive stimulation. Indeed, experimental
evidence suggests that a "missed shock" is detected as a stimulus, and
can act as a reinforcer. Cognitive theories of avoidance take this idea a
step farther. For example, a rat comes to "expect" shock if it fails to
press a lever and to "expect no shock" if it presses it, and avoidance
behavior is strengthened if these expectancies are confirmed.
Operant hoarding
Operant
hoarding refers to the observation that rats reinforced in a certain
way may allow food pellets to accumulate in a food tray instead of
retrieving those pellets. In this procedure, retrieval of the pellets
always instituted a one-minute period of extinction
during which no additional food pellets were available but those that
had been accumulated earlier could be consumed. This finding appears to
contradict the usual finding that rats behave impulsively in situations
in which there is a choice between a smaller food object right away and a
larger food object after some delay.
Neurobiological correlates
The first scientific studies identifying neurons whose responses suggested that they encode conditioned stimuli came from work by Mahlon deLong and by R.T. Richardson. They showed that nucleus basalis neurons, which release acetylcholine broadly throughout the cerebral cortex,
are activated shortly after a conditioned stimulus, or after a primary
reward if no conditioned stimulus exists. These neurons are equally
active for positive and negative reinforcers, and have been shown to be
related to neuroplasticity in many cortical regions. Evidence also exists that dopamine neurons
are activated at similar times. There is considerable evidence that
dopamine participates in both reinforcement and aversive learning. Dopamine pathways project much more densely onto frontal cortex regions. Cholinergic projections, in contrast, are dense even in the posterior cortical regions like the primary visual cortex. A study of patients with Parkinson's disease,
a condition attributed to the insufficient action of dopamine, further
illustrates the role of dopamine in positive reinforcement.
It showed that while off their medication, patients learned more
readily with aversive consequences than with positive reinforcement.
Patients who were on their medication showed the opposite to be the
case, positive reinforcement proving to be the more effective form of
learning when dopamine activity is high.
A neurochemical process involving dopamine has been suggested to
underlie reinforcement. When an organism experiences a reinforcing
stimulus, dopamine pathways in the brain are activated. This network of pathways "releases a short pulse of dopamine onto many dendrites, thus broadcasting a global reinforcement signal to postsynaptic neurons."
This allows recently activated synapses to increase their sensitivity
to efferent (conducting outward) signals, thus increasing the
probability of occurrence for the recent responses that preceded the
reinforcement. These responses are, statistically, the most likely to
have been the behavior responsible for successfully achieving
reinforcement. But when the application of reinforcement is either less
immediate or less contingent (less consistent), the ability of dopamine
to act upon the appropriate synapses is reduced.
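One common way to model the "global reinforcement signal" described above is an eligibility trace: each synapse is tagged when it is active, the tag decays over time, and a broadcast dopamine pulse strengthens each synapse in proportion to its remaining tag. The sketch below uses exponential decay with a hypothetical time constant `tau`; it is a simplified illustration of this class of models, not the mechanism established by the studies cited above. It shows why delayed or poorly contingent reinforcement can end up crediting whatever happened to occur just before the pulse.

```python
import math

def weight_changes(activations: dict[str, float], reward_time: float, tau: float = 1.0,
                   learning_rate: float = 0.1, dopamine: float = 1.0) -> dict[str, float]:
    """Weight change per synapse when a global dopamine pulse arrives at `reward_time`.

    `activations` maps a (hypothetical) synapse name to the time it was last active, in seconds.
    """
    changes = {}
    for synapse, t_active in activations.items():
        elapsed = reward_time - t_active
        # Eligibility trace: full strength at activation, decaying exponentially afterwards.
        trace = math.exp(-elapsed / tau) if elapsed >= 0 else 0.0  # later activity gets no credit
        changes[synapse] = learning_rate * dopamine * trace
    return changes

# Same lever press at t = 0, rewarded after 0.5 s versus 3 s.
print(weight_changes({"lever_press": 0.0}, reward_time=0.5))
print(weight_changes({"lever_press": 0.0}, reward_time=3.0))
# With a delay, an unrelated action just before the pulse receives more credit than the press.
print(weight_changes({"lever_press": 0.0, "grooming": 2.5}, reward_time=3.0))
```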
Questions about the law of effect
A
number of observations seem to show that operant behavior can be
established without reinforcement in the sense defined above. Most cited
is the phenomenon of autoshaping
(sometimes called "sign tracking"), in which a stimulus is repeatedly
followed by reinforcement, and in consequence the animal begins to
respond to the stimulus. For example, a response key is lighted and then
food is presented. When this is repeated a few times, a pigeon subject
begins to peck the key even though food comes whether the bird pecks or
not. Similarly, rats begin to handle small objects, such as a lever,
when food is presented nearby.
Strikingly, pigeons and rats persist in this behavior even when pecking
the key or pressing the lever leads to less food (omission training). Another apparent operant behavior that appears without reinforcement is contrafreeloading.
These observations and others appear to contradict the law of effect, and they have prompted some researchers to propose new conceptualizations of operant reinforcement. A more general view is that autoshaping is an instance of classical conditioning;
the autoshaping procedure has, in fact, become one of the most common
ways to measure classical conditioning. In this view, many behaviors
can be influenced by both classical contingencies (stimulus-reinforcement)
and operant contingencies (response-reinforcement), and the
experimenter's task is to work out how these interact.
The case of someone who has a positive experience with a drug makes it easy to see how drug dependence and the law of effect work together. Tolerance for a drug goes up as one continues to use it after a positive experience with a certain amount the first time, so it takes more and more of the drug to get the same feeling. In an experiment, this is the point at which the dose of the controlled substance would have to be modified, and the experiment would really begin. The law of effect laid the groundwork for psychologist B. F. Skinner's research, almost half a century later, on the principles of operant conditioning, "a learning process by which the effect, or consequence, of a response influences the future rate of production of that response."
Applications
Reinforcement
and punishment are ubiquitous in human social interactions, and a great
many applications of operant principles have been suggested and
implemented. The following are some examples.
Addiction and dependence
Positive and negative reinforcement play central roles in the development and maintenance of addiction and drug dependence. An addictive drug is intrinsically rewarding; that is, it functions as a primary positive reinforcer of drug use. The brain's reward system assigns it incentive salience (i.e., it is "wanted" or "desired"),
so as an addiction develops, deprivation of the drug leads to craving.
In addition, stimuli associated with drug use – e.g., the sight of a
syringe, and the location of use – become associated with the intense
reinforcement induced by the drug. These previously neutral stimuli acquire several properties: their appearance can induce craving, and they can become conditioned positive reinforcers of continued use.
Thus, if an addicted individual encounters one of these drug cues, a
craving for the associated drug may reappear. For example, anti-drug
agencies previously used posters with images of drug paraphernalia
as an attempt to show the dangers of drug use. However, such posters
are no longer used because of the effects of incentive salience in
causing relapse upon sight of the stimuli illustrated in the posters.
In drug dependent individuals, negative reinforcement occurs when a drug is self-administered in order to alleviate or "escape" the symptoms of physical dependence (e.g., tremors and sweating) and/or psychological dependence (e.g., anhedonia, restlessness, irritability, and anxiety) that arise during the state of drug withdrawal.
Animal training
Animal trainers and pet owners were applying the principles and
practices of operant conditioning long before these ideas were named and
studied, and animal training still provides one of the clearest and
most convincing examples of operant control. Of the concepts and
procedures described in this article, a few of the most salient are the
following:
(a) availability of primary reinforcement (e.g. a bag of dog yummies);
(b) the use of secondary reinforcement, (e.g. sounding a clicker
immediately after a desired response, then giving a yummy);
(c) contingency, assuring that reinforcement (e.g. the clicker) follows
the desired behavior and not something else;
(d) shaping, as in gradually getting a dog to jump higher and higher;
(e) intermittent reinforcement, as in gradually reducing the frequency
of reinforcement to induce persistent behavior without satiation;
(f) chaining, where a complex behavior is gradually constructed from
smaller units.
Animal training makes use of both positive and negative reinforcement,
and schedules of reinforcement may play a large role in it.
Applied behavior analysis
Applied behavior analysis is the discipline initiated by B. F. Skinner
that applies the principles of conditioning to the modification of
socially significant human behavior. It uses the basic concepts of
conditioning theory, including conditioned stimulus (SC), discriminative stimulus (S^d), response (R), and reinforcing stimulus (S^rein or S^r for reinforcers, sometimes S^ave for aversive stimuli).
A conditioned stimulus controls behaviors developed through respondent
(classical) conditioning, such as emotional reactions. The other three
terms combine to form Skinner's "three-term contingency": a
discriminative stimulus sets the occasion for responses that lead to
reinforcement. Researchers have found the following protocol to be
effective when they use the tools of operant conditioning to modify
human behavior:
- State goal: Clarify exactly what changes are to be brought about. For example, "reduce weight by 30 pounds."
- Monitor behavior: Keep track of behavior so that one can see whether the desired effects are occurring. For example, keep a chart of daily weights.
- Reinforce desired behavior: For example, congratulate the individual on weight losses. With humans, a record of behavior may serve as a reinforcement. For example, when a participant sees a pattern of weight loss, this may reinforce continuance in a behavioral weight-loss program. However, individuals may experience a consequence intended as a reward as aversive, and vice versa. For example, a record of weight loss may be aversive if it reminds the individual how heavy they actually are. A token economy is an exchange system in which tokens are given as rewards for desired behaviors. Tokens may later be exchanged for a desired prize or rewards such as power, prestige, goods or services.
- Reduce incentives to perform undesirable behavior: For example, remove candy and fatty snacks from kitchen shelves.
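The token economy described in the list above is essentially a ledger. A minimal sketch follows, with hypothetical behavior names, earning rates and prices; the `history` list doubles as the kind of behavior record mentioned under "Monitor behavior".

```python
from dataclasses import dataclass, field

@dataclass
class TokenEconomy:
    """Minimal token ledger: tokens are earned for target behaviors and exchanged for rewards."""
    earn_rates: dict[str, int]  # tokens per occurrence of each desired behavior (hypothetical)
    prices: dict[str, int]      # token cost of each backup reward (hypothetical)
    balance: int = 0
    history: list[str] = field(default_factory=list)

    def record(self, behavior: str) -> int:
        earned = self.earn_rates.get(behavior, 0)  # behaviors not on the list earn nothing
        self.balance += earned
        self.history.append(f"+{earned} {behavior}")
        return earned

    def exchange(self, reward: str) -> bool:
        cost = self.prices[reward]  # raises KeyError for a reward that is not on offer
        if cost > self.balance:
            return False
        self.balance -= cost
        self.history.append(f"-{cost} {reward}")
        return True

economy = TokenEconomy(earn_rates={"homework done": 2, "chores": 1}, prices={"movie night": 5})
economy.record("homework done")
economy.record("chores")
print(economy.balance, economy.exchange("movie night"))  # 3 False: not enough tokens yet
```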
Practitioners of applied behavior analysis (ABA) bring these
procedures, and many variations and developments of them, to bear on a
variety of socially significant behaviors and issues. In many cases,
practitioners use operant techniques to develop constructive, socially
acceptable behaviors to replace aberrant behaviors. The techniques of
ABA have been effectively applied to such things as early intensive behavioral interventions for children with an autism spectrum disorder (ASD), research on the principles influencing criminal behavior, HIV prevention, conservation of natural resources, education, gerontology, health and exercise, industrial safety, language acquisition, littering, medical procedures, parenting, psychotherapy, seatbelt use, severe mental disorders, sports, substance abuse, phobias, pediatric feeding disorders, and zoo management and care of animals. Some of these applications are among those described below.
Child behaviour – parent management training
Providing positive reinforcement for appropriate child behaviors is a
major focus of parent management training. Typically, parents learn to
reward appropriate behavior through social rewards (such as praise,
smiles, and hugs) as well as concrete rewards (such as stickers or
points towards a larger reward as part of an incentive system created
collaboratively with the child).
In addition, parents learn to select simple behaviors as an initial
focus and reward each of the small steps that their child achieves
towards reaching a larger goal (this concept is called "successive
approximations").
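Rewarding successive approximations can be pictured as a criterion that rises each time it is met. In this minimal Python sketch the behavior is reduced to a single number (for example, minutes spent on homework); the function name `shape` and its starting criterion, step size, and goal are hypothetical, and a fixed numeric rule is a simplification of what parents actually do.

```python
def shape(performances, start, goal, step):
    """Reinforce successive approximations toward `goal`.

    A performance that meets the current criterion is rewarded, and the
    criterion is then raised by `step` (never beyond `goal`). A performance
    below the criterion is not rewarded and the criterion stays the same.
    Returns (performance, criterion_at_the_time, rewarded) tuples.
    """
    criterion = start
    history = []
    for value in performances:
        rewarded = value >= criterion
        history.append((value, criterion, rewarded))
        if rewarded:
            criterion = min(criterion + step, goal)
    return history


if __name__ == "__main__":
    # Hypothetical minutes of homework on successive days.
    for row in shape([5, 8, 9, 12, 16, 20], start=5, goal=20, step=5):
        print(row)
```

With these inputs the first session is rewarded at a criterion of 5 minutes, the next two fall short of the raised criterion of 10, and the remaining sessions are rewarded as the criterion climbs to the goal of 20.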
Economics
Both psychologists and economists have become interested in applying
operant concepts and findings to the behavior of humans in the
marketplace. An example
is the analysis of consumer demand, as indexed by the amount of a
commodity that is purchased. In economics, the degree to which price
influences consumption is called "the price elasticity of demand."
Certain commodities are more elastic than others; for example, a change
in price of certain foods may have a large effect on the amount bought,
while gasoline and other essentials may be less affected by price
changes. In terms of operant analysis, such effects may be interpreted
in terms of motivations of consumers and the relative value of the
commodities as reinforcers.
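The price elasticity of demand is commonly computed as the percentage change in quantity purchased divided by the percentage change in price. The Python sketch below measures those percentages against the midpoint of each pair (the "arc" convention), which is one standard choice among several; the function names are hypothetical, and the prices and quantities in the usage lines are invented for illustration, not data.

```python
def arc_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Price elasticity of demand between two (price, quantity) points.

    Percentage changes are taken relative to the midpoint of each pair,
    so the result is the same whichever point is treated as the start.
    """
    if p1 == p2:
        raise ValueError("prices must differ to measure elasticity")
    pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_change_q / pct_change_p


def classify(elasticity: float) -> str:
    """Label demand as elastic (|e| > 1), inelastic (|e| < 1), or unit elastic."""
    magnitude = abs(elasticity)
    if magnitude > 1:
        return "elastic"
    if magnitude < 1:
        return "inelastic"
    return "unit elastic"


if __name__ == "__main__":
    food = arc_elasticity(2.0, 100, 3.0, 60)    # invented food figures
    fuel = arc_elasticity(3.0, 100, 4.0, 95)    # invented gasoline figures
    print(classify(food), classify(fuel))
```

Here the invented food figures give an elasticity of about -1.25 (elastic), while the invented gasoline figures give about -0.18 (inelastic), matching the contrast drawn in the text.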
Gambling – variable ratio scheduling
As stated earlier in this article, a variable ratio schedule yields
reinforcement after the emission of an unpredictable number of
responses. This schedule typically generates rapid, persistent
responding. Slot machines pay off on a variable ratio schedule, and they
produce just this sort of persistent lever-pulling behavior in
gamblers. The variable ratio payoff from slot machines and other forms
of gambling has often been cited as a factor underlying gambling
addiction.
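A variable ratio schedule can be simulated by drawing, after each reinforcement, a fresh number of responses required for the next one. In this Python sketch that number is drawn uniformly from 1 to 2*mean_ratio - 1 so that it averages the stated ratio; the uniform distribution and the class name `VariableRatioSchedule` are assumptions for illustration, not a description of how any real slot machine is programmed.

```python
import random
from typing import Optional


class VariableRatioSchedule:
    """Reinforce after an unpredictable number of responses averaging `mean_ratio`.

    The required count is drawn uniformly from 1 .. 2*mean_ratio - 1, so its
    mean is `mean_ratio` (an assumed distribution, chosen for simplicity).
    """

    def __init__(self, mean_ratio: int, seed: Optional[int] = None):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = random.Random(seed)   # seeded for reproducible runs
        self._responses = 0
        self._required = self._draw()

    def _draw(self) -> int:
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Register one response (e.g. one lever pull); return True if reinforced."""
        self._responses += 1
        if self._responses >= self._required:
            # Pay off, then draw a new, unpredictable requirement.
            self._responses = 0
            self._required = self._draw()
            return True
        return False


if __name__ == "__main__":
    schedule = VariableRatioSchedule(mean_ratio=10, seed=1)
    payouts = sum(schedule.respond() for _ in range(10_000))
    print(payouts)  # close to 10_000 / 10, but not exactly
```

The responder cannot predict which pull will pay off, yet over many pulls the payout rate settles near one reinforcement per `mean_ratio` responses.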
Military psychology
Human beings have an innate resistance to killing and are reluctant
to act in a direct, aggressive way towards members of their own species,
even to save life. This resistance to killing has caused infantry to
be remarkably inefficient throughout the history of military warfare.
This phenomenon was not understood until S.L.A. Marshall
(Brigadier General and military historian) undertook interview studies
of WWII infantry immediately following combat engagement. Marshall's
well-known and controversial book, Men Against Fire, revealed that only
15% of soldiers fired their rifles with the purpose of killing in
combat.
Following acceptance of Marshall's research by the US Army in 1946, the
Human Resources Research Office of the US Army began implementing new
training protocols which resemble operant conditioning methods.
Subsequent applications of such methods increased the percentage of
soldiers able to kill to around 50% in Korea and over 90% in Vietnam.
Revolutions in training included replacing traditional pop-up firing
ranges with three-dimensional, man-shaped, pop-up targets which
collapsed when hit. This provided immediate feedback and acted as
positive reinforcement for a soldier's behavior.
Other improvements to military training methods have included the timed
firing course; more realistic training; high repetitions; praise from
superiors; marksmanship rewards; and group recognition. Negative
reinforcement includes peer accountability or the requirement to retake
courses. Modern military training conditions mid-brain response to combat pressure by closely simulating actual combat, using mainly Pavlovian classical conditioning and Skinnerian operant conditioning (both forms of behaviorism).
Modern marksmanship training is such an excellent example of behaviorism that it has been used for years in the introductory psychology course taught to all cadets at the US Military Academy at West Point as a classic example of operant conditioning. In the 1980s, during a visit to West Point, B.F. Skinner identified modern military marksmanship training as a near-perfect application of operant conditioning.
Lt. Col. Dave Grossman writes of operant conditioning in US military training:
It is entirely possible that no one intentionally sat down to use operant conditioning or behavior modification techniques to train soldiers in this area…But from the standpoint of a psychologist who is also a historian and a career soldier, it has become increasingly obvious to me that this is exactly what has been achieved.
Nudge theory
Nudge theory (or nudge) is a concept in behavioural science, political theory and economics which argues that indirect suggestions to try to achieve non-forced compliance can influence the motives, incentives and decision making
of groups and individuals, at least as effectively – if not more
effectively – than direct instruction, legislation, or enforcement.
Praise
The concept of praise as a means of behavioral reinforcement is
rooted in B.F. Skinner's model of operant conditioning. Through this
lens, praise has been viewed as a means of positive reinforcement,
wherein an observed behavior is made more likely to occur by
contingently praising said behavior.
Hundreds of studies have demonstrated the effectiveness of praise in
promoting positive behaviors, notably in studies of teachers' and
parents' use of praise to promote improved behavior and academic
performance in children, but also in studies of work performance.
Praise has also been demonstrated to reinforce positive behaviors in
non-praised adjacent individuals (such as a classmate of the praise
recipient) through vicarious reinforcement.
Praise may be more or less effective in changing behavior depending on
its form, content and delivery. In order for praise to effect positive
behavior change, it must be contingent on the positive behavior (i.e.,
only administered after the targeted behavior is enacted), must specify
the particulars of the behavior that is to be reinforced, and must be
delivered sincerely and credibly.
Acknowledging the effect of praise as a positive reinforcement
strategy, numerous behavioral and cognitive behavioral interventions
have incorporated the use of praise in their protocols. The strategic use of praise is recognized as an evidence-based practice in both classroom management and parenting training interventions,
though praise is often subsumed in intervention research into a larger
category of positive reinforcement, which includes strategies such as
strategic attention and behavioral rewards.
Several studies have examined the effects of cognitive-behavioral
therapy and operant-behavioral therapy on different medical
conditions. When patients developed cognitive and behavioral techniques
that changed their behaviors, attitudes, and emotions, their pain
severity decreased. The results of these studies showed that cognitions
influence both the perception of pain and its impact, which explains the
general efficacy of cognitive-behavioral therapy (CBT) and
operant-behavioral therapy (OBT).
Psychological manipulation
Braiker identified the following ways that manipulators control their victims:
- Positive reinforcement: includes praise, superficial charm, superficial sympathy (crocodile tears), excessive apologizing, money, approval, gifts, attention, facial expressions such as a forced laugh or smile, and public recognition.
- Negative reinforcement: may involve removing the victim from a negative situation.
- Intermittent or partial reinforcement: Partial or intermittent negative reinforcement can create an effective climate of fear and doubt. Partial or intermittent positive reinforcement can encourage the victim to persist – for example in most forms of gambling, the gambler is likely to win now and again but still lose money overall.
- Punishment: includes nagging, yelling, the silent treatment, intimidation, threats, swearing, emotional blackmail, the guilt trip, sulking, crying, and playing the victim.
- Traumatic one-trial learning: using verbal abuse, explosive anger, or other intimidating behavior to establish dominance or superiority; even one incident of such behavior can condition or train victims to avoid upsetting, confronting or contradicting the manipulator.
Traumatic bonding
Traumatic bonding occurs as the result of ongoing cycles of abuse in which the intermittent reinforcement of reward and punishment creates powerful emotional bonds that are resistant to change.
One source states that
'The necessary conditions for traumatic bonding are that one person must
dominate the other and that the level of abuse chronically spikes and
then subsides. The relationship is characterized by periods of
permissive, compassionate, and even affectionate behavior from the
dominant person, punctuated by intermittent episodes of intense abuse.
To maintain the upper hand, the victimizer manipulates the behavior of
the victim and limits the victim's options so as to perpetuate the power
imbalance. Any threat to the balance of dominance and submission may be
met with an escalating cycle of punishment ranging from seething
intimidation to intensely violent outbursts. The victimizer also
isolates the victim from other sources of support, which reduces the
likelihood of detection and intervention, impairs the victim's ability
to receive countervailing self-referent feedback, and strengthens the
sense of unilateral dependency...The traumatic effects of these abusive
relationships may include the impairment of the victim's capacity for
accurate self-appraisal, leading to a sense of personal inadequacy and a
subordinate sense of dependence upon the dominating person. Victims
also may encounter a variety of unpleasant social and legal consequences
of their emotional and behavioral affiliation with someone who
perpetrated aggressive acts, even if they themselves were the recipients
of the aggression.'
Video games
The majority of video games are designed around a compulsion loop,
which delivers positive reinforcement on a variable ratio
schedule to keep the player playing. This can lead to the pathology of video game addiction.
As part of a trend in the monetization of video games during the 2010s, some games offered loot boxes
as rewards or as items purchasable with real-world funds. Boxes contain a
random selection of in-game items. The practice has been tied to the
same methods by which slot machines and other gambling devices dole out
rewards, as it follows a variable ratio schedule. Although loot boxes are
generally perceived as a form of gambling, the practice is classified as
such in only a few countries. However, the use of those items as virtual
currency for online gambling, or their trade for real-world money, has created a skin gambling market that is under legal evaluation.
Workplace culture of fear
Ashforth discussed potentially destructive sides of leadership and identified what he referred to as petty tyrants: leaders who exercise a tyrannical style of management, resulting in a climate of fear in the workplace. Partial or intermittent negative reinforcement can create an effective climate of fear and doubt. When employees get the sense that bullies are tolerated, a climate of fear may be the result.
Individual differences in sensitivity to reward, punishment, and motivation have been studied under the premises of reinforcement sensitivity theory and have also been applied to workplace performance.
One of the many reasons proposed for the dramatic costs
associated with healthcare is the practice of defensive medicine.
Prabhu reviews the article by Cole and discusses how the responses of
two groups of neurosurgeons are classic operant behavior. One group
practiced in a state with restrictions on medical lawsuits and the other
in a state with no restrictions. The neurosurgeons were queried
anonymously about their practice patterns. Physicians in the group that
practiced in the state with no restrictions on medical lawsuits changed
their practice in response to negative feedback (fear of lawsuits).