PSYC 4032 PSYCHOLOGY OF LEARNING EXAM 4 STUDY GUIDE (Chapters 5, 6, 9, 10, 12)


Thorndike and the Law of Effect

Edward Lee Thorndike (1874-1949)

While Pavlov was developing a general model of learning involving "reflexes" and
classical conditioning (an approach that was becoming popular in Europe),
Thorndike was also carrying out experiments on animal learning. Thorndike was
interested in how animals learn to solve problems. His approach was fundamentally different from Pavlov's. While Pavlov was interested in how animals react to various stimuli, Thorndike was interested in how the animal responds to a situation in the environment in an effort to achieve some result.
If Thorndike had been in Pavlov's lab he would have wondered how dogs learn to
produce specific behaviour in order to get food. (For example, some dog owners
insist that their dog sit before being given food. Thorndike would have been
interested in how the animal learns this behaviour.)
Note that people had been interested in instrumental learning for a number of years before Pavlov and Thorndike started their experiments on learning. In particular, they were interested in showing that animals were capable of intelligent behaviour as a way of defending Darwin's theory of evolution. This was considered important because people who attacked the Theory of Natural Selection argued that humans were fundamentally different from other animals in terms of their ability to reason. What set Thorndike apart from his predecessors was that he was the first to investigate instrumental learning systematically using sound experimental methods.
Thorndike's Puzzle Box Procedure
Thorndike placed a hungry cat inside a "puzzle box" with food outside. Initially, the cat would become agitated and produce many different "random" behaviours in an attempt to get out of the cage. Eventually, the cat would press the paddle by chance, the door would open and the cat could escape and get the food. The cat would then be placed inside the box again and would again take a long time (on average) to escape after exhibiting many different behaviours.
Puzzle Box

Thorndike examined the time to escape (his operational definition of learning was thus latency) as a function of trials. The learning curve was gradual and uneven (see below).

There was little evidence of sudden insight (intelligence in the "Sherlock Holmes/CSI Miami" sense). Nevertheless, after about thirty trials, the cats
would press the paddle almost as soon as they were placed in the cage. Thorndike concluded that the animals learned by "trial and success". Based on observations such as these, Thorndike proposed a general theory of
learning which he called the Law of Effect. The Law of Effect states that:
       "The consequences of a response determine whether the tendency to
       perform it is strengthened or weakened. If the response is followed by
       a satisfying event (e.g., access to food), it will be strengthened; if the
       response is not followed by a satisfying event, it will be weakened."
The Law of Effect starts with the assumption that when an animal encounters a
new environment, it will initially produce largely random behaviours (e.g.,
scratching, digging, etc.). Over repeated trials, the animal will gradually associate some of these behaviours with good things (e.g., access to food) and these behaviours will be more likely to occur again. In Thorndike's terms, these
behaviours are "stamped in". Other behaviours that have no useful consequences are "stamped out" (see below).

Because the more useful behaviours are more and more likely to be performed,
the animal is more and more likely to complete the task quickly. Thus, in the cat in the box example, the time to escape will tend to decrease. Note that according to Thorndike's view of learning, there is no need to postulate any further intelligent processes in the animal. There is no need to assume that the
animal notices the causal connection between the act and its consequence and no need to believe that the animal was trying to attain some goal. The animal simply learns to associate certain behaviours with satisfaction such that these behaviours become more likely to occur. Thorndike called this type of learning instrumental learning. The animal learns to produce a response that is instrumental in getting satisfaction.

Skinner and Operant Learning

                Burrhus Frederic Skinner (1904-1990)

Picking up on Thorndike's research on instrumental learning was another Harvard graduate student in Psychology -- approximately 40 years after Thorndike left Harvard for Columbia University -- B. F. Skinner. Skinner replaced the term instrumental learning with the term operant learning and refined Thorndike's terminology and methodology to fit the new paradigm in psychology -- Behaviorism -- as well as Ernst Mach's paradigmatic contribution to physics, Operationalism. He began by elucidating the differences between Pavlovian/Watsonian classical conditioning and his operant learning, showing that they were fundamentally different processes.

In classical conditioning: behaviour (the conditioned response) is elicited by an antecedent stimulus; the animal's behaviour has no effect on whether the US is delivered.

In operant learning: behaviour is emitted by the animal and operates on the environment; its consequences determine whether it becomes more or less likely in the future.
Skinner replaced Thorndike's term instrumental responses with the term operant responses, or simply operants, because they operate on the world to produce a consequence (feedback from the world that has just been operated on). He also referred to instrumental learning as operant learning. Thus, operant learning is: the process through which the consequence of an operant (behavior) affects the likelihood that the behavior will be produced again in the future. Unlike reflexes, operant behaviors can be accomplished in a number of ways (compare an eyeblink to pressing a paddle) and are what we normally think of as voluntary actions. In operant learning, the emphasis is on the consequences of a motor act rather than on the act in and of itself.

Skinner, like Thorndike, believed in the Law of Effect. He believed that the tendency to emit an operant behavior is strengthened or weakened by the consequences of the response. However, he avoided mentalistic terms and interpretations. Thus, for example, he used the term reinforcer, instead of reward, to refer to the stimulus change that occurs after a behavior and tends to make that behavior more likely to occur in the future. (The term "satisfaction" was distasteful to Skinner due to its anthropomorphizing and loose use of language. After all, how can we know whether a cat is satisfied by the food it gets when it escapes from the cage? All we really know is that the behavior leading to the food will be more likely to occur again in a similar situation.) Thus Thorndike's Law of Effect becomes a logical contingency:
If: Bx(A) ---> Sx(1) ---> Bx(A) increases in future probability,
then: Sx(1) = SR (a reinforcer)

If: Bx(A) ---> Sx(1) ---> Bx(A) decreases in future probability,
then: Sx(1) = SP (a punisher)

and its first corollary refers to the manner in which the punishing stimulus conditions (SP) or the reinforcing stimulus conditions (SR) are achieved:


                            Bx(A) increases           Bx(A) decreases
Sx added following Bx       Positive reinforcement    Positive punishment
Sx removed following Bx     Negative reinforcement    Negative punishment
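To make the 2x2 contingency concrete, here is a minimal sketch in Python (the function name and labels are my own, purely illustrative) that maps the two dimensions -- whether a stimulus is added or removed following the behaviour, and whether the behaviour then becomes more or less likely -- onto the four labels in the table above.

```python
def classify_consequence(stimulus_change: str, behavior_change: str) -> str:
    """Map Skinner's two dimensions onto the four consequences of behaviour.

    stimulus_change: "added" or "removed" (Sx following Bx)
    behavior_change: "increases" or "decreases" (future probability of Bx)
    """
    table = {
        ("added",   "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added",   "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    return table[(stimulus_change, behavior_change)]

# e.g. a bar press turns off a shock and bar pressing becomes more likely:
print(classify_consequence("removed", "increases"))  # negative reinforcement
```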

Skinner Box
Skinner developed a new method for studying operant learning using what is
commonly called a "Skinner box". Skinner boxes are also called operant chambers.
Operant Chamber Illustration

A Skinner box is a cage with a lever or some other mechanism that the animal can operate to produce some effect, such as the delivery of a small amount of juice. The advantage of the Skinner box over Thorndike's puzzle box is that the animal does not have to be replaced into the cage on each trial. With the Skinner box, the animal is left in the box for the experimental session and is free to respond whenever it wishes. The standard measurement used by Skinner to assess operant learning was the rate of responding. (This was Skinner's operational definition of learning.)

Skinner and his followers argued that virtually everything we do can be understood as operant or instrumental responses that occur because of their past reinforcement, and that this is independent of whether or not we are aware of the consequences of our behaviour. For example, if the students in a class all smiled when the Professor walked to the right side of the room but put on blank expressions when he or she walked to the left, there is a good chance that the Professor would end up spending most of the lecture on the right -- even though he or she is not aware of what is happening. A simpler effect you can have on your Professor is to be alert and enthusiastic. This will tend to make him or her more enthusiastic and you will get a better lecture.
Four Consequences of Behaviour
As mentioned above, Skinner believed that operant behaviour (i.e., operant
responses) is determined by its consequences. He identified four possible
consequences of behaviour:
1) Positive Reinforcement
      Any stimulus whose presentation increases the probability of a behaviour
      (e.g., access to fish is a positive reinforcer for a cat).
      Familiar examples of positive reinforcement: studying and gambling.
2) Negative Reinforcement
      Any stimulus whose removal increases the probability of a behaviour.
      For example, bar pressing that turns off a shock.
3) Positive Punishment
      Any stimulus whose presentation (as opposed to removal, as in negative
      reinforcement) decreases the probability of a behaviour. For example, a
      bar press that leads to a shock.
4) Negative Punishment
    Any stimulus whose removal decreases the probability of a behaviour. Skinner thought that punishment was the least effective of the 4 possible consequences for learning.
Processes Associated with Operant Conditioning
As with classical conditioning, there are a number of processes in operant
conditioning.
Shaping
Imagine a rat in a Skinner box where a pellet of food will be delivered whenever the animal presses a lever. What happens if the rat in a Skinner box never presses the lever? To deal with this problem, one can use a procedure known as "shaping". One might start by providing the reinforcement when the rat gets close to the lever. This increases the chance that the rat will touch the lever by accident. Then you provide reinforcement when the animal touches the lever but not when the animal is near the lever. Now you hope the animal will eventually press the lever and, when it does, you only reinforce pressing.
Thus, shaping involves reinforcing behaviours that are increasingly similar to
desired behavior. (Shaping is sometimes called the method of successive approximations.)
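As an illustration of how the criterion tightens across successive approximations, here is a small hypothetical sketch (the stages and the criterion rule are invented for illustration, not a description of any specific laboratory procedure):

```python
# Hypothetical shaping schedule for lever pressing: each stage reinforces a
# behaviour closer to the target and stops reinforcing the previous stage.
shaping_stages = [
    "approaches the lever",
    "touches the lever",
    "presses the lever",   # the target operant
]

def reinforce(observed_behavior: str, current_stage: int) -> bool:
    """Deliver the reinforcer only if the behaviour meets the current criterion."""
    return observed_behavior == shaping_stages[current_stage]

# Once the animal reliably meets the current criterion, move to the next,
# stricter one (current_stage + 1), until only lever pressing is reinforced.
```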


Extinction and Spontaneous Recovery
Extinction in operant conditioning is similar to extinction in classical conditioning. If the response is no longer followed by the reinforcer, the response decreases. For example, people stop smiling if you do not smile back.
The response also exhibits spontaneous recovery some time after the extinction session.
Extinction of operant learning has two counterintuitive features:
   1. The larger the Reinforcer, the more rapid the extinction.
   2. The greater the number of training trials, the more rapid the extinction.
This may reflect the fact that the onset of extinction is more "obvious".
Stimulus Control
The instrumental or operant response in operant conditioning is not elicited by an external stimulus but is, in Skinner's terms, emitted from within. But this does not
mean that external stimuli have no effect. In fact, they exert considerable control over behaviour because they serve as discriminative stimuli.
Suppose a pigeon is trained to hop on a treadle to get some grain. When a green
light comes on, hopping on the treadle will pay off, but when a red light comes on it will not. In this case, the green light becomes a positive discriminative stimulus (S+) and the red light becomes a negative discriminative stimulus (S-).
Note that the S+ does not signal food in the way that the CS+ might in Pavlov's
laboratory. (Recall the example with the black and gray squares where, after
training, the animal salivates in response to a black square, the CS+, but not a gray square.) Instead, the S+ signals a particular relationship between the instrumental response and the reinforcer, telling the pigeon "if you jump now, you will get food." A variety of techniques have been used to study the role of discriminative stimuli in operant learning, and many of the results mirror those of generalization and discrimination in classical conditioning.
For example, if a pigeon is trained to respond only when a yellow light appears,
then after training it will also respond to lights of other colours. However, there is a response gradient: the response decreases with the size of the difference
(measured in terms of wave frequency) between the test light and original yellow
light (i.e., the original discriminative stimulus).

The following cartoon illustrates an experiment in which a rat learns to discriminate between a triangle and a square in order to get food.
                    Discriminative Stimuli (after Lashley, 1930)


Reinforcement Schedules in Operant Conditioning
A major area of research in Operant Learning is on the effects of different
reinforcement schedules. The first distinction is between partial and continuous
reinforcement.
In initial training, continuous reinforcement is the most efficient but after a response is learned, the animal will continue to perform with partial reinforcement. Extinction is slower following partial reinforcement than following continuous reinforcement. Skinner and others have described four basic schedules of partial reinforcement which have different effects on the rate and pattern of responding.
We have fixed and variable interval and ratio schedules.
         Ratio schedules: reinforcer given after some number of responses.
         Interval schedules: reinforcer given after some time period.
         Fixed: the number of responses or time period is held constant.
         Variable: the number of responses or the time period is varied around a mean.
Typical Behaviour with the 4 Schedules:
Response rate is generally higher with the ratio schedules.
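The four partial schedules above differ only in the rule that decides when a response is reinforced. Here is a minimal sketch of those rules (the parameter values are invented for illustration):

```python
import random

def fixed_ratio(responses_since_reinforcer: int, ratio: int = 10) -> bool:
    """FR: reinforce after a fixed number of responses."""
    return responses_since_reinforcer >= ratio

def variable_ratio(responses_since_reinforcer: int, required: int) -> bool:
    """VR: `required` is redrawn around a mean after each reinforcer."""
    return responses_since_reinforcer >= required

def fixed_interval(seconds_since_reinforcer: float, interval: float = 60.0) -> bool:
    """FI: reinforce the first response after a fixed time has elapsed."""
    return seconds_since_reinforcer >= interval

def variable_interval(seconds_since_reinforcer: float, required: float) -> bool:
    """VI: `required` is redrawn around a mean interval after each reinforcer."""
    return seconds_since_reinforcer >= required

# After each reinforcer, the variable schedules redraw their requirement, e.g.:
next_vr_requirement = random.randint(1, 19)            # varies around ~10 responses
next_vi_requirement = random.expovariate(1.0 / 60.0)   # varies around ~60 seconds
```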

Yes, I know that Operant Learning Theory is laden with terms, but we need this exact vocabulary to avoid misunderstandings that might occur if we relied on layman's terms. Example: although we commonly understand that a "reward" provides a subject with some psychological pleasure or satisfaction, the term has at least 3 significant drawbacks for the scientist:
1. It is difficult to quantify notions such as "pleasure"
2. What is pleasurable to one person isn't necessarily pleasurable to another;
3. Perhaps most important, the concept of reward tells us nothing about how behavior will be affected by the stimulus.
Reinforcer, on the other hand, is more carefully defined.  Indeed, whether an event is a reinforcer depends only on the effect it has on behavior.
Primary vs. Conditioned Reinforcement:
The final part of the puzzle comes in the form of conditioned reinforcement (Sr) -- NOTICE THE LOWER CASE "r"!!! This is a concept that seems to cause a fair amount of confusion, so follow along... Imagine a rat in an operant chamber that has learned that if it presses a lever when a green light is on it will receive food (food being an example of a primary reinforcer -- a stimulus needed by the organism for survival: food, water, oxygen, etc.), but not when a red light is on. In the example of our friend the rat, the S+ component seems pretty straightforward. But the green light, because it has been associated with food in the past, can be used as a reinforcer to maintain behavior as well. Like a CS, the reinforcing properties of the green light exist because of the unique past history of this individual rat (rats which have not been trained in the presence of green lights signaling the availability of food show little interest in green lights). Lever pressing can be maintained, at least to some extent, by making the appearance of the green light contingent upon some behavior of the rat, such as pressing another lever.

Now, where Sr gets complex (sorry!) is that there are two basic types of conditioned reinforcement: one type (token reinforcement) in which the Sr is a necessary step along the way to receiving primary reinforcement (SR), and one in which the Sr is not a necessary condition for obtaining reinforcement but has reinforcing properties because it has been associated with powerful primary/biologically significant reinforcing events in the past. Most discussions of Sr are about token reinforcement. The term refers to the fact that with this type of reinforcement, the Sr must be accumulated (like money, or tokens) before the primary reinforcement (SR) can be obtained. For example, getting points on a test, a grade in a class, etc., are examples of token reinforcement in that a student needs to pass a test (get points, or tokens), get a passing grade in a class, etc., to eventually graduate and get a diploma (another Sr, but more on this later). In this case, it makes perfectly good sense that the Sr will maintain behavior, since the bigger contingency out there can't be had without picking up this one along the way. That's not to say that the Sr is unimportant; it does maintain the behavior by providing a small, but immediate, reinforcement for the behavior.

The other kind of Sr is usually much more subtle. We don't need to get it, like we do with the "necessary step" type. But it is still very important in our lives. Getting a smile when we've said an encouraging word to a friend, or receiving a smile when we really need it, may not be necessary (in that it is not tied into a larger reinforcement system), but it still "feels good". And why does a smile have reinforcing properties? Because, like the green light for the rat, it has been present when other "good" things have happened to us.

Operant Learning (B.F. Skinner in his own words)
It has long been known that behavior is affected by its consequences.  We reward and punish people, for example, so that they will behave in different ways.  A more specific effect of a consequence was first studied experimentally by Edward L. Thorndike in a well-known experiment.  A cat enclosed in a box struggled to escape and eventually moved the latch which opened the door.  When repeatedly enclosed in a box, the cat gradually ceased to do those things which had proved ineffective ("errors") and eventually made the successful response very quickly.

 In operant conditioning, behavior is also affected by its consequences, but the process is not trial-and-error learning.  It can best be explained with an example.  A hungry rat is placed in a semi-soundproof box.  For several days bits of food are occasionally delivered into a tray by an automatic dispenser.  The rat soon goes to the tray immediately upon hearing the sound of the dispenser.  A small horizontal section of a lever protruding from the wall has been resting in its lowest position, but it is now raised slightly so that when the rat touches it, it moves downward.  In doing so it closes an electric circuit and operates the food dispenser.  Immediately after eating the delivered food the rat begins to press the lever fairly rapidly.  The behavior has been strengthened or reinforced by a single consequence.  The rat was not "trying" to do anything when it first touched the lever and it did not learn from "errors."

 To a hungry rat, food is a natural reinforcer, but the reinforcer in this example is the sound of the food dispenser, which was conditioned as a reinforcer when it was repeatedly followed by the delivery of food before the lever was pressed.  In fact, the sound of that one operation of the dispenser would have had an observable effect even though no food was delivered on that occasion, but when food no longer follows pressing the lever, the rat eventually stops pressing.  The behavior is said to have been extinguished.

 An operant can come under the control of a stimulus.  If pressing the lever is reinforced when a light is on but not when it is off, responses continue to be made in the light but seldom, if at all, in the dark.  The rat has formed a discrimination between light and dark.  When one turns on the light, a response occurs, but that is not a reflex response.

 The lever can be pressed with different amounts of force, and if only strong responses are reinforced, the rat presses more and more forcefully.  If only weak responses are reinforced, it eventually responds only very weakly.  The process is called differentiation.

 A response must first occur for other reasons before it is reinforced and becomes an operant.  It may seem as if a very complex response would never occur to be reinforced, but complex responses can be shaped by reinforcing their component parts separately and putting them together in the final form of the operant.

 Operant reinforcement not only shapes the topography of behavior, it maintains it in strength long after an operant has been formed.  Schedules of reinforcement are important in maintaining behavior.  If a response has been reinforced for some time only once every five minutes, for example, the rat soon stops responding immediately after reinforcement but responds more and more rapidly as the time for the next reinforcement approaches.  (That is called a fixed-interval schedule of reinforcement.)  If a response has been reinforced on the average every five minutes but unpredictably, the rat responds at a steady rate.  (That is a variable-interval schedule of reinforcement.)  If the average interval is short, the rate is high; if it is long, the rate is low.

 If a response is reinforced when a given number of responses has been emitted, the rat responds more and more rapidly as the required number is approached.  (That is a fixed-ratio schedule of reinforcement.)  The number can be increased by easy stages up to a very high value; the rat will continue to respond even though a response is only very rarely reinforced.  "Piece-rate pay" in industry is an example of a fixed-ratio schedule, and employers are sometimes tempted to "stretch" it by increasing the amount of work required for each unit of payment.  When reinforcement occurs after an average number of responses but unpredictably, the schedule is called variable-ratio.  It is familiar in gambling devices and systems which arrange occasional but unpredictable payoffs.  The required number of responses can easily be stretched, and in a gambling enterprise such as a casino the average ratio must be such that the gambler loses in the long run if the casino is to make a profit.

 Reinforcers may be positive or negative.  A positive reinforcer reinforces when it is presented; a negative reinforcer reinforces when it is withdrawn.  Negative reinforcement is not punishment.  Reinforcers always strengthen behavior; that is what "reinforced" means.  Punishment is used to suppress behavior.  It consists of removing a positive reinforcer or presenting a negative one.  It often seems to operate by conditioning negative reinforcers.  The punished person henceforth acts in ways which reduce the threat of punishment and which are incompatible with, and hence take the place of, the behavior punished.

 The human species is distinguished by the fact that its vocal responses can be easily conditioned as operants.  There are many kinds of verbal operants because the behavior must be reinforced only through the mediation of other people, and they do many different things.  The reinforcing practices of a given culture compose what is called a language.  The practices are responsible for most of the extraordinary achievements of the human species.  Other species acquire behavior from each other through imitation and modelling (they show each other what to do), but they cannot tell each other what to do.  We acquire most of our behavior with that kind of help.  We take advice, heed warnings, observe rules, and obey laws, and our behavior then comes under the control of consequences which would otherwise not be effective.  Most of our behavior is too complex to have occurred for the first time without such verbal help.  By taking advice and following rules we acquire a much more extensive repertoire than would be possible through a solitary contact with the environment.

 Responding because behavior has had reinforcing consequences is very different from responding by taking advice, following rules, or obeying laws.  We do not take advice because of the particular consequence that will follow; we take it only when taking other advice from similar sources has already had reinforcing consequences.  In general, we are much more strongly inclined to do things if they have had immediate reinforcing consequences than if we have been merely advised to do them.

 The innate behavior studied by ethologists is shaped and maintained by its contribution to the survival of the individual and species.  Operant behavior is shaped and maintained by its consequences for the individual.  Both processes have controversial features.  Neither one seems to have any place for a prior plan or purposes.  In both, selection replaces creation.

 Personal freedom also seems threatened.  It is only the feeling of freedom, however, which is affected.  Those who respond because their behavior has had positively reinforcing consequences usually feel free.  They seem to be doing what they want to do.  Those who respond because the reinforcement has been negative and who are therefore avoiding or escaping from punishment are doing what they have to do and do not feel free.  These distinctions do not involve the fact of freedom.

 The experimental analysis of operant behavior has led to a technology often called behavior modification.  It usually consists of changing the consequences of behavior, removing consequences which have caused trouble, or arranging new consequences for behavior which has lacked strength.  Historically, people have been controlled primarily through negative reinforcement; that is, they have been punished when they have not done what is reinforcing to those who could punish them.  Positive reinforcement has been less often used, partly because its effect is slightly deferred, but it can be as effective as negative reinforcement and has many fewer unwanted byproducts.  For example, students who are punished when they do not study may study, but they may also stay away from school (truancy), vandalize school property, attack teachers, or stubbornly do nothing.  Redesigning school systems so that what students do is more often positively reinforced can make a great difference.

III.    The First Seeds of Cognitive Psychology (E.C. Tolman)

Overview:

Edward Chace Tolman (Uh oh... we might be computers after all)
A number of studies in the Berkeley laboratory of Edward Tolman appeared both to show flaws in the law of effect and in radical Behaviorism as promoted by Skinner and his followers... and to require (gasp!!) mental representation in their explanation. For example, rats were allowed to explore a maze in which there were three routes of different lengths between the starting position and the goal. The rats' behavior when the maze was blocked implied that they must have some sort of mental map of the maze. The rats prefer the routes according to their shortness, so, when the maze is blocked at point A, preventing them from using the shortest route, they will choose the second-shortest route. When, however, the maze is blocked at point B, the rat does not retrace its steps and use route 2, which would be predicted according to the law of effect, but rather uses route 3. The rat must be recognising that block B will prevent it from using route 2 by using some memory of the layout of the maze. Tolman's group also showed that animals could use knowledge gained while learning a maze by running to navigate the same maze while swimming, and that unexpected changes in the quality of reward could weaken learning even though the animal was still rewarded. This result was developed further by Crespi who, in 1942, showed that unexpected decreases in reward quantity caused rats temporarily to run a maze more slowly than normal while unexpected increases caused a temporary elevation in running speed. (The animals are making statistical calculations, and using mathematical spatial navigation algorithms, and at the very least vector algebra/analytical geometry and trigonometry, to a degree that would no doubt impress both Rene Descartes and Pythagoras.)
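Here is a hedged sketch of the route-choice logic these results imply (the maze layout and lengths are invented for illustration): an animal with a map-like representation can rule out every route that shares the blocked segment, whereas a pure law-of-effect account would predict falling back on the next most-practised route.

```python
# Hypothetical version of Tolman's three-route maze: routes 1 and 2 share a
# final common path; route 3 reaches the goal by a separate, longer path.
routes = {
    "route_1": ["start", "point_A", "common_final_path", "goal"],   # shortest
    "route_2": ["start", "detour_2", "common_final_path", "goal"],  # second shortest
    "route_3": ["start", "detour_3", "goal"],                       # longest
}
route_lengths = {"route_1": 3, "route_2": 5, "route_3": 8}

def choose_route(blocked_segment: str) -> str:
    """Use the 'cognitive map' to pick the shortest route that avoids the block."""
    open_routes = [name for name, segments in routes.items()
                   if blocked_segment not in segments]
    return min(open_routes, key=route_lengths.get)

print(choose_route("point_A"))            # route_2: only route 1 is blocked
print(choose_route("common_final_path"))  # route_3: the block rules out routes 1 and 2
```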
 
At the same time as this work was appearing in the USA, the Polish psychologists Konorski and Miller began the first cognitive analyses of classical conditioning - the forerunners of the work of Rescorla, Wagner, Dickinson and Mackintosh. In case you had forgotten, here is a very basic review of the Rescorla/Wagner reinterpretation of Pavlovian conditioning as Cognitive Neuroscience in the Information Processing tradition.

According to Rescorla and Kamin, associations are only learned when a surprising event accompanies a CS. In a normal simple conditioning experiment the US is surprising the first few times it is experienced, so it is associated with salient stimuli which immediately precede it. In a blocking experiment, once the association between the US and the CS (CS1) presented in the first phase of the procedure has been made, the US is no longer surprising (since it is predicted by CS1). In the second phase, where both CS1 and CS2 are experienced, the US is no longer surprising, so it does not induce any further learning and no association is made between the US and CS2. This explanation was presented by Rescorla and Wagner (1972) as a formal model of conditioning which expresses the capacity a CS has to become associated with a US at any given time. The associative strength of the CS-US link is referred to by the letter V, and the change in this strength which occurs on each trial of conditioning is called dV. The more a CS is associated with a US, the less additional association the US can induce. This informal explanation of the role of US surprise and of CS (and US) salience in the process of conditioning can be stated as follows:
dV = ab(L - V)
where a is the salience (intensity) of the US, b is the salience (intensity) of the CS and L is the amount of processing given to a completely unpredicted US. In words: when the US is first encountered, the CS has no association to it, so V is zero. On the first trial the CS gains a strength of abL in its association with the US, which is proportional to the saliences of the CS and the US and to the initial amount of processing given to the US. As we start trial two, the associative strength V is abL, so the change in strength that occurs with the second pairing of the CS and US is ab(L - abL). It is smaller than the amount learned on the first trial, and this reduction in the amount learned reflects the fact that the CS now has some association with the US, so the US is less surprising (cute... very cute -- oops, I'm not supposed to impose my opinions). As more trials ensue, the equation predicts a gradually decreasing rate of learning which reaches an asymptote at L.
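Here is a minimal sketch of the update rule exactly as stated above, with a for the US salience, b for the CS salience and L for the processing given to a completely unpredicted US (the numerical values are invented for illustration). It reproduces the negatively accelerated curve the equation predicts:

```python
def rescorla_wagner_acquisition(a=0.5, b=0.5, L=1.0, n_trials=10):
    """Accumulate associative strength V over repeated CS-US pairings."""
    V = 0.0
    history = []
    for _ in range(n_trials):
        dV = a * b * (L - V)   # increment shrinks as the US becomes less surprising
        V += dV
        history.append(round(V, 3))
    return history

# Successive values approach the asymptote L = 1.0 ever more slowly:
print(rescorla_wagner_acquisition())  # [0.25, 0.438, 0.578, 0.684, ...]
```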
However, as the diagram below shows, this is not what is seen when the development of CS-US associations is measured over time. Instead, the learning curve is sigmoidal. Rescorla has argued that the equation is consistent with observed behavior if one assumes that very small changes in associative strength are undetectable and that there is a limit to the amount of effect that very large changes can have on behavior.

CS-US acquisition
There are other respects, however, where the model performs better in predicting experimental outcomes. It can also be applied to a number of CSs each of which contributes to an overall associative strength V of the US in the right hand side of the equation. It is reasonably clear that the presence of the CS salience term b in the equation lets it account for overshadowing. The meaning of the equation is clearest if the specific dVs on the left hand side are seen as referring to the increments in association between specific CSs while V on the right hand side is referring to the predictability of the US and so is the sum of all the different CS-US associations. If the conditioning strength accrued to CS1 is denoted by dV1 and that to CS2 by dV2 then our equations are:
dV1 = ab1(L - V)
dV2 = ab2(L - V)
and both dV1 and dV2 accrue to V on each trial. The amount of association directed to each CS is proportional to their salience.
The equation also models blocking well. During the initial phase of a blocking experiment the associative strength of the US is increased so later, when a second CS is presented the amount of associative strength it can gain has been reduced.
The critical question is, however, does the model predict experimental outcomes it was not explicitly devised for, i.e. can it be generalized? In one example the model predicts the effects of pairing two previously learned CSs on learning about a third new stimulus. If, on separate occasions (not as compound stimuli), two CSs of equal salience have both been completely associated with a US, then V = L for both stimuli and dV on subsequent trials is zero for both. Now a third CS is presented in conjunction with the original pair, so three CSs are presented together whereas only two of them were presented (singly) in the past. The overall associative strength of the US is now 2L, a contribution of L from each of the original CSs. The equation predicts that there will be a negative change in associative strength on this trial, proportional to the salience of the CSs:
dV = ab(L - 2L)
dV = -abL
Conducting the experiment shows that the third stimulus becomes a conditioned inhibitor of the US - it provokes a CR of the opposite quality to that produced by the other two CSs.
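Extending the same sketch to compound stimuli (parameter values again invented for illustration) works through both the blocking account above and the conditioned-inhibition prediction: every CS present on a trial is updated against the summed prediction V, so a pre-trained CS1 leaves almost nothing for CS2 to gain, and a third CS added to two fully trained CSs is driven negative.

```python
def compound_trial(V, saliences, a=0.5, L=1.0):
    """One trial: every CS present is updated against the summed prediction."""
    total_V = sum(V.values())
    for cs, b in saliences.items():
        V[cs] = V.get(cs, 0.0) + a * b * (L - total_V)
    return V

# Blocking: CS1 is trained alone first, then CS1+CS2 are trained in compound.
V = {"CS1": 0.0}
for _ in range(20):
    compound_trial(V, {"CS1": 0.5})              # phase 1: V(CS1) approaches L
for _ in range(20):
    compound_trial(V, {"CS1": 0.5, "CS2": 0.5})  # phase 2: CS2 gains almost nothing
print(V)  # V(CS1) near 1.0, V(CS2) near 0.0 -- CS2 is "blocked"

# Conditioned inhibition: CS1 and CS2 each trained to L separately, then a
# third CS is added; the total prediction is 2L, so dV is negative for CS3.
V = {"CS1": 1.0, "CS2": 1.0, "CS3": 0.0}
compound_trial(V, {"CS1": 0.5, "CS2": 0.5, "CS3": 0.5})
print(V["CS3"])  # negative: CS3 becomes a conditioned inhibitor
```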

 It was obviously only a matter of time before the elegant science of behaviorism began to be co-opted by the "cognitive neuroscience" movement, AI, Neural Networking, Holographic models of Neuronal connections... etc. In other words, stuff like this:

IV.    Information Processing Theory (G. Miller)

Overview:


George A. Miller has provided two theoretical ideas that are fundamental to cognitive psychology and the information processing framework.
The first concept is "chunking" and the capacity of short term memory. Miller (1956) presented the idea that short-term memory could only hold 5-9 chunks of information (seven plus or minus two) where a chunk is any meaningful unit. A chunk could refer to digits, words, chess positions, or people's faces. The concept of chunking and the limited capacity of short term memory became a basic element of all subsequent theories of memory.
The second concept is TOTE (Test-Operate-Test-Exit) proposed by Miller, Galanter & Pribram (1960). Miller et al. suggested that TOTE should replace the stimulus-response as the basic unit of behavior. In a TOTE unit, a goal is tested to see if it has been achieved and if not an operation is performed to achieve the goal; this cycle of test-operate is repeated until the goal is eventually achieved or abandoned. The TOTE concept provided the basis of many subsequent theories of problem solving (e.g., GPS) and production systems.
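A minimal sketch of a TOTE unit (the goal test and operation here are invented, anticipating the hammering example given below):

```python
def tote(test, operate, max_cycles=100):
    """Test-Operate-Test-Exit: keep operating until the goal test passes."""
    for _ in range(max_cycles):
        if test():       # Test: has the goal been achieved?
            return True  # Exit
        operate()        # Operate: act on the world, then test again
    return False         # goal abandoned after too many cycles

# Hypothetical hammering example: the goal is a flush nail, the operation a blow.
nail = {"height_mm": 5}

def nail_is_flush():
    return nail["height_mm"] <= 0

def strike_nail():
    nail["height_mm"] -= 1   # each blow drives the nail 1 mm further in

print(tote(nail_is_flush, strike_nail), nail)  # True {'height_mm': 0}
```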
Scope/Application:
Information processing theory has become a general theory of human cognition; the phenomenon of chunking has been verified at all levels of cognitive processing.
Example:
The classic example of chunks is the ability to remember long sequences of binary numbers because they can be recoded into hexadecimal form. For example, the sequence 0010 1000 1001 1100 1101 1010 could easily be remembered as 2 8 9 C D A. Of course, this would only work for someone who can convert binary to hexadecimal numbers (i.e., the chunks are "meaningful").
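A quick sketch of the recoding this example describes: group the binary string into 4-bit chunks and read each chunk as one hexadecimal digit, turning 24 items into 6 meaningful chunks.

```python
bits = "001010001001110011011010"  # 24 binary digits: far too many items for STM

# Recode each 4-bit group as a single hexadecimal "chunk".
chunks = [format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)]
print(" ".join(chunks))  # 2 8 9 C D A  -- six chunks, comfortably within 7 +/- 2
```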
The classic example of a TOTE is a plan for hammering a nail. The Exit Test is whether the nail is flush with the surface. If the nail sticks up, then the hammer is tested to see if it is up (otherwise it is raised) and the hammer is allowed to hit the nail.

The General Problem Solver (GPS) was a theory of human problem solving stated in the form of a simulation program (Ernst & Newell, 1969; Newell & Simon, 1972). This program and the associated theoretical framework had a significant impact on the subsequent direction of cognitive psychology. It also introduced the use of productions as a method for specifying cognitive models.
The theoretical framework was information processing and attempted to explain all behavior as a function of memory operations, control processes and rules. The methodology for testing the theory involved developing a computer simulation and then comparing the results of the simulation with human behavior in a given task. Such comparisons also made use of protocol analysis (Ericsson & Simon, 1984) in which the verbal reports of a person solving a task are used as indicators of cognitive processes.
GPS was intended to provide a core set of processes that could be used to solve a variety of different types of problems. The critical step in solving a problem with GPS is the definition of the problem space in terms of the goal to be achieved and the transformation rules. Using a means-end-analysis approach, GPS would divide the overall goal into subgoals and attempt to solve each of those. Some of the basic solution rules include: (1) transform one object into another, (2) reduce the difference between two objects, and (3) apply an operator to an object. One of the key elements needed by GPS to solve problems was an operator-difference table that specified what transformations were possible.
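A very small sketch of the means-end-analysis loop described above (the operator table format and difference function are my own simplification, not GPS's actual data structures): find the most important difference between the current object and the goal, look up an operator relevant to that difference in the operator-difference table, apply it, and recurse.

```python
def means_end_analysis(state, goal, operators, diff, depth=10):
    """GPS-style loop: repeatedly pick an operator that reduces the current difference.

    operators: list of (name, relevant_difference, precondition_fn, apply_fn)
    diff:      returns the most important difference between state and goal, or None
    """
    d = diff(state, goal)
    if d is None:
        return []                    # no difference left: goal achieved
    if depth == 0:
        return None                  # give up (goal abandoned)
    for name, relevant_difference, precondition, apply_fn in operators:
        if relevant_difference != d:
            continue                 # consult the operator-difference table
        if not precondition(state):
            continue                 # a fuller GPS would set up a subgoal here
        plan = means_end_analysis(apply_fn(state), goal, operators, diff, depth - 1)
        if plan is not None:
            return [name] + plan
    return None                      # no applicable operator reduces this difference
```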
Scope/Application:
While GPS was intended to be a general problem-solver, it could only be applied to "well-defined" problems such as proving theorems in logic or geometry, word puzzles, and chess.  However, GPS was the basis of other theoretical work by Newell et al. such as SOAR and GOMS. Newell (1990) provides a summary of how this work evolved.
Example:
Here is a trace of GPS solving the logic problem to transform L1 = R*(-P => Q) into L0 = (Q \/ P)*R (Newell & Simon, 1972, p. 420):
Goal 1: Transform L1 into L0
  Goal 2: Reduce difference between L1 and L0
    Goal 3: Apply R1 to L1
      Goal 4: Transform L1 into condition (R1)
      Produce L2: (-P => Q)*R
  Goal 5: Transform L2 into L0
    Goal 6: Reduce difference between left(L2) and left(L0)
      Goal 7: Apply R5 to left(L2)
        Goal 8: Transform left(L2) into condition(R5)
          Goal 9: Reduce difference between left(L2) and condition(R5)
          Rejected: No easier than Goal 6
      Goal 10: Apply R6 to left(L2)
        Goal 11: Transform left(L2) into condition(R6)
        Produce L3: (P \/ Q)*R
    Goal 12: Transform L3 into L0
      Goal 13: Reduce difference between left(L3) and left(L0)
        Goal 14: Apply R1 to left(L3)
          Goal 15: Transform left(L3) into condition(R1)
          Produce L4: (Q \/ P)*R
      Goal 16: Transform L4 into L0
        Identical, QED