1 Introduction

Associative learning can occur as a result of the arrangement of contingencies between stimuli and outcomes. The experimenter controls the occurrence of these events in classical conditioning (Pavlov, 1927) 131. In instrumental conditioning, the experimenter arranges the environment in such a way that a response is required for a particular outcome to occur (Skinner, 1938) 153. There may be other procedures that produce associative learning, and there are certainly other forms of learning, including habituation, various forms of perceptual learning (e.g., released image recognition learning; Suboski and Bartashunas, 1984 166), verbal learning, and the learning of complex motor sequences. The present chapter focuses on methods for determining the effects of drugs on associative learning with particular reference to instrumental conditioning. Classical conditioning also will be covered briefly, but no discussion of these other forms of learning will be included. This should not be taken as a reflection on the relative importance of various forms of learning.

1.1 Methods in Appetitive Instrumental Conditioning

1.1.1 Subjects and Apparatus

A typical form of instrumental conditioning involves training rats to press a lever for food. Male rats are usually used because their activity levels are less subject to variation from day to day through the influence of hormonal changes as seen in females (Wang, 1923) 186. To further control the environment of the rats, they are often housed individually in quiet colony rooms kept at a regulated temperature (e.g., 21±1 °C) with lights automatically switched off and on at regular daily intervals (e.g., lights off at 1900 h, on at 0700 h). Food availability is controlled so that the rat has been food-deprived for some regular interval prior to instrumental conditioning sessions. This can be done by allowing the rats access to food for a fixed period of time each day (e.g., 30 min), usually some time after conditioning sessions, but at least 20 h before the next session. Alternatively, a target weight may be calculated as a percentage (e.g., 85%) of the rat’s free-feeding body weight. The rat then is fed a small daily ration (e.g., 5–10 g) for several days until its weight reaches this target value, and thereafter it is given a daily ration (e.g., 15–20 g) sufficient to maintain its target weight. With this latter procedure, care must be taken to adjust the target weights upward to compensate for growth, especially if young animals (less than 90 d of age) are used. This can be achieved by maintaining two or three animals on free food and adjusting the target weights of the deprived animals upward in proportion to the weight gain of these control animals.
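The target-weight bookkeeping just described is simple arithmetic, sketched below for concreteness. The 85% fraction and the weights are the illustrative values from the text; the function names are invented for this example.

```python
# Hypothetical helpers for the percentage-of-body-weight deprivation
# procedure; values mirror the examples given in the text.

def target_weight(free_feeding_weight_g, fraction=0.85):
    """Deprivation target as a fixed fraction of free-feeding body weight."""
    return free_feeding_weight_g * fraction

def growth_adjusted_target(initial_target_g, control_initial_g, control_current_g):
    """Scale the target upward in proportion to the weight gain of freely
    fed control animals, compensating for growth in young rats."""
    return initial_target_g * (control_current_g / control_initial_g)

# A 300 g rat gives an initial 85% target of 255 g; if control animals
# grow from 300 g to 330 g, the target rises proportionally to 280.5 g.
t0 = target_weight(300.0)
t1 = growth_adjusted_target(t0, 300.0, 330.0)
```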

Once the rats have become accustomed to the feeding schedule, test sessions are begun. Lever press training is usually carried out in a small chamber (e.g., 20×20×20 cm) outfitted with a lever and feeder cup. The lever, located on one of the walls, can be of varying dimensions, but typically protrudes approximately 2.0 cm into the experimental chamber and is 2.0–5.0 cm wide and 1.0 cm thick. The height of the lever also varies from 2.0 cm up to 6.0 cm above the floor of the chamber. The lever is connected to a microswitch in such a way that downward displacement leads to mechanical closure of the switch. The force requirement for switch closure can vary and will affect response rates. A required force of 0.10 N often is used. Switch closures can be fed into electrically controlled switching, counting, and timing devices (or a computer) that control environmental events for the rat, such as food delivery, and record responses for the experimenter. The feeder cup can be located on the same wall as the lever or on a different wall, usually at floor level. The cup also can be of varying dimensions, a typical example being 3.0 cm long, 2.0 cm wide, and 1.0 cm deep. The cup is connected via a tube to an electrically controlled food dispenser located outside of the chamber. Food is delivered in the form of commercially available pellets (e.g., 45 mg). The floor of the chamber usually is constructed with aluminum rods spaced 1.0 cm apart, allowing fecal boluses and urine to pass through the grid into a litter tray. The walls are constructed of Plexiglas or aluminum plate. The entire chamber is located in a larger outer box outfitted with a fan for ventilation. The outer box serves to attenuate noises extraneous to the experiment, providing good environmental control. Outer boxes can be purchased commercially or constructed by lining a wooden box with foam rubber or styrofoam.

1.1.2 Behavioral Testing

Prior to the initiation of training rats to press the lever for food, it is helpful to habituate the animals to the experimental environment and to deliver some food pellets into the feeder cup. This can be achieved by placing the rats into the chamber for one session (e.g., 30 min) a day for 2 or 3 d. During these sessions food pellets are dispensed periodically (e.g., once every min). After initially freezing for a brief period (e.g., 15 s to several min) following the unfamiliar “click” of the food dispenser, most food-deprived rats will explore and find and eat the food pellets. With continued pairings they will be observed to approach the feeder cup from anywhere in the chamber immediately following the dispenser click, suggesting that they have learned the association of the sound with food. The rats are now ready to be trained to lever press for food.

The lever press response is shaped by the experimenter using the method of successive approximations. The experimenter uses a hand switch to operate the pellet dispenser while observing the free operant behavior of the rat. Initially, an approach toward the lever from anywhere in the chamber will be immediately followed by delivery of a food pellet. Soon the rat will be spending most of its time in the vicinity of the lever, sniffing, rearing, and possibly biting at the lever or any other small objects such as the heads of bolts. In the course of this activity the rat will eventually extend one of its forepaws toward the lever or possibly touch the lever with a paw. The experimenter immediately delivers a food pellet each time this occurs. The rat will now be placing its paw on the lever upon each return from the feeder cup. Once this point is reached, the experimenter begins to deliver a food pellet only when a slight depression of the lever occurs. Soon the rat will depress the lever far enough to close the microswitch, thereby operating the pellet dispenser on its own. The lever press response has been shaped.

From this point onward, the rat will reliably approach and press the lever on numerous occasions during a test session as long as the rat continues to be food deprived and continues to receive a food pellet following the response. It was discovered, however, that it is not necessary to deliver food pellets in a one-to-one ratio to lever press responses to maintain responding. Trained rats will reliably lever press when food pellets only intermittently follow the response. Thus, rats will lever press at high rates on ratio schedules requiring 5, 10, or even 50 responses per pellet. Ratio schedules can be arranged so that the ratio value is variable (e.g., ranging from 1 to 25, but with a mean of 10). It is also possible to arrange response-dependent pellet delivery according to time. For example, an electronic timer may close a switch after 1 min so that the next response produces pellet delivery; this is a fixed interval 1-min schedule. Variable interval schedules are similarly arranged, but with variable time intervals from pellet to pellet (e.g., ranging from 10 to 120 s, but with a mean of 1 min, a variable interval 1-min schedule). These simple schedules of pellet delivery for lever pressing can be combined into more complex composite schedules with multiple components either signaled by an external stimulus (e.g., light) or not signaled (cf., Ferster and Skinner, 1957 69).
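The four simple schedules above reduce to decision rules: a response earns a pellet either after a required count of responses (ratio schedules) or after a timed interval has elapsed (interval schedules), with the variable versions drawing a fresh requirement after each pellet. A minimal sketch of this logic, with class names and the `response` interface invented for illustration and parameter values taken from the examples in the text:

```python
import random

# Illustrative schedule controllers; each answers whether the current
# lever press earns a pellet. Interval schedules take the time of the
# response (in seconds from session start).

class FixedRatio:
    def __init__(self, n):                # e.g., 5, 10, or 50 responses per pellet
        self.n = n
        self.count = 0
    def response(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio(FixedRatio):
    def __init__(self, ratios):           # e.g., ratios 1-25 with a mean of 10
        self.ratios = ratios
        super().__init__(random.choice(ratios))
    def response(self):
        earned = super().response()
        if earned:                        # draw a new requirement after each pellet
            self.n = random.choice(self.ratios)
        return earned

class FixedInterval:
    def __init__(self, interval_s):       # e.g., 60 s for a fixed interval 1-min schedule
        self.interval_s = interval_s
        self.available_at = interval_s    # first pellet after one full interval
    def response(self, t):
        if t >= self.available_at:        # interval elapsed: this response pays off
            self.available_at = t + self.interval_s
            return True
        return False

class VariableInterval(FixedInterval):
    def __init__(self, intervals_s):      # e.g., 10-120 s with a mean of 60 s
        self.intervals_s = intervals_s
        super().__init__(random.choice(intervals_s))
    def response(self, t):
        earned = super().response(t)
        if earned:                        # draw a new interval after each pellet
            self.available_at = t + random.choice(self.intervals_s)
        return earned
```

On this sketch, only responses advance a ratio schedule, whereas an interval schedule merely holds a pellet in readiness until the next response collects it, which is why interval schedules sustain steadier, lower response rates.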

What happens if food pellet delivery no longer occurs, even intermittently, following lever press responses? If the rat had been well trained on a lean variable interval schedule (e.g., one pellet every 3 min on the average with intervals ranging from 20 s to 5 min), it will have had previous experience responding for as long as 5 min without receiving a pellet. Therefore, in a session where pellets are no longer available, no change from previous rates of responding would be expected in the first 5 min. However, as time progresses in the session, rates will be seen to decrease, showing a typical extinction curve. If the rats are tested again the next day for another extinction session (e.g., 30 min), rates initially will be higher than at the end of the previous day’s session (spontaneous recovery) and again will show an intrasession decline. This pattern of responding in extinction was first described by Skinner (1938) 153 and is highly reliable. An example of extinction over five 30-min sessions following training on a variable interval 30-s schedule averaged for a group of six rats is shown in Fig. 1.

Fig. 1.

Mean responses per min for a group (n=6) of rats during each 5-min segment of two 30-min sessions on a variable interval 30-s schedule (VI30) and five 30-min sessions of extinction (EXT 1–5). Note the intrasession decline during extinction, spontaneous recovery at the beginning of each extinction session, and overall session-to-session decline
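Curves of the kind shown in Fig. 1 are obtained by binning response counts within each session. A sketch of that bookkeeping, assuming responses are logged as times in seconds from session onset; the function name is invented:

```python
def responses_per_min(response_times_s, session_len_s=1800, bin_s=300):
    """Responses per min in each 5-min segment of a 30-min session."""
    n_bins = session_len_s // bin_s
    counts = [0] * n_bins
    for t in response_times_s:
        if 0 <= t < session_len_s:
            counts[int(t // bin_s)] += 1           # assign response to its segment
    return [c / (bin_s / 60.0) for c in counts]    # convert counts to rate per min

# A steady response every 2 s yields a flat 30 responses/min in all six
# bins; an extinction session would instead show declining values.
rates = responses_per_min(range(0, 1800, 2))
```

Averaging such per-segment rates over the animals in a group gives the points plotted for each session.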

As was mentioned above, a trained rat will reliably press a lever as long as the rat continues to be food-deprived and continues to receive a food pellet following the response. The extinction effects of withholding pellets were just described. What would be the effect of testing a trained rat that is no longer food-deprived? This phenomenon was termed resistance to satiation by Morgan (1974) 124. It has been found that satiated rats will continue to respond for food pellets, but that they fail to eat all of the pellets. Over time this responding can be seen to decline in a manner somewhat similar to that seen in extinction. An example is shown in Fig. 2. These animals received ten 30-min sessions of training on a variable-interval 30-s schedule of food presentation. Following the last session they were placed on free food for 20 d and then retested on the same schedule for 3 d. Prior to each test day, rats also received 1 h of prefeeding with food pellets like those used in training sessions in their home cages. Results showed that nondeprived rats continued to lever press for food, but at decreasing rates across days. (A subsequent study also showed that rates decreased within sessions over time.) Rats also failed to eat all of the pellets they earned. Thus, either the withholding of food for lever press responses or nondeprivation will lead to a cessation of conditioned instrumental responding. However, responding does not cease immediately, but rather shows a gradual decline within and across days.

Fig. 2.

Mean (±SEM) responses per min for a group (n=6) of satiated rats during three 30-min sessions on a variable interval 30-s (VI 30) schedule. Sessions occurred every 5 d. The rats had received 10 sessions of VI 30 training (one session every 5 d) while being 72 h food-deprived. They had been on free food for 20 d before the test sessions shown above and received 1 h of prefeeding with food pellets like those delivered on the VI 30 schedule prior to each test session. Results show a gradual decline in responding across sessions while satiated

1.1.3 What Is Learned?

Ever since the enduring effects of behaviorally dependent food presentations on instrumental responding were described, behavioral scientists have been theorizing about what elements are involved in this form of learning. That learning has occurred seems clear. Prior to arranging a relationship between lever-press responses and food presentation, food-deprived rats press the lever rarely, perhaps five or 10 times in a 30-min session. Subsequent to establishing the relationship and a brief period of response shaping, rats press the lever frequently, as often as 100 or more times a min, and even on lean schedules of food presentation, rates seldom are lower than five responses per min. Thus, the same stimuli have a dramatically different effect on behavior as a result of the association of food with pressing the lever.

One way to conceptualize learning in this situation places the emphasis on the response. This seems reasonable since lever press response frequency (defined by microswitch closures) is the dependent measure and clearly changes as a result of food presentations. The observation of learning is defined by these response changes. Thus, it has been suggested that the presentation of food strengthens, stamps in, or reinforces the formation of an association between stimulus (e.g., the lever) and response (e.g., the lever press). Thorndike (1911) 172 was the first to state this reinforcement principle, and it was widely accepted for many years. However, it has been argued that theories that emphasize the learning of particular responses in instrumental conditioning are inadequate (cf., Bindra, 1978) 26. The basic issue concerns the specificity of the learned response. The lever press response (i.e., microswitch closure) can be achieved by pressing the lever with a paw, biting the lever, pressing the lever with the snout, or any combination of these, and is in fact observed to be made in a variety of ways by any one animal. If food presentations reinforce the formation of an association between the lever-related stimuli and the lever-press response, what exactly constitutes the response? If the microswitch closure can be made with such a variety of movements, can all of these movements become associated with the lever-related stimuli? It would appear that any attempt to define the response in terms of the muscle groups used would force the response into a rigid pattern of behavior that has been empirically shown not to occur. Thus, an explanation of learning based on the reinforcement of stimulus-response associations by food presentation appears to be inadequate (Bindra, 1978) 26.

An alternative point of view that emphasizes the role of stimuli has been termed incentive motivational learning. According to this hypothesis, as described by Bindra (1978) 26, when an animal learns to press a lever for food, what is being learned is an association between the lever-related stimuli and the food. Since the food is a biologically relevant stimulus that partially satisfies the nutritional needs of the animal, it can be termed hedonic or rewarding. Thus, animals learn an association between relatively neutral lever-related environmental stimuli and a rewarding stimulus. As a consequence, when the (appropriately deprived) trained animal again encounters the lever-related stimuli, these stimuli lead to activation of a central representation of the rewarding stimulus and thereby generate the motivational state normally produced by the rewarding stimulus itself. Rewarding stimuli (e.g., food) reliably produce appetitive reactions that include instrumental responses that serve to bring the animal close to the reward and transactional responses such as eating once contact with the reward has been achieved. The ability of rewarding stimuli to produce these responses is termed incentive motivation. Thus, once an animal has learned an association between a neutral environmental stimulus and a rewarding stimulus, the incentive motivational properties of the rewarding stimulus can be produced by the previously neutral stimulus. The previously neutral stimulus has become a conditioned incentive motivational stimulus.

The major advantage of the incentive learning point of view over the stimulus-response point of view concerns response flexibility. Once a neutral stimulus has become a conditioned incentive stimulus, it has an enhanced ability to attract the animal, but the actual response observed will be determined by the immediate demands of the situation. This contrasts with the stimulus-response point of view, according to which only a particular response is learned.

According to the incentive motivational learning point of view a rewarding stimulus (e.g., food presentation to a food-deprived animal) has inextricably associated with it the ability to produce appetitive reactions. However, it has been possible to show under certain circumstances that the pairing of a neutral stimulus with a rewarding stimulus leads to evidence of learning the association between the two apparently without the neutral stimulus acquiring incentive motivational properties (Beninger and Phillips, 1980, 1981) 18,19. The “certain circumstances” in this case involved blockade of dopamine receptors with pimozide (see below). Results might suggest that rewarding stimuli lead to activation of a central representation of the stimulus (including aspects of taste, texture, appearance, and so on) and the activation of a modulating system that alters the incentive properties of associated environmental stimuli. It may be possible to selectively block incentive learning while preserving learning of the association between the neutral stimulus and the reward stimulus. This would imply that the learning of the association is not sufficient to produce incentive learning. Rather, the action of another system is required to endow neutral stimuli with incentive properties that influence their response-eliciting potency (cf., Beninger, 1983 13).

This point of view has advantages for understanding the effects of extinction or satiation on responding. It is well known that, compared to original learning, responding is reinstated more rapidly when food is once again available following extinction. According to the above hypothesis, this may occur because during extinction the incentive properties of the lever-related stimuli, but not the learning of an association between lever-related stimuli and food, were being weakened. When reward is reinstated, relearning may be faster because the latter learning is relatively intact and the lever-related stimuli simply may have to acquire once again an enhanced response-eliciting potency. In the case of satiation, responding is observed to continue for a time before decreasing to operant levels. Previous incentive theories held that the incentive motivational properties of the lever-related stimuli were learned by association with food that produced appetitive reactions. However, the ability of food to produce these reactions depended on the presence of an appropriate need state such as food deprivation (Bindra, 1978) 26. It follows that if trained animals were satiated, they should fail to lever press because the lever-related stimuli no longer produce incentive motivation. According to the alternative hypothesis that learning the association between stimuli is independent of the modulating effects of an incentive learning system, once incentive learning has occurred, it should continue to influence responding (at least transiently) even in the absence of an appropriate need state. As shown in Fig. 2, satiated animals do continue to respond. However, as responding is no longer rewarded (food is presented, but the animals are no longer food-deprived), an extinction-like decline is seen. These theoretical issues will be discussed further with reference to particular experimental procedures in section 3.

1.2 Methods in Aversive Instrumental Conditioning

1.2.1 Apparatus

A typical form of aversive conditioning involves training rats to avoid electric footshock in a shuttle box. The selection and housing of rats already has been discussed. However, for aversively motivated behaviors, it is unnecessary to food deprive the animals. Usually ad libitum access to food and water is given in the home cage. Avoidance training is often carried out in a shuttle box (e.g., 20×80×30 cm deep) outfitted with a guillotine door that separates the box into two compartments (in this case, each 40 cm long). Often the two compartments are made more discriminable by, for example, painting one black and the other gray or one with horizontal black and white stripes, the other with vertical stripes. Each compartment of the shuttle box also might be equipped with a light or tone generator that can serve as additional conditioned stimuli. The floor of each compartment consists of metal rods spaced 1.0 cm apart with a litter tray below. Electric shock is applied through the grids with the use of a commercially available shock generator that provides scrambled direct current. The shock intensity can vary, but currents below 1.0 mA are adequate to establish avoidance learning without causing excessive distress to the rats. The shuttle box can be located inside a larger ventilated, sound-attenuating box for extraneous noise suppression if the experiment is automated. For manually conducted experiments, the shuttle box should be located in a relatively undisturbed area with environmental stimuli extraneous to the box kept constant from day to day.

1.2.2 Behavioral Testing

Avoidance training in a shuttle box can be one-way or two-way. One-way avoidance is so named because the rat is always required to make the avoidance response by traversing the box in the same direction. This procedure is difficult to automate because it is necessary to get the rat into the start side before beginning a trial; usually one-way avoidance training is carried out by an experimenter who moves the rat to the start side prior to each trial. Two-way avoidance requires the rat to avoid shock by moving to the side opposite the one it is in at the beginning of the trial. This task is more easily automated, but more difficult for rats to learn (Bolles, 1970) 27.

A typical procedure for one-way avoidance training will be described. The parameters selected are somewhat arbitrary, and alternative values will be mentioned when possible. Rats will receive 10 trials per session (this can vary from fewer than 10 to over 50) with one session per day (more are possible) conducted at the same time each day, 7 d a week (the use of fewer days per week is possible). Each trial will begin by placing the rat, by hand, into the shock side of the shuttle box such that it faces the wall opposite the guillotine door (the direction is not important, but should be kept consistent from trial to trial, rat to rat, and session to session). The door is opened immediately, and a tone is turned on to act as a conditioned stimulus (CS) for shock. (Since the two sides of the shuttle box are different and one side is always paired with shock, it is not necessary to use an additional CS such as the tone; however, some studies involving an independent evaluation of the strength of a CS paired with shock employ a tone that subsequently can be presented in another context.) The duration of the CS is 10 s (this can range from 2 s to several min), and its offset is followed immediately by electrification of the grid floor on the shock side of the shuttle box. Once the rat reaches the safety of the other side, the door is replaced and an intertrial interval of 30 s (this can vary from 5 s to many minutes or hours) is begun. During the interval the rat remains in the safe side. (It is also possible to remove the rat to a waiting box or its home cage during the interval.) As soon as the interval elapses, the rat is picked up, placed into the shock side, and the next trial begun.

Two classes of responses can be recorded. Escape responses occur when the rat remains in the shock side until shock is presented and then moves to safety. Avoidance responses occur when the rat shuttles to safety during the 10-s preshock period and avoids shock. When an avoidance response has been made, the tone is immediately turned off, no shock is presented, and the door is replaced. The timing of the intertrial interval then begins. Besides recording number of avoidance and escape responses, response latencies also can be a dependent measure. Latencies are particularly useful because they could show, for example, a drug-related change in time of occurrence of the response not reflected in changes in number of avoidance or escape responses (e.g., avoidance latencies could increase from 2.5 s to 6.0 s).
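The trial-scoring rule above is easily made explicit: a crossing with a latency shorter than the CS duration is an avoidance, and any later crossing is an escape. A sketch under those assumptions, with the 10-s CS duration taken from the text and all names invented for illustration:

```python
# Hypothetical scoring for the one-way avoidance procedure: latencies are
# measured in seconds from CS onset; the CS lasts 10 s before shock onset.

CS_DURATION_S = 10.0

def score_trial(latency_s, cs_duration_s=CS_DURATION_S):
    """Classify a shuttle crossing by its latency from CS onset."""
    return "avoidance" if latency_s < cs_duration_s else "escape"

def summarize(latencies_s):
    """Counts plus mean latency, the dependent measures described above."""
    kinds = [score_trial(t) for t in latencies_s]
    return {
        "avoidances": kinds.count("avoidance"),
        "escapes": kinds.count("escape"),
        "mean_latency_s": sum(latencies_s) / len(latencies_s),
    }

# A ten-trial session: seven crossings beat the 10-s CS (avoidances),
# three occurred after shock onset (escapes).
session = summarize([2.5, 6.0, 12.1, 4.0, 9.9, 15.0, 3.2, 7.7, 11.5, 5.0])
```

Keeping the raw latencies, rather than only the avoidance and escape counts, preserves exactly the kind of within-category drug effect mentioned in the text (e.g., avoidance latencies shifting from 2.5 s toward 6.0 s).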

1.2.3 What Is Learned?

Aversive stimuli are biologically relevant to animals. If they are not promptly escaped or avoided altogether, they could produce tissue damage leading to serious injury or death. Just as the presentation of food to a food-deprived animal is rewarding because it satisfies nutritional needs, the offset of shock is rewarding because the animal escapes potentially harmful effects. Stimuli that are repeatedly paired with painful electrical footshock (e.g., the tone or features of the shock side of the shuttle box) become conditioned aversive stimuli and their offset, like the offset of shock itself, is rewarding. In the context of reinforcement theories, the consequences of this reward are a strengthening of the association between stimuli related to safety (e.g., the safe side of the shuttle box) and the running response. Subsequently, the safety-related stimuli will elicit the running response during the preshock period, and shock will be avoided. However, as discussed in section 1.1.3, this theoretical position has been found to be inadequate because it fails to allow for the flexibility of responses usually seen in operant learning tasks.

According to incentive theory, what is learned is an association between environmental stimuli (in this case the safety-related stimuli) and reward (in this case the offset of unconditioned or conditioned aversive stimuli). As a consequence, when the trained animal (in the presence of the appropriate aversive stimuli) again encounters the safety-related stimuli, these stimuli lead to activation of a central representation of the rewarding stimulus and thereby generate the motivational state normally produced by the rewarding stimulus itself. Exactly as described in section 1.1.3 for appetitive instrumental conditioning, rewarding stimuli (e.g., offset of shock or offset of conditioned aversive stimuli) produce appetitive reactions that include instrumental responses that serve to bring the animal close to the reward. Thus, it can be seen that both appetitive and aversive instrumental conditioning can be understood in the same theoretical framework that treats both forms of conditioning as examples of incentive motivational learning.

As discussed in section 1.1.3, some data have shown that it is possible to dissociate the learning of an association between a neutral stimulus and a rewarding stimulus from the acquisition by the neutral stimulus of incentive motivational properties. This is equally true for aversive instrumental responding. Thus, data suggest that animals treated with dopamine receptor-blocking drugs learn that the safe side of the shuttle box is safe, being associated with the offset of aversive stimuli, but these safety-related stimuli fail to become conditioned incentive stimuli; i.e., they fail to acquire the ability to elicit operant responses (Beninger et al., 1980c) 25. Further discussion of this interesting observation will follow in section 3.1.3; but first, a brief review of some neurotransmitter systems and their possible role in learning.

2 Neurotransmitter Systems and Learning

2.1 Overview of Neurotransmitter Systems

Drugs affect learning by altering chemical processes in the brain. Over the past 25 years the chemical anatomy of the brain has become known in considerable detail, and a large number of transmitter systems have now been identified (e.g., Emson, 1983 64). These include, for example, the cholinergic systems projecting from the basal forebrain to amygdala, olfactory bulb, hippocampus, and cortex, intrastriatal cholinergic neurons, and thalamopetal cholinergic neurons originating in the pons and mesencephalon (Armstrong et al., 1983 7; Fibiger, 1982 70; Mesulam et al., 1983 120). Serotonergic systems originate in brainstem raphe neurons with extensive ascending projections and descending projections to the spinal cord (Azmitia, 1978 8; Steinbusch and Nieuwenhuys, 1983 165). Dopaminergic systems originating in the ventral mesencephalon and projecting to forebrain areas including the dorsal and ventral striatum, septum, interstitial nucleus of the stria terminalis, amygdala, and portions of the cortex have been extensively studied (Lindvall, 1979 105; Lindvall and Bjorkland, 1983 106). Similarly, the norepinephrine (noradrenaline)-containing systems originating in the locus ceruleus and lateral tegmental sites, with connections distributed widely throughout the neuraxis, have been well described (Moore and Bloom, 1979 122).

Amino acid transmitters also have been identified. The inhibitory amino acid gamma-aminobutyric acid (GABA) is widely distributed with possible neuronal pathways, including specific cerebellar cells, hippocampal basket cells, striatonigral and pallidonigral neurons, and cortical and spinal interneurons (see McGeer et al., 1978 116). Another inhibitory amino acid, glycine, may also serve as a neurotransmitter in the spinal cord and brainstem (McGeer et al., 1978) 116. The excitatory amino acid glutamate has been identified in efferents of the cortex and hippocampus and intrinsic hippocampal fibers (Fonnum et al., 1981) 77. Furthermore, a large number of peptides have been found in brain tissue and some are thought to be involved in interneuronal communication (see review by Snyder, 1980 156). For example, substance P has been localized in cells of the corpus striatum that project to the substantia nigra, cells of the habenula that project to the interpeduncular nucleus, intraamygdaloid cells, and cells of the bed nucleus of the stria terminalis that project to the medial preoptic area. Enkephalin has been localized in cells of the caudate nucleus that project to the globus pallidus. Neurotensin has been identified in a pathway originating in the central nucleus of the amygdala and projecting to the bed nucleus of the stria terminalis. Cholecystokinin is found in cells of the periaqueductal gray of the brainstem. Finally, some peptides have been found to be co-localized in neurons containing other neurotransmitters. For example, substance P coexists with serotonin (5-hydroxytryptamine, 5-HT) in the raphe nuclei of the medulla, and cholecystokinin-octapeptide (CCK-8) coexists with dopamine (DA) in the ventral tegmental area (Hokfelt et al., 1980 93; Snyder, 1980 156). Many of these transmitter systems may participate in learning.
As new drugs that have relatively specific effects on individual transmitter systems are rapidly becoming available, there is a huge amount of work to be done in determining the participation of various transmitter systems in learning.

2.2 Neurotransmitters Involved in Learning

A good beginning to the study of the role of various neurotransmitters in learning has been made. For example, Deutsch and coworkers found that relearning a discrimination task was differentially affected by cholinergic manipulations, depending on the time since original learning (e.g., see review by Deutsch and Rogers, 1979 50). They concluded that learning results in a systematic change in cholinergic synaptic excitability that represents a component of mnemonic processing and that lasts for at least weeks. Others, evaluating the effects of cholinergic drugs on new learning, found that intracerebroventricular (icv) injections of cholinergic antagonists impaired the retention of a discriminated active avoidance task, whereas cholinergic agonists at moderate doses improved recall (Flood et al., 1981, 1983, 1984, 1985 73,74,75,76). Recently, there has been intense interest in the possible role of corticopetal cholinergic cells of the basal forebrain in memory. Animals treated with anticholinergics or with lesions of these cells are impaired in the acquisition and performance of tasks making specific demands on recent memory. Further discussion of this interesting work is beyond the scope of the present chapter; more information may be found in the following papers: Bartus et al. (1985) 10, Beninger et al. (1986) 16, Hepler et al. (1985) 89, Murray and Fibiger (1985) 126, and Salamone et al. (1984) 148.

It has been suggested that serotonin plays a role in the learning underlying the “tuning out” of nonrewarded or irrelevant stimuli. Thus, animals treated with the tryptophan hydroxylase inhibitor para-chlorophenylalanine failed to show latent inhibition (Solomon et al., 1978 158), showed a slowing of habituation to an auditory stimulus (Carlton and Advokat, 1973 35; Conner et al., 1970 40), and responded more in extinction (Beninger and Phillips, 1979 17). Similarly treated animals showed less suppression of responding to punishment and enhanced two-way avoidance learning (see review by Ogren, 1982 129). Norepinephrine also may be implicated in aspects of learning. Botwinick et al. (1977) 29 found that mice treated with the dopamine-beta-hydroxylase inhibitor bis-(4-methyl-1-homopiperazinyl-thiocarbonyl)-disulfide were impaired in the recall of an appetitive spatial discrimination task. Archer (1982) 6 reported that rats receiving intraperitoneal (ip) injections of the selective norepinephrine neurotoxin N-(2-chloroethyl)-N-ethyl-2-bromobenzylamine were impaired in acquisition of a two-way avoidance task. He speculated that this impairment might reflect a role for central norepinephrine neurons in the learning of the association between the CS and shock, or possibly between the CS and the response. Everitt et al. (1983) 68 evaluated the effects of bilateral 6-hydroxydopamine (6-OHDA) lesions of the dorsal norepinephrine bundle on a discriminated lever press task and found acquisition to be significantly impaired, a finding in good agreement with that of Archer (1982) 6. They discussed their results in terms of the “attentional hypothesis,” which suggests that animals with norepinephrine depleted are more distractible (cf., Mason and Iversen, 1979 112); however, they argued that this hypothesis may be too flexible to make useful differential predictions about behavior in most learning tasks.

The inhibitory amino acid transmitter GABA has been implicated in learning. Thus, it was found that water-deprived rats trained for 6 consecutive d in a light-dark discrimination task performed better than controls if given icv injections of GABA after each session (Ishikawa and Saito, 1978 96). Grecksch et al. (1978) 84 gave rats posttraining intrahippocampal injections of the drug n-dipropylacetate. This compound enhances GABA levels by blocking the metabolic enzyme GABA-transaminase that converts GABA to succinic semialdehyde. They found an improvement in the retention of a brightness discrimination shock avoidance task in a Y-maze, a finding in good agreement with that of Ishikawa and Saito (1978) 96. Katz and Liebler (1978) 98 employed another GABA-transaminase inhibitor, amino-oxyacetic acid, injected ip into rats immediately following two-way shock avoidance sessions. They found a significant impairment of learning across sessions. The inconsistency in results may be related to the use of discrimination tasks by Ishikawa and Saito (1978) 96 and Grecksch et al. (1978) 84 versus a two-way shuttle box avoidance task by Katz and Liebler (1978) 98 or to other interexperiment differences. Further studies are needed to better understand the role of GABA in learning.

As reviewed briefly above, a large number of peptides have been found in the brain, and many of these now have been implicated in learning (e.g., see reviews by de Wied, 1984 61; LaHoste et al., 1980 101; Squire and Davis, 1981 163). This is a relatively new area of research, and some results remain controversial. For example, vasopressin has been reported to increase resistance to extinction of avoidance responding (de Wied, 1971) 60 and apparently to enhance memory in a number of tasks (see reviews by de Wied, 1984 61; de Wied and van Ree, 1982 62). However, Ettenberg et al. (1983) 66 have argued that vasopressin effects on learning and memory are almost always tested in aversively motivated tasks. They showed a similar apparent memory enhancement when vasopressin was given immediately following learning of an appetitively motivated task, but went on to show that this effect failed to occur when the peripheral pressor effects and aversive consequences of vasopressin were avoided by the use of an analog, desglycinamide arginine vasopressin. Similarly, it was shown that treatment with a drug that antagonized the pressor effects of vasopressin blocked its potentiation of learned behavior (Le Moal et al., 1981) 104. Therefore, the possible role of vasopressin in learning and memory remains to be established.

Cholecystokinin-octapeptide may influence the effects of reward on behavior and thereby influence reward-related learning. Thus, Vaccarino and Koob (1984) 183 showed that microinjection of nanogram amounts of sulfated CCK-8 into the nucleus accumbens antagonized electrical self-stimulation of the ventral tegmental area. Substance P also has been implicated in learning. One approach to the study of the neurochemistry of learning involves training rats on a task in a single session and then administering pharmacological compounds immediately afterward. If the drugs influence ongoing neuronal processes involved in the consolidation of learned information, their effect should be reflected in increased or decreased performance on the task when retested 24 h later. One advantage of this approach is that the animals are originally trained in a drug-free state, and, since most drugs are largely cleared from the body 24 h after injection, retesting also is carried out in a drug-free state. Using this approach it has been found that microinjections of substance P into the substantia nigra or amygdala impaired the retention of a passive avoidance or one-trial appetitive task, whereas injections into the lateral hypothalamus improved learning (Huston and Staubli, 1978, 1979 93,94; Staubli and Huston, 1979 93).

Others have found that opiates may be involved in learning (see reviews by Bolles and Fanselow, 1982 28; Riley et al., 1980 140). Using the posttraining injection technique, water-deprived rats were trained in a one-trial appetitive task to find water in an alcove of a large wooden box and immediately following 5 s of drinking were injected icv with morphine or its vehicle. When retested 24 h later, rats that had received morphine showed significantly shorter latencies to drink from the tube in the alcove (White et al., 1978) 180. Similar results were reported for the effects of morphine on a food-rewarded task (White et al., 1977 181). On the other hand, Lucion et al. (1982) 108 found that posttraining icv injections of the endogenous opiate compounds beta-endorphin or met-enkephalin caused an impairment in a two-way avoidance task tested 24 h later. In agreement with this finding, Gallagher et al. (1983) 80 reported that posttraining ip injections of the opiate antagonist naloxone enhanced learning of new spatial cues for a radial maze task. There were a number of differences between these studies and those of White et al. (1977, 1978) 181,180, including the motivational nature and/or type of task and the doses and opiate compounds employed. Possibly the different effects are related to these or other variables.

Based on the well-documented observation of opiate self-administration by humans, some researchers have evaluated possible rewarding effects of opiates in animals. For example, using the place-preference paradigm (see Koob, Swerdlow, and Gilbert, this volume, and section 3.1.6) it has been found that ip or icv injections of morphine or injections directly into the lateral hypothalamus, periaqueductal gray, or nucleus accumbens are rewarding (Bechara and van der Kooy, 1985 11; van der Kooy et al., 1982 184). Recently, Glimcher et al. (1984) 82 found that the peptide neurotensin produced a significant place preference when injected directly into the ventral tegmental area, suggesting that it too may be involved in reward-related learning. These effects of opiates and neurotensin on reward-related learning may be mediated through dopaminergic systems. Thus, it has been found that systemically administered morphine leads to a naloxone-reversible enhancement of the spontaneous discharge rate of neurons in the ventral tegmental area and substantia nigra (Gysling and Wang, 1983 86), and Broderick (1985) 33, using in vivo electrochemistry, showed that morphine stimulated dopamine (DA) release in the striatum. Receptors for neurotensin have been demonstrated on mesencephalic dopaminergic neurons (Palacios and Kuhar, 1981 130), and recently Blaha and Phillips (personal communication), using in vivo electrochemistry, showed that icv neurotensin stimulated DA release in the nucleus accumbens. Furthermore, it has been shown that the rewarding properties of heroin in a place-preference paradigm were blocked by the DA receptor antagonist pimozide (Bozarth and Wise, 1981 32), and place preferences produced by direct microinjections of enkephalin into the ventral tegmental area were attenuated in a dose-dependent manner by the DA receptor blocker haloperidol (Phillips et al., 1983 134).
Thus, the rewarding effects of opiates and neurotensin may depend on the release of dopamine by ascending neurons originating in the ventral mesencephalon.

From this brief overview, it can be seen that many neurotransmitters appear to participate in learning. Included in the ever-growing list is dopamine, its role in learning being suggested by the observation that the reward effects of opiates and possibly neurotensin may be mediated by dopaminergic systems. There is an extensive literature concerning the possible role of dopamine in reward-related learning involving the pharmacological manipulation of dopaminergic neurotransmission in various behavioral paradigms (see reviews by Beninger, 1983 13; Wise, 1982 182). This literature will be discussed in some detail in the following section.

3 Paradigms for Assessing the Role of Dopamine in Learning

Psychopharmacological studies of the role of dopamine (DA) in learning have involved many methodologies in addition to the basic appetitive and aversive instrumental conditioning procedures described in section 1. As discussed in sections 1.1.3 and 1.2.3, various theoretical accounts of the elements of associative learning in instrumental conditioning have been attempted, but there is no generally agreed upon framework. The combination of methodological variations and the lack of a widely accepted theoretical framework in research concerned with the possible role played by DA in learning has led to some confusion and difficulty in comparing results across experiments. The use of a limited range of doses or of a single dose of drugs that affect dopaminergic neurotransmission also has contributed to this confusion. In this section an attempt will be made to bring together experiments employing diverse methodologies within the framework of incentive motivational learning theory as discussed earlier. It will be argued that there is a strong basis for concluding that DA plays a major role in reward-related learning.

In evaluating the role of DA in learning in various paradigms involving reward, it is important to first establish the aspects of learning that are mediated by reward and those aspects of learning that appear to be independent of the effects of reward. If DA is specifically involved only in reward, then those aspects of learning that are independent of reward should also be independent of DA manipulations. The entire undertaking is further complicated by the finding that DA appears to be involved in controlling the level of general locomotor activity. This challenges every experimenter wishing to assess the effects of pharmacological manipulation of DA systems on learning to devise methodologies that are unconfounded by these general locomotor changes. As discussed below, some excellent methodologies have been devised to deal with this potential confound.

3.1 Specific Paradigms

3.1.1 Appetitive Lever Pressing

According to incentive motivational theory, what is learned when an animal acquires the lever press response for food is an association between the lever-related stimuli and the rewarding stimulus, food. Rewarding stimuli reliably produce appetitive reactions, including instrumental responses, that serve to bring the animal close to the reward (e.g., approach). Therefore, stimuli associated with reward acquire the ability to activate the central representation of the reward and the associated appetitive reactions. Once this type of learning has occurred, at least intermittent reward is required to maintain it. If reward no longer occurs, the conditioned incentive motivation produced by the lever-related stimuli gradually is diminished, and responding is seen to gradually decline, showing extinction.

If DA is involved in incentive motivational learning, two clear predictions follow from this account of learning an appetitive lever press response. The first is that if DA neurotransmission is blocked during response shaping, learning should fail to occur because the lever-related stimuli will fail to become conditioned incentive stimuli. The second is that if DA neurotransmission is blocked during the maintenance of lever-press responding in trained animals, responding should be seen to gradually decline, showing an extinction-like effect. This decline should be seen both within and across sessions.
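The logic of these two predictions can be illustrated with a deliberately minimal delta-rule sketch. This is purely illustrative and not a model from the literature; the function name, parameters, and values are all arbitrary assumptions. A single value V stands for the conditioned incentive value of the lever-related stimuli, responding on each trial simply tracks V, and blocking DA neurotransmission is treated as reducing the effective reward signal to zero, so V decays both within and across sessions.

```python
def simulate_sessions(n_sessions=3, trials=36, alpha=0.2,
                      da_blocked=True, v_init=1.0):
    """Illustrative incentive-learning sketch (hypothetical parameters).

    V represents the conditioned incentive value of the lever-related
    stimuli; the simulated response rate on each trial tracks V.
    DA receptor blockade is modeled as an effective reward signal of 0,
    so V decays toward 0, an extinction-like decline within and across
    sessions. With DA intact (da_blocked=False) and v_init near 0, V
    grows instead, mimicking acquisition.
    """
    v = v_init
    sessions = []
    for _ in range(n_sessions):
        rates = []
        for _ in range(trials):
            rates.append(v)                # responding tracks incentive value
            reward = 0.0 if da_blocked else 1.0
            v += alpha * (reward - v)      # delta-rule update
        sessions.append(rates)
    return sessions

# Under simulated DA blockade, rates fall within the first session and
# each later session starts lower than the one before it.
blocked = simulate_sessions()
```

Because V carries over between sessions, the sketch reproduces both the intrasession and the intersession declines predicted above; it is, of course, silent on the motor-impairment alternative that must be ruled out experimentally.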

Both of these predictions have been tested, and both have been confirmed. Thus, Tombaugh et al. (1979) 180 and Wise and Schwartz (1981) 183 found that rats treated with pimozide showed a dose-dependent attenuation of acquisition of lever pressing for food. It should be pointed out, however, that a simple motoric impairment resulting from DA receptor blockade might produce the same pattern of results. Therefore, the interpretation of the results of Tombaugh et al. (1979) 180 and Wise and Schwartz (1981) 183 as indicative of a block of reward-related learning by pimozide is confounded by the effects of pimozide on general locomotor activity. It may not be possible to dissociate reward-blocking from activity-reducing effects of DA antagonists on response shaping in appetitive lever-pressing tasks. It is possible, however, to dissociate these two alternatives when evaluating the effects of DA antagonists on the maintenance of lever pressing for food. Wise et al. (1978a) showed that lever pressing for food underwent an extinction-like day-to-day decline in animals treated with pimozide. Similar results have been reported by Gray and Wise (1980) 83, Mason et al. (1980) 113, Tombaugh et al. (1979, 1980a, 1982b) 180,175,176, and Wise et al. (1978b), and an example is shown in Fig. 3. Others have reported that DA antagonists produce a gradual decline in responding over time within a session, also shown in Fig. 3 (Gray and Wise, 1980 83; Greenshaw et al., 1981 85; Mason and Iversen, 1979 112; Phillips and Fibiger, 1979 133; Salamone, 1986 147; Tombaugh, 1981 174; Tombaugh et al., 1980a, 1982b 175,178; Wise et al., 1978b). If DA receptor blockade produced a simple motor impairment, that impairment might be expected to remain constant over sessions. The observation of intra- and intersession declines suggests that this interpretation is incorrect. Another possibility is that the drug is not completely cleared from the system from the time of the first injection to the second.
This could result in an additive effect of this putative drug residual with the next dose; thus, the effective dose would increase from session to session, possibly resulting in the intersession declines reported. This possibility seems unlikely for several reasons. First, it would not account for the intrasession declines seen, especially on the first drug day. Second, in every study reporting an intersession decline in responding, the drug was given every third or fourth day with no-drug training sessions conducted on the intervening days. Finally, Mason et al. (1980) 113 and Wise et al. (1978a,b) directly tested this possibility by giving animals several injections of pimozide in their home cage prior to the first test session with pimozide. Rates were observed to be similar to those of the group receiving pimozide for the first time in the test. Therefore, an accumulation of drug with repeated dosing cannot account for the extinction-like declines produced by DA receptor blockers.

Fig. 3.

(A) Mean responses per min during the final session of responding for a group of 10 rats on a continuous reinforcement schedule (CRF) and during three CRF sessions (36 min) where animals were treated with a 1.0-mg/kg dose of pimozide (CRF + DRUG 1–3). Response rates showed an extinction-like session-to-session decline. (B) Mean responses per min during 3-min blocks of the first 36-min CRF session where pimozide was injected. Rates showed an extinction-like intrasession decline (adapted from Mason et al., 1980 113).

Another possible interpretation of the extinction-like decline observed both within and across sessions for animals treated with DA receptor blockers might be that the drugs produce a complex motor impairment. According to this hypothesis, drug-treated animals learn that it is aversive to perform motor acts such as lever-pressing while in the drug state. This would suggest that, when first drugged, the animals would lever press at their usual rates, but over time rates would decline as the animals learned that it is aversive to respond while drugged. This hypothesis can be ruled out based on the ingenious study of Franklin and McCoy (1979) 79 using electrical stimulation of the brain as the rewarding stimulus. Animals were trained on a reward schedule consisting of two alternating 6-min periods, one signaled by a flashing light and the other by its absence. During each period, response-dependent reward was available for the first 3 min. Thus, both the flashing light and its termination served as cues for reward, with temporal cues (the elapsing of 3 min) signaling nonreward within each period. Once the animals were well trained on this reward schedule, they were tested with pimozide. Test sessions began with no flashing light, but with brain stimulation available as usual at the beginning of the period; however, brain stimulation continued to be available beyond the usual 3-min interval. As expected, pimozide-treated animals responded at first, but showed an extinction-like decrease in responding. When responding reached a low level for 2 min, the authors turned on the flashing light. Lever pressing immediately resumed at a high rate and again showed an extinction-like decline. Control animals for whom the flashing light was not a signal for reward failed to show this effect. If the original decline had occurred because the rats learned that lever-pressing while in the drug state was aversive, they would not have been expected to resume responding.
Similar results were reported in comparable experiments by Beninger and Freedman (1982) 13 and Fouriezos et al. (1978) 78. It can be concluded that the extinction-like decline seen in the lever-pressing of animals treated with DA receptor blockers cannot be attributed to a complex motor impairment.

It may be useful to consider the Franklin and McCoy (1979) 79 experiment in the context of incentive theory. Accordingly, the lever-related stimuli plus the flashing light would have become conditioned incentive stimuli having, through their association with reward, acquired the ability to elicit appetitive reactions, including approach and lever pressing. In the pimozide test, the lever-related stimuli plus the absence of the flashing light lost their incentive value because responding in their presence was presumably no longer rewarded (as DA neurotransmission was blocked). When the flashing light then was presented, responding resumed because this stimulus still was a conditioned incentive stimulus.

3.1.1.1 Transfer

One approach that has been used to evaluate the similarity of operant responding in animals treated with DA receptor blockers to those experiencing extinction involves tests of possible transfer between these two conditions. Wise et al. (1978a) were the first to use this procedure. They trained rats to lever press for food and then tested them for three sessions in extinction. An intersession decline in responding typical of extinction was observed. On the following day, these animals received pimozide prior to a session in which food pellets again were available for lever press responding. Rates were low and consistent with the pattern observed for animals receiving a fourth day of extinction. Thus, it appeared that the two procedures, extinction and normal reward plus pimozide, were interchangeable. A similar pattern of results for transfer tests from extinction to pimozide plus normal reward has been reported by Beninger (1982) 12, Tombaugh et al. (1980a) 175, and Wise et al. (1978b).

A problem arises with the transfer procedure when attempts are made to switch in the other direction, from 3 d of testing with pimozide plus normal reward to extinction. When this test was made, results consistently showed that response rates on the extinction day were higher than those observed on the third day of pimozide plus reward (Beninger, 1982 12; Tombaugh et al., 1979, 1980a 180,175), as can be seen in Fig. 4. If extinction was identical to pimozide plus reward, the two conditions should be interchangeable, and transfer should be seen in either direction. However, DA receptor blockers are well known to produce decreased motor activity in a dose-related manner (cf., Costall and Naylor, 1979 43; Ungerstedt, 1979 182), and many studies have shown that animals given a combination of a DA receptor blocker and extinction respond significantly less than animals given extinction alone (Beninger, 1982 12; Gray and Wise, 1980 83; Phillips and Fibiger, 1979 134; Tombaugh et al., 1980a, 1982b 175,178). These motor changes in animals given pimozide will affect transfer tests. When transfer from 3 d of extinction to pimozide plus reward is tested, the motor effects of pimozide will influence responding in the direction expected, i.e., pimozide will reduce responding. In this case the data suggest that the two procedures are interchangeable. When transfer from 3 d of pimozide plus reward to extinction is tested, the motor effects of pimozide will be absent in the extinction session, and response rates therefore might be expected to be higher in extinction, as has been reported. Thus, transfer tests probably do not provide an unconfounded procedure for evaluating the comparability of extinction and pimozide plus reward. Other procedures, however, like those discussed in the first part of this section, clearly show that DA receptor blockers, in addition to affecting motor activity, influence incentive learning.

Fig. 4.

Mean responses per min for a group of 10 rats during three continuous reinforcement (CRF) sessions (36 min) preceded by injections of pimozide (1.0 mg/kg) and during transfer to a session of extinction (EXT), where rates were observed to increase.

3.1.2 Discriminated Appetitive Operant Responding

Discrimination tasks can be either simultaneous or successive. In either case two or more stimuli are presented, and one is consistently a signal for reward. The animal is usually required to perform some operant response to attain reward in the presence of the appropriate stimulus (S+), whereas responding in the presence of the other stimulus (S−) produces no reward. In a simultaneous discrimination, the two stimuli are presented concurrently, and the animal is required to choose from two response options; in a successive discrimination, only one stimulus (S+ or S−) is present at any one time, and the animal must choose to respond or to withhold responses. From an incentive learning point of view, the S+ is consistently associated with reward and therefore becomes a conditioned incentive stimulus, having an increased ability to produce appetitive reactions.

Only one study has been published that reports the effects of blockade of DA neurotransmission on the acquisition of a discrimination. Tombaugh et al. (1983) 179 trained rats in either a light-dark task or a T-maze spatial discrimination task. The former involved two levers simultaneously presented; the cue light above one of the levers was illuminated (S+), whereas the light above the other remained off. Presses on the lever associated with the S+ were rewarded. In the T-maze, choice of one arm (e.g., left) was consistently rewarded. Tombaugh et al. (1983) 179 found a dose-related decrease in acquisition of the light-dark discrimination lever-pressing task. With the higher doses (0.5 and 1.0 mg/kg), too few responses occurred to provide sufficient data for analysis. This finding is in accord with the results of studies by Tombaugh et al. (1979) 180 and Wise and Schwartz (1981) 183 reporting a failure of lever-press acquisition following pimozide as discussed in the previous section. As was the case in those studies, in the light-dark discrimination acquisition it is not possible to dissociate the locomotor impairments produced by pimozide from possible effects on discrimination learning. In the spatial discrimination T-maze task, no significant effect of pimozide was found on acquisition, even with doses as high as 2.0 mg/kg, although latencies to leave the start box and enter an arm were significantly increased by pimozide. On the basis of this finding it might appear that intact DA neurotransmission is not required for incentive learning in a T-maze discrimination task. This result remains as one of the strongest challenges to the hypothesis that DA is required for incentive learning to occur. However, recent data (Hoffman and Beninger, 1985 91) have shown that the effects of pimozide on incentive learning in other paradigms (see section 3.1.4) interact with the amount of conditioning. 
Possibly the demands of the learning task also interact with the effects of pimozide. From this point of view, the failure of pimozide to disrupt spatial discrimination learning in the experiment of Tombaugh et al. (1983) 179 may have been specific to the task employed and the maximum dose used (cf., Hoffman and Beninger, 1985) 91. This is, however, highly speculative, and the data of Tombaugh et al. (1983) 179 remain to be incorporated into the general understanding of the role of DA in incentive learning.

Other studies have examined the effects of DA receptor blockers on the performance of established discriminations. Results have consistently shown that these drug treatments lead to a decrease in responding in the presence of the S+, but do not reduce the accuracy of the discrimination (Beninger, 1982 12; Bowers et al., 1985 31; Tombaugh, 1981 174; Tombaugh et al., 1980b 176). In other words, drugged animals continue to respond to the S+ more than to the S−, but at diminishing rates, showing an extinction-like decline over time (see Fig. 5). This pattern is similar to that seen in extinction when responding to the S+ is no longer followed by reward (Beninger, 1982) 12. From the point of view of incentive theory, the conditioned incentive motivational properties of the S+ are lost with continued testing in the presence of DA receptor blockade. These findings would suggest that attenuated DA neurotransmission does not impair the ability to discriminate stimuli, but rather reduces the ability of the S+ to control operant responding.

Fig. 5.

Mean responses per min for a group of six rats on a successive discrimination consisting of a multiple random-interval 30-s extinction (MULT RI30 EXT) schedule. The left-hand panel shows S+ (■) and S− (▲) rates in 3-cycle blocks for the 25th training session. The right-hand panel shows rates over three sessions where animals were treated with a pimozide dose of 1.0 mg/kg (MULT + DRUG 1–3) (adapted from Beninger, 1982 12).

In an interesting study, Szostak and Tombaugh (1981) 167 trained pigeons on fixed consecutive number schedules prior to testing the effects of pimozide. For one group (FCN-8), the completion of eight or more pecks on one key set up the availability of reward for pecking on a second key, but there was no external signal that reward was available. For the other group (FCN-SD), eight or more pecks on the one key led to a change in color of the second key (an external signal), which could then be pecked to deliver food. The authors found that treatment with the DA receptor blocker pimozide resulted in similar decreases in response rates for both groups; however, whereas pimozide had little effect on the accuracy of the FCN-SD group, performance of the FCN-8 group was observed to be impaired. If pimozide reduces the ability of conditioned incentive stimuli (e.g., the first key) to control operant responding, the differential effect of the drug in these two tasks may be related to predrug differences in stimulus control. Thus, responding on the first key was more often followed by reward for the FCN-SD group (on 99% of the trials) than for the FCN-8 group (on 84% of the trials); therefore, the first key may have been a stronger incentive stimulus that was more resistant to the effects of pimozide for the FCN-SD group. Szostak and Tombaugh (1981) 167 suggested that the differential effect of pimozide in the two groups may be related to the locus of the controlling stimulus for switching from the first to the second key; for the FCN-8 group this response was controlled by internal stimuli (a count of eight responses), and for the FCN-SD group it was controlled by an external stimulus (the color change of the second key).
Although it is possible that the degree of external versus internal stimulus control influences the effects of DA receptor blockers on behavior, as the authors suggest, it is perhaps simpler to attribute the differential effects of the drug to the degree of overall stimulus control as reflected by accuracy of performance. Laties (1975) 102 has shown that the effects of a number of drugs interact with the degree of stimulus control. It would be interesting to train groups to equivalent levels of performance on these two tasks and then to assess the effects of pimozide. This would unconfound the level of accuracy and locus (internal versus external) of the stimulus control.

Ridley et al. (1980) 139 examined the effects of the DA agonist (+)-amphetamine on discriminated appetitive operant responding in marmosets. They trained the monkeys in a simultaneous red-white discrimination task consisting of two levers with cue lights above each and in a successive red-white discrimination task with only one lever present. They found that amphetamine resulted in a dose-dependent impairment of the successive discrimination while not significantly affecting the simultaneous task (cf., Evenden and Robbins, 1985 67). Reduced performance was caused by an increase in responding to the nonrewarded cue in the successive task. The authors concluded that amphetamine disrupted response control. From an incentive learning point of view, the results can be understood as follows. In both discriminations, the cue light and associated lever that signals reward (S+) will become a conditioned incentive stimulus. Amphetamine, by enhancing DA neurotransmission, might lead to incentive learning in the test environment, resulting in many cues becoming incentive stimuli. In the simultaneous discrimination where there already is a strong conditioned incentive stimulus, the effects of amphetamine are minimal, possibly because the additional incentive learning produced by amphetamine does not lead to a relative change in the strength of the S+. Similarly, in S+ trials of the successive discrimination, the relative incentive value of the cue light/lever stimulus may be unaffected by amphetamine. However, in S− trials of the successive discrimination, the inappropriate incentive learning produced by amphetamine may lead to responding to the cue light/lever stimulus that previously failed to elicit responses. This is exactly what Ridley et al. (1980) 139 found. As will be discussed in section 3.1.5, this explanation is consistent with studies of environment-specific conditioned activity produced by DA agonists.

One final study by Szostak et al. (1981) 167 will be considered in this section. In this experiment, pigeons were trained on two discrimination problems differing in task complexity. All pigeons received both discrimination problems within every session. One problem was an identity matching-to-sample task and the other was a symbolic matching-to-sample task. The pigeon was faced with three keys arranged horizontally, and the center one was illuminated. A single peck on the center key resulted in its changing to one of four stimuli. Further pecking of the center key led to presentation of stimuli on each of the two side keys with continued illumination of the center key. For the identity task, the pigeon was required to peck the side key stimulus matching the center key (e.g., if the center key is red, peck red; if green, peck green). For the symbolic task, the pigeon was required to peck the appropriate side key (e.g., if the center key is red, peck a vertical line; if the center key is green, peck a horizontal line). When the pigeons were treated with pimozide, response rates declined; furthermore, pigeons were impaired in a dose-dependent fashion in the symbolic task with no significant effect in the identity task. The authors suggested that “…dopamine may be involved when more difficult tasks are used.” Although Szostak et al. (1981) 167 did not directly compare performance on the two tasks during nondrug sessions, examination of the baseline data presented in their paper suggests that the symbolic matching task, in fact, may have been more difficult. From this point of view, the results may be consistent with those of Tombaugh et al. (1983) 179 as discussed above

3.1.3 Aversively Motivated Instrumental Responding

A typical avoidance learning procedure has been described in section 1.2. Studies of the role of DA in avoidance learning have provided valuable data for understanding the elements of learning in this task. In this section, some of these studies will be described. It will be shown that animals undergoing avoidance acquisition training while DA neurotransmission is reduced fail to acquire the avoidance response, although they escape readily at shock onset. However, they learn the relationship between environmental signals for shock and shock itself. They also learn the association between safety-related environmental stimuli and shock offset. What fails to occur in animals with reduced DA neurotransmission is the acquisition by the safety-related environmental stimuli of the ability to elicit reactions, including instrumental responses, that serve to bring the animal closer to them. Thus, incentive learning fails to occur in spite of fairly extensive learning about the avoidance task itself, including CS-shock associations and associations between safety-related stimuli and shock offset. Furthermore, if animals first are trained in an avoidance task and the effects of DA receptor blockade are then assessed, initially there is little effect of this treatment. However, with continued testing under DA receptor blockade, even pretrained animals begin to show a loss of avoidance responses until eventually they return to untrained levels of performance. Data supporting this complex scheme will be reviewed below

Numerous studies have shown that untrained animals with DA neurotransmission disrupted either fail to acquire or are significantly impaired in their ability to acquire the avoidance response. This has been shown following treatment with the DA receptor blockers chlorpromazine (Posluns, 1962 137), haloperidol (Fibiger et al., 1975 72), and pimozide (Anisman et al., 1982a,b 4,5; Beninger et al., 1980b, c, 1983 24,25,13) and numerous similarly acting drugs (Ahlenius et al., 1984 1; Koob et al., 1984 100; Niemegeers et al., 1969 128). Similar results have been reported following bilateral destruction of dopaminergic systems with the neurotoxin 6-OHDA. Sites of injection include the substantia nigra (Cooper et al., 1974 42; Delacour et al., 1977 49; Fibiger et al., 1974 71), nigrostriatal bundle (Zis et al., 1974), and caudate nucleus (Delacour et al., 1977) 49 or, using smaller injection volumes with more localized effects, combined caudate nucleus and nucleus accumbens (Koob et al., 1984) 100; no effect was seen with frontal cortical injections (Koob et al., 1984) 100

Animals treated with compounds that disrupt DA neurotransmission, although failing to acquire the avoidance response, escape readily when shock is presented. Evidence that they learn which stimuli are associated with safety comes from several sources. Beninger et al. (1980c) 25 found that the latency to perform the escape response grew shorter from trial to trial for pimozide-treated rats, showing a learning curve (Fig. 6) comparable to that of vehicle-treated rats over the first five trials. As testing progressed, however, the control animals continued to show shorter latencies as they began to shuttle prior to shock onset (avoid), whereas the pimozide groups continued to wait until shock was presented before shuttling. This initial improvement in escape latencies of the pimozide groups suggests that they were learning where to go to find safety. Anisman et al. (1982a) 4 tested this possibility directly by requiring pimozide-treated mice to perform a discriminated avoidance response; in a Y-maze, shock escape or avoidance could only be made by entering the appropriate arm of the maze as indicated by visual or positional cues. They found that although pimozide impaired acquisition of the avoidance response, it had no significant effect on the animals’ ability to select the appropriate arm. Similar results were reported by Ahlenius et al. (1984) 1. The frequent observation of very rapid acquisition of avoidance responding on the first nondrug session following testing in the drug state provides further support for the suggestion that some learning about the test environment is occurring in animals that fail to avoid (Beninger et al., 1980b,c 24,25; Fibiger et al., 1975 72; Posluns, 1962 137).

Fig. 6.
figure 6

Mean latency (s) to escape (>10 s) or avoid (<10 s) for groups of eight rats receiving 10 trials of one-way shock avoidance training. One group (▲) received pimozide (1.0 mg/kg) prior to this session, and the other (■) received its vehicle (adapted from Beninger et al., 1980c 25)

That animals with DA neurotransmission disrupted, although failing to acquire the avoidance response, learn the association between environmental cues that signal shock and the shock itself also has been shown. Thus, treatment with chlorpromazine during the pairing of a CS with shock did not affect the subsequent ability of the CS to produce conditioned suppression of operant responding (cf., Cook and Kelleher, 1963 41). Fibiger et al. (1975) 72 mentioned anecdotal observations of urination, defecation, and other signs indicative of fear during a CS for shock in haloperidol-treated rats, suggesting that they had learned the CS-shock association. Beninger et al. (1980c) 25 trained rats to lever press for food. They were then injected with pimozide and tested in a shuttle box avoidance paradigm with a tone CS signaling shock; they failed to acquire the avoidance response. When undrugged, the rats were again placed in the lever-press apparatus and in the course of the session presented with the tone CS from the avoidance task. The rats showed conditioned suppression to the tone. Thus, they had learned the tone-shock association under the influence of pimozide in spite of their failure to avoid. In a related experiment, pimozide-treated animals failed to acquire the avoidance response over five test sessions. They then received three additional sessions with no drug and with shock no longer following the CS (extinction). In spite of the absence of shock, the rats showed acquisition of avoidance responding. Thus, the CS had become a sufficiently strong conditioned aversive stimulus that its offset was an incentive stimulus effective in its own right in producing the acquisition of avoidance responding (Beninger et al., 1980b) 24. Finally, Anisman et al. (1982b) 5 injected animals with vehicle or pimozide and then presented zero or 10 CS (light plus tone)-shock pairings. Several days later, when tested undrugged for avoidance acquisition, animals previously receiving vehicle injections and 10 pairings acquired the avoidance response significantly faster than those receiving zero pairings. The same effect was observed in animals receiving pairings while under the influence of pimozide. It can be concluded that blockade of DA neurotransmission does not impair an animal’s ability to learn the association between environmental cues that signal shock and the shock itself

The observed effects of attenuated DA neurotransmission on the performance of avoidance responding in pretrained animals depend on several variables. One is the number of avoidance trials, another is the dose of drug, and both probably interact with the difficulty of the task (e.g., one-way versus two-way avoidance). Beninger et al. (1983) 13 pretrained rats in an avoidance task and then examined the effects of pimozide (0.5 and 1.0 mg/kg) over 15 sessions (10 trials per session). They found that initially the drug had only a small effect on performance, but with repeated testing over days, avoidance responding deteriorated, especially in the high-dose group, until by days 10–15, responding was near untrained levels (see Fig. 7). Control studies showed that this effect could not be attributed to a toxic buildup of the drug with repeated dosing (cf., section 3.1.1). A number of reports of a decrease in avoidance responding in pretrained animals given DA antagonists or neurotoxins are consistent with these findings (Ahlenius et al., 1984 1; Fibiger et al., 1974 71; Neill et al., 1974 127; Posluns, 1962 137; Taboada et al., 1979 169). Anisman et al. (1982a,b) 4,5 found no significant effect of pimozide (0.4 mg/kg) on avoidance responding of pretrained mice tested for 50 trials, although this dose did significantly impair the acquisition of naive mice. It is possible that additional drug tests at this dose over subsequent sessions would have shown a gradual decline as reported by Beninger et al. (1983) 13. Anisman et al. (1982a) 4 showed that a higher dose (0.8 mg/kg) produced a more rapid and significant decline in the first 50 trials. This might suggest an alternate approach to the assessment of drug effects in avoidance paradigms. Rather than evaluating the effects of a limited range of doses over a large number of test sessions (cf., Beninger et al., 1983 15), a dose-response function can be derived for a single acquisition session in naive animals or a single test session in pretrained animals. It would be expected that this function would be shifted to the right for pretrained animals (see Fig. 8).
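The proposed single-session dose-response comparison can be sketched quantitatively. The following Python fragment assumes a simple logistic dose-response function; the ED50 values, slope, and response ceiling are hypothetical illustrations, not values fitted to any published data.

```python
def avoidance_rate(dose_mg_kg, ed50, slope=2.0, max_responses=10.0):
    """Hypothetical logistic dose-response curve: mean avoidance responses
    per 10-trial session as a function of DA antagonist dose. All parameter
    values are illustrative only."""
    if dose_mg_kg <= 0:
        return max_responses
    return max_responses / (1.0 + (dose_mg_kg / ed50) ** slope)

# Assumed ED50s: pretraining shifts the curve to the right (more drug is
# needed to halve responding), as sketched in Fig. 8.
naive_ed50, pretrained_ed50 = 0.4, 1.2

for dose in (0.25, 0.5, 1.0, 2.0):
    naive = avoidance_rate(dose, naive_ed50)
    pretrained = avoidance_rate(dose, pretrained_ed50)
    print(f"{dose:.2f} mg/kg: naive={naive:.1f}, pretrained={pretrained:.1f}")
```

Under these assumed parameters, the pretrained curve lies above the naive curve at every dose, reproducing the rightward shift sketched in Fig. 8.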

Fig. 7.
figure 7

Mean number of avoidance responses per session (10 trials) for groups of rats receiving nondrug pretraining or no training prior to testing with pimozide. For the pretrained groups (n = 48), the last nondrug session is shown (B), followed by 15 sessions during which they were treated with doses of 0.5 (■) or 1.0 (▲) mg/kg. The nonpretrained groups (n = 16) received five sessions following injections of 0.5 (□) or 1.0 (Δ) mg/kg (adapted from Beninger et al., 1983 15)

Fig. 8.
figure 8

Theoretical dose-response curves for the effects of DA receptor blockers on avoidance responding in pretrained (solid line) and nonpretrained (dotted line) animals. Data suggest that the pretrained animals would be more resistant to the effects of the drugs than the nonpretrained animals (see text)

It is noteworthy that DA receptor blockers do not simply produce an impairment in animals’ ability to initiate responses, as has been so often suggested. This interpretation cannot account for the large difference in the effects of these drugs on the acquisition of avoidance versus performance in pretrained animals. Thus, untrained animals receiving 1.0 mg/kg of pimozide fail to acquire avoidance responding over five 10-trial sessions, whereas trained animals receiving the same dose continue to perform avoidance responses for up to 10 or more sessions before reaching untrained levels (Fig. 7). If DA receptor blockade impaired response initiation, the pretrained animals would not be expected to continue to avoid. The incentive learning point of view provides an alternative. During acquisition, the offset of shock may lead to a DA-mediated change in the ability of safety-related stimuli to attract the animal. Once this putative DA-mediated change has occurred, it can influence behavior for a time even when DA neurotransmission is blocked.

3.1.4 Conditioned Reward

The procedures to be described in this section have been referred to traditionally as methods of conditioned reinforcement. However, because the term “reinforcement” often has associated with it reference to theoretical accounts of stimulus-response learning, the more neutral term “reward” will be used in its place. Tests of conditioned reward basically involve first the pairing of some neutral stimulus with reward (this can be done using instrumental or classical conditioning procedures) and then an assessment of the ability of that paired stimulus to act as a rewarding stimulus in its own right (cf., Mackintosh, 1974 111, pp. 233 ff). From an incentive learning point of view, repeated pairing of a neutral stimulus with reward should lead to the establishment of an association between the two stimuli and to a change in the incentive properties of the neutral stimulus. As a result of this latter effect, the neutral stimulus will now be a conditioned incentive stimulus, having acquired the ability to produce appetitive reactions that include instrumental responses that serve to bring the animal close to the reward. That the originally neutral stimulus has in fact become a conditioned reward can be established by showing that it can, in its own right, lead to a change in the incentive properties of another neutral stimulus

A conditioned reward procedure was employed by Beninger and Phillips (1980) 18. Three phases were included. The first involved placing rats for 40 min a day for several days into an operant chamber outfitted with two levers. Pressing one of the levers produced a tone for 3 s, whereas pressing the other had no prearranged consequences. The purpose of this phase was to habituate the rats to the chamber and to measure the operant rate of pressing the two levers. For the second phase, the levers were removed from the chamber, and the tone was repeatedly paired with food over four conditioning sessions. The ability of the tone to act as a conditioned reward was assessed in the third phase by again measuring the rate of pressing the two levers. (Note that food was never presented in the third phase.) A relative increase in pressing the tone lever, but not the other lever, would provide evidence that the tone has become a conditioned reward. This is exactly what was found, and, furthermore, control groups receiving tones and food that were negatively correlated in phase 2 failed to show a relative change in rate of pressing the tone lever in phase 3. Thus, pairing the tone with food established it as a conditioned reward. In incentive theory terms, the presentation of a conditioned reward following pressing of the appropriate lever led to a change in the incentive properties of the lever-related stimuli; as a result, those stimuli were able to produce appetitive reactions, including approach and downward displacement
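The logic of this three-phase test can be expressed as a simple scoring computation. The sketch below, using entirely hypothetical press counts, shows how a relative shift toward the tone lever between phase 1 (baseline) and phase 3 (test) indexes conditioned reward.

```python
def conditioned_reward_score(phase1, phase3):
    """Change in the proportion of presses on the tone (conditioned reward)
    lever from the pre-conditioning baseline (phase 1) to the test (phase 3).
    `phase1` and `phase3` are dicts of press counts; a positive score
    indicates a relative shift toward the tone lever. All counts used below
    are hypothetical, chosen only to illustrate the measure."""
    def tone_proportion(presses):
        total = presses["tone"] + presses["control"]
        return presses["tone"] / total if total else 0.5
    return tone_proportion(phase3) - tone_proportion(phase1)

# Paired group: tone followed by food in phase 2 -> relative increase on tone lever
paired = conditioned_reward_score({"tone": 20, "control": 22},
                                  {"tone": 45, "control": 18})
# Negatively correlated control group: no relative shift expected
control = conditioned_reward_score({"tone": 21, "control": 20},
                                   {"tone": 19, "control": 20})
print(round(paired, 2), round(control, 2))
```

Note that the measure is relative: an overall rise in pressing on both levers leaves the score near zero, which is why the comparison lever matters in this design.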

The conditioned reward procedure was used by Beninger and Phillips (1980) 18 to assess the effects of a DA receptor blocker on incentive learning. This procedure provided a unique means of dissociating possible effects of the drugs on operant responding from effects on learning. Animals could be treated with drugs during the pairing phase, where the only response requirement was that the animals eat the food, and then tested drug-free to assess the effectiveness of the tone as a conditioned reward. Results revealed that pimozide given during pairing produced a dose-dependent decrease in the ability of the tone to act as a conditioned reward. This suggested that DA mediates the incentive learning that occurs during the pairing phase (cf., Davis and Smith, 1975, 1977 47,48)

These results were recently replicated and extended by Hoffman and Beninger (1985) 91. They found that by using a 3-s period of darkness (lights off) as the CS in the paradigm of Beninger and Phillips (1980) 18, a stronger conditioned reward effect was established and a higher dose of pimozide was required to block the effect. When fewer pairings were given in the conditioning phase (2 sessions instead of 4), a weaker conditioned reward effect was observed, and a lower dose of pimozide was required to block the effect. They showed in a second experiment that the effects of pimozide could not be explained by the establishment of a conditioned taste aversion to food, and, in a subsequent paper (Hoffman and Beninger, 1986) 92, that the effects of pimozide were probably not attributable to a drug-induced change in the animals’ primary level of food motivation. Hoffman and Beninger (1985) 91 suggested that their results may provide a basis for reconciling some contradictory reports of the effects of pimozide on the establishment of conditioned reward. Tombaugh et al. (1982a) 177 had reported that pimozide failed to block the establishment of a cue light as a conditioned reward in a sign-tracking paradigm. Rats received pairings of the cue light with food while treated with pimozide (1.0 mg/kg), and when subsequently presented with the cue light while drug-free were observed to approach it, suggesting that it had become a conditioned incentive stimulus. The results of Hoffman and Beninger (1985) 91 also showed that 1.0 mg/kg of pimozide failed to block the establishment of conditioned reward, but that a higher dose would produce this effect and that the required dose was a function of the amount of conditioning. It is not possible to compare the sign-tracking paradigm of Tombaugh et al. (1982a) 177 directly to the conditioned reward paradigm of Hoffman and Beninger (1985) 91, but the results suggest that exploring a wider range of doses in a sign-tracking paradigm might reveal that incentive learning in this task also depends on intact DA neurotransmission during pairings of the cue light with food

In the discussion in section 3.1.3, data were reviewed showing that animals treated with DA receptor blockers while undergoing avoidance training, although failing to acquire the avoidance response, nevertheless learned the relationship between safety-related stimuli and reward (shock offset). What seems to happen is that the safety-related stimuli fail to become conditioned incentive stimuli that can produce approach responses. If the same theoretical framework of incentive learning can be applied to appetitive and aversively motivated instrumental responding, as suggested in section 1.2.3, it might follow that animals receiving pairings of a neutral stimulus (e.g., tone) with food while treated with a DA receptor blocker, although failing to show that the neutral stimulus has become a conditioned reward, might learn the association between the two stimuli. Data consistent with this hypothesis have been found. Beninger and Phillips (1981) 19 presented three groups of rats with four sessions like those in the second phase of the conditioned reward study described above. One group received tone-food pairings; the second was similarly trained, but injected with pimozide prior to each session; and the third group received pellets but no tones. Following these sessions, all rats (no longer drugged) were trained to lever-press in 30-min sessions on a successive operant discrimination consisting of alternating 1-min components of intermittent reward and extinction. The 1-min periods of reward were signaled by the tone that had been used in the previous phase to signal food for two groups, and the 1-min periods of extinction were signaled by the absence of the tone. Results (Fig. 9) showed that the group previously receiving tone-food pairings (but not pimozide) acquired the discrimination significantly faster than the group receiving pellets alone. This finding was consistent with previous reports of transfer from classical to operant conditioning when the classical CS (in this case, tone) is used as the S+ in the operant discrimination (e.g., Bower and Grusec, 1964 30; Mellgren and Ost, 1969 118; Trapold et al., 1968 181). Of particular interest was the group receiving pimozide prior to tone-food pairings. As can be seen in Fig. 9, the discrimination performance of this group was intermediate between that of the other two. One interpretation of this finding is that this group learned the association of the tone with food while treated with pimozide, but the tone failed to acquire incentive properties. As discrimination sessions progressed, the tone would be associated frequently with food, and, with DA neurotransmission no longer impaired, might be acquiring incentive properties, leading to the fairly rapid improvement observed. Further support for this possibility was found by Beninger and Phillips (1981) 19 when they compared S+ and S− rates of the three groups. Most of the change in discrimination ratios for the group receiving pimozide during the pairing phase was attributable to S+ rates. These data are consistent with the hypothesis that pimozide blocks incentive learning, but not the learning of associations between stimuli.
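The S+/S− comparison used in this analysis can be illustrated with a short computation; the response rates below are hypothetical. A discrimination ratio of the form S+/(S+ + S−) can rise either because S+ responding increases or because S− responding decreases, which is why inspecting the component rates is informative.

```python
def discrimination_ratio(s_plus_rate, s_minus_rate):
    """Discrimination ratio = S+ rate / (S+ rate + S- rate). A value of 0.5
    indicates no discrimination; values near 1.0 indicate responding
    confined to S+. The rates (responses/min) used below are hypothetical
    illustrations, not data from the study discussed in the text."""
    total = s_plus_rate + s_minus_rate
    return s_plus_rate / total if total else 0.5

# Two routes to the same improvement in the ratio (0.60 -> 0.75):
early = discrimination_ratio(15.0, 10.0)        # baseline ratio
s_plus_up = discrimination_ratio(30.0, 10.0)    # driven by rising S+ rate
s_minus_down = discrimination_ratio(15.0, 5.0)  # driven by falling S- rate
print(early, s_plus_up, s_minus_down)
```

Because the two routes yield identical ratios, only the component rates can distinguish them, as in the comparison of S+ and S− rates described above.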

Fig. 9.
figure 9

Mean (±SEM) discrimination ratios for three groups of rats (N = 6) over 18 sessions of training on a multiple random-interval 32-s extinction (MULTIRI32 EXT) schedule. The S+ was a tone and the S− was the absence of the tone. Prior to discrimination sessions the TONE + PELLETS group had received four sessions of pairings of the S+ with food. The PELLETS group received the same amount of food, but without the tone signal. The TONE + PELLETS + PIMOZIDE group also received pairings, but was treated with pimozide (1.0 mg/kg) for each of the four pairing sessions. No drugs were given during discrimination sessions shown here (from Beninger and Phillips, 1981 19)

One final set of results will be covered briefly in this section. There is a growing number of studies showing that animals treated with DA agonists, including (+)-amphetamine and pipradrol, during the test phase of conditioned reward studies respond relatively more on the lever that produces the conditioned reward than on the other lever (Beninger et al., 1980a, 1981 23,19; Hill, 1970 90; Mazurski and Beninger, 1986 115; Robbins, 1975, 1976, 1978 141,142,143; Robbins and Koob, 1978 144; Robbins et al., 1983 145). [Recently, Taylor and Robbins (1984, 1985) 170,171 have shown that this effect occurs following intraaccumbens amphetamine and, to a lesser extent, following intracaudate amphetamine; further, they found that 6-OHDA lesions of the nucleus accumbens, but not caudate, attenuated the effects of amphetamine, suggesting that the accumbens may be a critical substrate for the effect.] Of particular interest is the observation that the DA agonist apomorphine, although leading to an overall enhancement of responding, fails to produce a specific effect on the lever producing conditioned reward, as shown in Fig. 10 (Mazurski and Beninger, 1986 115; Robbins et al., 1983 145). The difference in the mechanism of action of apomorphine and amphetamine may provide valuable clues regarding the participation of DA in incentive learning. Apomorphine is a direct DA agonist, whereas amphetamine is an indirect DA agonist, enhancing the release of DA from presynaptic terminals (e.g., Ernst, 1967 65). These results suggest that the ability of a conditioned reward to produce incentive learning depends on the control by that stimulus of the release of DA. That conditioned rewards can lead to increased DA release has been demonstrated by Schiff (1982) 150. It should be noted that the release of DA may be required for a conditioned incentive stimulus to act as a reward, leading to the acquisition by some other stimulus of incentive properties. However, once a neutral stimulus has become a conditioned incentive stimulus, it can influence responding directly (e.g., produce approach) even when DA neurotransmission is blocked. Of course, this ability is eventually lost with repeated testing in the presence of DA receptor blockade, as shown in both appetitive and aversively motivated paradigms.

Fig. 10.
figure 10

Mean responses per 30 min on two levers, one that produced a 3-s tone (T) and another that produced a 3-s period of lights out (L), during a preconditioning period (left-hand two bars in each set of four bars) and following pairing of the lights-out stimulus with food (right-hand two bars in each set of four bars). The upper panel (A) shows results for animals treated with saline (SAL) or amphetamine doses ranging from 0.01 to 2.5 mg/kg during test sessions. The lower panel (B) shows results for animals treated with apomorphine doses ranging from 0.1 to 5.0 mg/kg during test sessions

3.1.5 Conditioned Activity

Many drugs that act as DA agonists produce increases in locomotor activity (cf., Costall and Naylor, 1979 43; Ungerstedt, 1979 182). A number of investigators have been interested in the possibility that the stimulant effects of these drugs can be classically conditioned by repeated pairing with a specific environment. A typical experimental protocol might be as follows. On each of 10 consecutive days, a group of rats is injected with the DA agonist (+)-amphetamine and placed into a chamber where locomotor activity is monitored for 60 min; the rats are then returned to their home cages. A control group similarly receives daily placement in an activity monitoring chamber, but these animals are injected with saline prior to each session. To equate the drug experience of the two groups, the control group receives an injection of amphetamine upon return to the home cage; to equate the injection experience, the experimental group receives saline upon return to the home cage. On the 11th day, both groups receive saline prior to placement in the test boxes. If classical conditioning has occurred, the repeated pairing of the unconditioned drug stimulus with the test box (CS) will lead to the unconditioned response to the drug stimulus (increased activity) being elicited by the CS; i.e., greater activity will be seen in the group previously receiving amphetamine in the test boxes. This result has been observed frequently following treatments with amphetamine (Beninger and Hahn, 1983 15; Hayashi et al., 1980 87; Irwin and Armstrong, 1961 95; Pickens and Crowder, 1967 135; Pihl and Altman, 1971 136; Tilson and Rech, 1973 173) or cocaine (Barr et al., 1983 9; Beninger and Herz, 1986 16; Post et al., 1981 138). Some authors have referred to this effect as a placebo effect: when the experimental animals received saline on the test day, they reacted (were more active) as if they had received amphetamine, presumably because they were “expecting” a drug injection.
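The balanced injection design described above can be summarized as a small lookup function. The group labels and the 10-day pairing/test-day structure follow the text; the function itself is merely an illustrative sketch.

```python
def injection_schedule(group, day, n_pairing_days=10):
    """Return (injection_in_test_box, injection_in_home_cage) for a given
    group and day of the balanced conditioned-activity design described in
    the text. Illustrative sketch only."""
    if day <= n_pairing_days:
        if group == "paired":      # drug experienced in the test box
            return ("amphetamine", "saline")
        elif group == "unpaired":  # same drug exposure, but in the home cage
            return ("saline", "amphetamine")
        raise ValueError(f"unknown group: {group}")
    return ("saline", None)        # test day: both groups drug-free

# Both groups get identical drug and injection histories across pairing days;
# only the environment in which the drug is experienced differs.
for g in ("paired", "unpaired"):
    print(g, injection_schedule(g, 3), injection_schedule(g, 11))
```

The design choice here is the point: any test-day group difference in activity cannot be attributed to differential drug history, only to where the drug was experienced.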

The phenomenon of environment-specific conditioned activity can be understood as an example of incentive learning. According to this point of view, enhanced DA neurotransmission produced by the stimulant drug in the test environment leads to those stimuli acquiring incentive properties that increase their ability to produce operant responses such as approach. When animals previously receiving a stimulant in the test box are injected with saline and placed there, they may be more active because various test box stimuli now are incentive stimuli. The role of DA in the establishment of this phenomenon was suggested by the observation that pretreatment with pimozide prior to sessions in which amphetamine or cocaine were paired with the test environment resulted in an attenuation of the unconditioned effects of these drugs and a failure to observe enhanced activity following saline on the test day (Beninger and Hahn, 1983 15; Beninger and Herz, 1986 16). As discussed in the previous sections, once incentive learning is established, evidence of that learning can be seen even when DA systems are blocked. For example, animals trained to lever-press for food show a gradual decline in responding when treated with a DA receptor blocker rather than ceasing to respond immediately. If environment-specific conditioning is an example of incentive learning, the same phenomenon should be observed. This is exactly what was found. Beninger and Hahn (1983) 15 and Beninger and Herz (1986) 16 showed that the same dose of pimozide that blocked the establishment of amphetamine or cocaine-produced environment-specific conditioning when given on pairing days failed to block its expression when given on the saline test day (see Fig. 11)

Fig. 11.
figure 11

Median activity rating for the experimental (●) and control (◯) groups. In experiment 1, the experimental group received amphetamine, and the control group received saline before conditioning session 5. Both groups received saline before test sessions. In experiment 2, both groups received pimozide (0.4 mg/kg) 4 h before conditioning session 10, and the experimental group received amphetamine, and the control group received saline immediately before that session. Both groups were also injected with saline before the test day 11. In experiment 3, the experimental and control groups received amphetamine and saline, respectively, before conditioning session 10. Both groups received saline before test sessions; however, both groups were also injected with pimozide (0.4 mg/kg) 4 h before test sessions on days 11 and 29. Statistical comparisons were made with Mann-Whitney U tests: * p < 0.05; + p < 0.01 (From Beninger and Hahn, 1983 15)

3.1.6 Place-Preference Conditioning

The place preference procedure has become increasingly more popular over recent years as a tool for studying reward-related learning (cf., Koob, Swerdlow, and Gilbert, this volume). Like conditioned reward procedures discussed in section 3.1.4, it has the advantage of separating the drug treatment days from the drug-free test days, thereby allowing an assessment of drug effects in drug-free animals. A further advantage is that the procedure places minimal response demands on the animals. A typical experiment might have three phases analogous to those described for the conditioned reward paradigm. In the first phase, the animal is exposed to a test box (perhaps 20×80×30 cm deep) consisting of two sides made distinct from one another by variations in floor texture, pattern and/or brightness of the walls, possibly odor, and so on. The two sides are connected by a small tunnel or hole in a partition between them, and the animal can move freely between the two sides. After several days (e.g., 5), the conditioning phase is begun. It consists of placing the animal into one side of the test box with access to the other side now blocked. Sessions may last for 30 min each day, and the side into which the animal is placed alternates from day to day for 8 or more d. One side always is paired with a reward (e.g., food) and the other with no reward. Test sessions in phase three are like pre-exposure sessions in phase one, the animals again having free access to the two sides of the test box. A conditioned place preference is evidenced by a greater amount of time now being spent in the side of the box previously associated with reward. From an incentive learning point of view, pairing one side of the test box with reward leads to acquisition by the distinctive stimuli in that side of incentive motivational properties that enhance their ability to elicit appetitive reactions including approach
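A conditioned place preference from such an experiment is typically quantified as a shift in time allocation toward the reward-paired side. The following sketch uses hypothetical session times (in seconds) to illustrate one such measure.

```python
def place_preference(pre_times, test_times, paired_side="A"):
    """Shift in the fraction of session time spent on the reward-paired side
    from pre-exposure (phase 1) to test (phase 3). Side labels and times
    are hypothetical; a positive shift indicates a conditioned place
    preference for the paired side."""
    def fraction(times):
        total = times["A"] + times["B"]
        return times[paired_side] / total if total else 0.5
    return fraction(test_times) - fraction(pre_times)

# Example: side A paired with food during conditioning; 30-min (1800-s) sessions
shift = place_preference({"A": 870, "B": 930}, {"A": 1250, "B": 550})
print(round(shift, 2))
```

Expressing the preference as a change from the animal's own pre-exposure baseline helps control for any unconditioned bias toward one side of the apparatus.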

Food and sucrose have been found to produce incentive learning in this paradigm (Spyraki et al., 1982a 160; Tombaugh et al., 1982a 177; White and Carr, 1985 179). That DA may play a critical role in this learning was supported by the further finding of Spyraki et al. (1982a) 160 that the DA receptor blocker haloperidol, injected prior to placement into the food-paired side of the test box during the conditioning phase, blocked the establishment of the effect. On the other hand, in a similar experiment, Tombaugh et al. (1982a) 177 found that pimozide failed to block place conditioning produced by food. One difference between the two studies is that the rats of Spyraki et al. (1982a) 160 were no longer food-deprived in the test phase, whereas those of Tombaugh et al. (1982a) 177 were. It is possible that a greater dose of drug is required during conditioning to block the demonstration of a place preference in animals that are still deprived in the test. Alternatively, other differences in procedures may have resulted in the discrepant findings, as discussed in section 3.1.4 above and in Hoffman and Beninger (1985) 91.

In spite of the uncertain findings regarding the possible role of DA neurotransmission in the establishment of a place preference by food, there is ample evidence from psychopharmacological studies supporting a role for DA in place-preference learning. Thus, place preferences have been reported following pairing of one side of the test box with the DA agonists (+)-amphetamine (Gilbert and Cooper, 1983 81; Mackay and van der Kooy, 1985 110; Spyraki et al., 1982b 175), apomorphine (Spyraki et al., 1982b 176; van der Kooy et al., 1983 185), bromocriptine (Morency and Beninger, 1986 182), or cocaine (MacKay and van der Kooy, 1985 110; Mucha et al., 1982 125; Spyraki et al., 1982c 162). Some data point to the DA projections to the nucleus accumbens as possibly being the critical component of the brain's DA systems in this effect. Carr and White (1983, 1986) 37,38 observed a place preference following bilateral microinjections of amphetamine into the nucleus accumbens, but not other DA terminal regions including the medial frontal cortex, several areas of the caudate nucleus, the amygdala, and the region of the area postrema. Furthermore, place conditioning produced by systemic amphetamine was blocked not only by the DA receptor antagonists haloperidol or cis-flupenthixol (Mackay and van der Kooy, 1985 115; Spyraki et al., 1982b 161), but also by bilateral 6-OHDA lesions of the nucleus accumbens that produced DA depletions of up to 90% (Spyraki et al., 1982b 161).

It was found that place preferences produced by systemic cocaine were not blocked by pimozide, haloperidol, or 6-OHDA lesions of the nucleus accumbens (Morency and Beninger, 1986 132; Spyraki et al., 1982c 162). However, recently Morency and Beninger (1986) 132 have demonstrated a place preference following icv cocaine that was blocked by pimozide (see Fig. 12), supporting a role for DA in reward produced by centrally administered cocaine. These results might suggest that place preferences observed following systemic (ip) cocaine are mediated by nondopaminergic substrates, a position reinforced by the finding of place conditioning following systemic treatment with the local anesthetic procaine (Morency and Beninger, 1986 132; Spyraki et al., 1982c 162). One possibility is that ip cocaine or procaine resulted in a local anesthetic effect that reduced possible aversive properties of the injection associated with the drug side of the test box, whereas no such reduction would occur following saline injections and placement in the other side. According to this hypothesis, the preference seen for the drug side would be a preference for a relatively less aversive place (cf., Morency and Beninger, 1986 132). As discussed in section 3.1.3, the learning of an association between a neutral stimulus and an aversive stimulus is not blocked by treatment with DA antagonists. Furthermore, it was shown that once this association is learned, offset of a conditioned aversive stimulus can act as a reward in its own right (recall that animals failing to acquire the avoidance response while treated with pimozide showed acquisition when tested drug-free in extinction, when shock no longer occurred). Thus, the place preference following ip cocaine and the failure of pimozide to block it may represent a case of aversive conditioning to the side not associated with local anesthetic effects, and subsequent reward-related learning, escape from the conditioned aversive stimuli, in the test phase. This account is, of course, speculative, but it does provide a basis for understanding the unexpected results of the cocaine and procaine studies.

Fig. 12.

Mean (±SEM) percent of time spent on the conditioned side prior to (PRE) and following (POST) the pairing of that side with injections of cocaine. (A) Groups (n = 8) received intraperitoneal (ip) cocaine in a dose of 5.0 mg/kg (■) or ip cocaine plus 1.0 mg/kg pimozide (▲) during conditioning sessions; (B) Groups (n = 8) received intracerebroventricular (icv) cocaine in a dose of 50 μg in 1.0 μL (■) or icv cocaine plus 1.0 mg/kg pimozide (▲) during conditioning sessions (adapted from Morency and Beninger, 1987).

4 Schizophrenia as an Impairment of Incentive Learning

Schizophrenia is a debilitating human disease. It is characterized by a period of active psychotic symptomatology including delusions, hallucinations, or certain disorders of thought and having a duration, with residual phases, of at least 6 mo (Spitzer et al., 1980 159). There is now an impressive body of data that supports the hypothesis that DA may, in some way, hyperfunction in the brains of persons who develop schizophrenia. There are three main lines of evidence. The first is that the medications that have been used successfully to treat schizophrenia are DA receptor blockers. Many of the drugs employed in the studies described in section 3, including pimozide, haloperidol, cis-flupenthixol, and chlorpromazine, are in common clinical use as antipsychotics. It has been shown that there is a high and significant correlation between the ability of antipsychotic drugs to block DA receptors in the brain and the average clinical dose for controlling schizophrenia, suggesting that their effectiveness as antipsychotics is related to their ability to block DA receptors (cf., Seeman, 1981 151). The second line of evidence is that DA agonists are psychotogenic. It has been found that chronic treatment with drugs such as (+)-amphetamine or cocaine can lead to acute symptoms that closely resemble those seen in schizophrenia, including disorientation, ideas of reference, and paranoid delusions (Angrist, 1983 2; Connell, 1958 39; Kokkinidis and Anisman, 1981 99; Snyder, 1972 154). These drugs also exacerbate symptoms when given to schizophrenics (Angrist et al., 1980 3; Janowsky et al., 1973 97). It has also been found that Parkinson's patients taking high doses of the DA precursor l-3,4-dihydroxyphenylalanine may show psychotic symptoms (Brogden et al., 1971 34; Duvoisin, 1984 63). The third line of evidence comes from studies of postmortem brain tissue from schizophrenics. It has been found that the density of DA receptors or the ability of DA receptors to activate the receptor-linked enzyme adenylate cyclase is elevated in the brains of deceased schizophrenics (Cross et al., 1981 44; Lee and Seeman, 1980 103; Memo et al., 1983 119; Seeman et al., 1984 152). One finding that has frequently been cited as contradictory evidence for the DA hypothesis is the observation that antipsychotic drugs, although blocking DA receptors on their first administration, usually do not show good therapeutic effects for many days (e.g., Lipton and Nemeroff, 1978 107). This point will be further discussed below. There have been many reviews of the data relevant to the hypothesis that DA neurotransmission may be overactive in schizophrenia (Carlton and Manowitz, 1984 36; Crow, 1978, 1979 45,46; Lipton and Nemeroff, 1978 107; Mackay, 1980 109; Matthysse, 1974 114; Miller, 1984 121; Pearlson and Coyle, 1983 132; Rupniak et al., 1983 146; Sayed and Garrison, 1983 149; Snyder, 1976, 1981 155,157).

There is an overwhelming literature on the phenomenology of schizophrenia, and any attempt to explain all of the manifestations of this disease in a single framework is probably doomed to failure from the outset. However, as reviewed briefly above, there is a strong empirical basis for the hypothesis that DA may hyperfunction in the brains of schizophrenics (although the genetic and/or environmental etiological factors responsible for this change remain to be worked out; cf., Hemmings and Hemmings, 1978 88). As reviewed in section 3, there is also a strong empirical basis for the hypothesis that DA may mediate incentive learning. Using DA as the link, it may follow that schizophrenia is an impairment of incentive learning. Since DA may hyperfunction in the brains of schizophrenics, it may be that schizophrenics undergo excessive incentive learning and that this learning leads to the behavioral changes seen in the disease.

What behavioral consequences might be expected from an overfunctioning of the DA systems? In answering this question, it may be helpful first to briefly review the role of DA in learning. Dopamine appears to be involved in incentive learning, a form of associative learning that occurs when a neutral environmental stimulus is paired with (signals) a rewarding stimulus. This association leads to the acquisition by the previously neutral stimulus of the ability to produce appetitive reactions that include instrumental responses that serve to bring the animal close to the reward (e.g., approach). Thus, incentive learning normally leads to a change in the ability of some environmental stimuli (those that signal reward) to attract the animal, i.e., produce approach responses. If the DA systems were hyperfunctioning, it might be expected that excessive or inappropriate incentive learning would occur. As a result, stimuli that actually do not signal reward would acquire the ability to attract the animal.

If schizophrenics undergo excessive incentive learning, it would follow that they might be inappropriately attracted to environmental stimuli that actually do not signal reward. This is exactly what has been found in a large number of psychological studies of schizophrenic patients, as reviewed by McGhie (1977) 117. His hypothesis is that schizophrenics experience a widening of attention. This has been seen in observational studies, where schizophrenics frequently report difficulty in ignoring sounds or other irrelevant environmental events, and also in empirical studies of attention. For example, schizophrenics recognized tachistoscopically presented pictures faster than controls and were more easily distracted by extraneous stimuli in the performance of vigilance tasks. This apparent widening of attention in schizophrenics can be understood as inappropriate incentive learning possibly resulting from overactive DA neurotransmission.

One of the observations from the animal studies reviewed in section 3 was that animals with DA neurotransmission blocked, although failing to show incentive learning, did learn associations between stimuli. For example, tone-shock associations, the relationship between safety-related stimuli and safety, or tone-food associations apparently were learned in spite of the observed failure of incentive learning. This would suggest that DA is not involved in the learning of associations between stimuli. It might follow that in the schizophrenic patient, although putative DA hyperfunctioning may lead to inappropriate incentive learning, associative learning abilities may remain intact. Thus, the schizophrenic may have an intact ability to learn about the relationships among stimuli in the environment, but have an impaired ability to recognize appropriate incentive stimuli because many stimuli are inappropriately acquiring the ability to attract. This perspective might suggest that many of the diagnostic symptoms of schizophrenia result from an appropriate interpretation of the environment by a patient whose DA systems are inappropriately converting neutral stimuli to incentive stimuli. For example, one of Schneider's (see Pearlson and Coyle, 1983 132) first rank symptoms is that the "patient attributes fixed false idiosyncratic significance to an otherwise unremarkable perception." For the patient the perception (stimulus) may not be unremarkable, because previous overactivity of DA neurons may have led to its becoming an incentive stimulus; since associative learning abilities may be independent of DA and therefore relatively intact, the attribution of significance by the patient is appropriate because the stimulus actually is "attractive."

Finally, if schizophrenia is an impairment of incentive learning, the delayed onset of action of antipsychotic drugs can be understood with reference to the results of numerous studies. It has been seen in many paradigms (section 3) that once incentive learning has occurred, DA receptor blockers do not lead to an immediate cessation of responding, but rather a gradual decline. Thus, following food-rewarded lever-press training or shock avoidance learning, pimozide treatments did not block the operant response on the first test day, but were observed to lead to a gradual decline with repeated testing over days. In fact, avoidance responding in trained animals was observed to take approximately 10 d to return to pretrained levels (Fig. 7). Schizophrenic patients treated with DA receptor blockers usually require at least a week before therapeutic effects are seen (cf., Lipton and Nemeroff, 1978 107).

In conclusion, the discussion in this section provides possible links between schizophrenia and incentive learning, based on empirical findings of a role for DA in incentive learning and the observation of hyperfunctioning of DA in schizophrenia. The possible anatomical location and synaptic changes occurring in incentive learning are beyond the scope of this paper, but have been discussed elsewhere (Beninger, 1983) 13. It is hoped that with continued psychopharmacological studies of DA function employing drugs selectively affecting DA receptor subtypes and central injection techniques for applying drugs locally to regions of the brain, it will be possible to further refine understanding of the role of DA in learning. The application of these basic research findings to the understanding of human disorders thought to involve impaired DA systems, including schizophrenia and Parkinson's disease, provides a continuing challenge to psychopharmacologists and the possibility of making genuine progress in evolving better methods of treatment.