Introduction

The striatum contributes to decision making, voluntary movement, action selection, behavioral flexibility, and dynamic updating during learning [1,2,3,4,5,6]. Among these functions, the striatum supports specific aspects of action learning and selection, including Pavlovian and instrumental conditioning [7,8,9]. To perform these functions, the striatum integrates information from cortical, limbic, and thalamic afferents with dopaminergic inputs that convey reward and salience signals, facilitating actions that maximize desired outcomes [5, 10,11,12]. Dopamine contributes to general locomotor activation and hedonic functions, as well as to executive functions involved in decision making and precise temporal action control [13,14,15,16,17,18,19]. Parkinson’s disease (PD) patients suffer severe deficits in movement control and voluntary action initiation due to the loss of nigrostriatal dopamine [14, 20,21,22]. Interestingly, PD patients can often perform movements in response to environmental or physical stimuli [23,24,25], suggesting that nigrostriatal dopamine most strongly influences the initiation and performance of voluntary movement.

Dopamine modulation of striatal neuronal activity and synaptic transmission influences action initiation and termination [19, 26,27,28,29] and may affect action maintenance and bias future action selection [30, 31]. Reduced dopaminergic tone and/or signaling results in motor deficits [32, 33] and fewer effortful choices in decision-making tasks [17, 34, 35], suggesting that dopamine plays a critical role in modulating the strength of voluntary actions. Dopamine modulates striatal neuronal function via activation of different receptor subtypes, including the Gi/o-coupled D2Rs [36, 37]. Global D2R deletion reduces and slows movement and decreases movement initiation [38, 39]. Pharmacological studies suggest roles for D2Rs in effort-based decision making and in flexibility in the face of increasing task demand [16, 17, 34, 40,41,42,43,44,45], and the dopamine system is implicated in the temporal control of action sequences. The specific striatal cellular loci of the D2Rs mediating these behaviors are not known, and little is known about the roles of striatal D2Rs in behavioral responses to conditioned and unconditioned environmental stimuli. Striatal D2Rs are strongly expressed by indirect pathway-projecting medium spiny neurons (iMSNs) and cholinergic interneurons [46, 47], where they modulate neuronal activity as well as local and distal neurotransmitter release. However, D2Rs also modulate neurotransmitter release from cortical and dopaminergic afferents in striatum [15, 46, 48]. Targeted deletion of iMSN-D2Rs reduces and slows motor output in a task-specific manner [49], and thus these reductions cannot simply be explained by generalized motor impairment. Recent evidence suggests that iMSN-D2Rs can modulate effort and energy expenditure during food foraging [50], raising the possibility that iMSN-D2Rs signal moment-by-moment changes in phasic dopamine induced by reward predictors and/or increased task demand [51].

Despite these findings, more information is needed to differentiate iMSN-D2R contributions to action initiation and vigor in specific tasks from their role in general locomotion. For example, it remains unclear whether iMSN-D2Rs are needed for the proper initiation of self-initiated versus stimulus-signaled actions. In addition, there is little information as to whether iMSN-D2R roles in action control differ between positively and negatively reinforced actions or depend on the effort requirement of the task. Finally, it is not clear whether iMSN-D2Rs contribute to associative learning. To address these questions, we used mice in which D2Rs were constitutively deleted from iMSNs to investigate the role of this receptor population in action learning, initiation, and execution in self-initiated instrumental and maze learning tasks and in stimulus-driven Pavlovian conditioning. Our findings indicate distinct roles for iMSN-D2Rs in action initiation in self-initiated tasks, independent of task outcome, whereas learning remained intact. However, in the instrumental learning paradigm, slower action initiation was evident only under conditions that required increased effort for task completion. In contrast, iMSN-D2Rs appear to contribute very little to acquisition and extinction during simple Pavlovian conditioning. Our findings highlight functional heterogeneity in iMSN-D2R modulation of action control that depends on the factors involved in action initiation and on task effort requirements.

Materials and methods

Animals

All experiments and animal care were performed in compliance with the National Institutes of Health Care and Use of Animals guidelines and approved by the Institutional Animal Care and Use Committees of the National Institute on Alcohol Abuse and Alcoholism and the National Institute of Diabetes and Digestive and Kidney Diseases. Adora2a-Cre and Drd2loxP/loxP mice were purchased from MMRRC (036158-UCD) and The Jackson Laboratory (020631), respectively. Drd2loxP/loxP mice, which carry a floxed exon 2 of the dopamine D2 receptor (Drd2) gene, were bred with Drd2loxP/loxP Adora2a-Cre mice to generate iMSN-Drd2KO mice as previously described [49, 52, 53]. Mice were group-housed, maintained on a 12-h light/dark schedule in a temperature- and humidity-controlled environment, and given free access to food and water unless otherwise stated. All experiments were performed during the light phase, except the food preference test. Mice of both sexes (9–15 weeks old) were used.

Operant conditioning chambers

Mice were trained in Med Associates (VT, USA) operant chambers enclosed in sound-attenuating, light-resistant ventilated boxes. Each chamber contained two retractable levers, one on either side of the food receptacle, into which a food pellet (Bio-Serv, NJ, USA; formula F0071) or 20 µl of 20% sucrose solution was delivered. The food receptacle was equipped with a lickometer to measure consumption.

Instrumental self-paced lever-press training

Mice were food restricted to ~90% of their initial body weight for the entire experiment. Mice were first trained on a random interval (~60 s) schedule of food delivery. The following day, mice began training with a single lever (left or right, counterbalanced across genotypes; extended throughout the session) on a fixed-ratio 1 (FR1) schedule for 5 days, followed by a fixed-ratio 5 (FR5) schedule for 9 days. Training sessions ended after 60 min or after 30 rewards had been earned, whichever occurred first. Note that no predictive or discriminative stimuli were included in training; training and performance were therefore self-initiated and self-paced.
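The session rules above can be summarized in a short sketch. The class and parameter names are hypothetical (they are not taken from the Med Associates control software), and the sketch assumes that the press counter resets after each reward, so that every fifth active-lever press on FR5 earns a pellet, and that presses made after either termination criterion (60 min or 30 rewards) are ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FixedRatioSession:
    """Hypothetical model of one self-paced FRn session (not the actual chamber program)."""
    ratio: int                       # 1 for FR1, 5 for FR5
    max_rewards: int = 30            # session ends after 30 rewards...
    max_duration_s: float = 3600.0   # ...or after 60 min, whichever comes first
    presses_since_reward: int = 0
    reward_times: List[float] = field(default_factory=list)

    @property
    def rewards(self) -> int:
        return len(self.reward_times)

    def is_over(self, t: float) -> bool:
        """True once either termination criterion has been reached at time t (s)."""
        return self.rewards >= self.max_rewards or t >= self.max_duration_s

    def press(self, t: float) -> bool:
        """Register an active-lever press at time t (s); return True if it earns a pellet."""
        if self.is_over(t):
            return False  # presses after the session has ended are not counted
        self.presses_since_reward += 1
        if self.presses_since_reward == self.ratio:
            # Ratio completed: deliver a pellet and start counting the next sequence.
            self.presses_since_reward = 0
            self.reward_times.append(t)
            return True
        return False
```

For example, on FR5, ten presses made at 1-s intervals earn two pellets (at the fifth and tenth presses), whereas on FR1 a mouse stops earning pellets once the 30-reward limit is reached.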

Water T-maze test

Mice were tested for short-term spatial and working memory using a water T-maze [54], which consisted of two L-shaped pieces of opaque white acrylic (18.5 cm long, 25 cm tall) placed in a bin (42 × 23.5 cm) filled with 23 °C water to a depth of ~13 cm. The water was made opaque with nontoxic white acrylic paint to prevent mice from seeing the hidden platform, and the T-maze was encircled by white curtains to prevent mice from using contextual cues to navigate. Prior to training, mice were placed in the T-maze and allowed to explore for 60 s. The first arm each mouse entered was recorded, and the platform was placed in the opposite arm. The following day, acquisition trials began (10 trials/day), in which mice were placed at the base of the T-maze and allowed to swim to the platform (5 × 5 × 12 cm). Each trial lasted 60 s or until the mouse had remained on the platform for 5 s. Trials were considered successful if a mouse located the platform within 60 s and failures otherwise. Mice failing to reach the platform within 60 s were gently nudged toward the platform and allowed to remain on it for 10 s before being returned to their cage. All mice were dried off and returned to their home cages between trials, with an intertrial interval (ITI) of 10 min. Acquisition sessions continued until 80% of the daily trials were successful on 4 consecutive days, at which point individual mice proceeded to reversal learning. Reversal learning trials (10 trials/day) were run identically to acquisition trials, except that the location of the hidden platform was reversed for each mouse. Reversal sessions continued until 80% of trials were successful on 2 consecutive days. All trials were video recorded and analyzed using the EthoVision XT tracking system (Noldus, Netherlands). Errors were counted when a mouse (1) entered the incorrect arm or (2) left the correct arm without locating the platform.
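The acquisition and reversal criteria amount to a run-length check on daily success counts. The following is a minimal sketch, assuming one integer count of successful trials (out of 10) per day; the function name and arguments are hypothetical.

```python
from typing import Optional, Sequence


def criterion_day(successes_per_day: Sequence[int], trials_per_day: int = 10,
                  threshold: float = 0.8, consecutive_days: int = 4) -> Optional[int]:
    """Return the 1-based day on which the criterion is first met, or None.

    The criterion is met on the last of `consecutive_days` days in a row on which
    at least `threshold` of the daily trials were successful.
    """
    run = 0  # length of the current streak of criterion days
    for day, n_success in enumerate(successes_per_day, start=1):
        if n_success / trials_per_day >= threshold:
            run += 1
            if run == consecutive_days:
                return day
        else:
            run = 0  # a sub-criterion day breaks the streak
    return None
```

With the default 4-day window this models the acquisition criterion; calling it with `consecutive_days=2` models the reversal criterion.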

Pavlovian training

Mice were trained in the operant chambers described above on a cue-induced licking Pavlovian task. The mice were food restricted to ~90% of their initial body weight for the entire experiment. Mice were first trained to approach the receptacle and consume sucrose delivered on a 90-s random schedule. Next, mice were trained for 15 daily sessions in which presentation of a conditioned stimulus (CS+) was paired with sucrose delivery.

A non-reward-paired conditioned stimulus (CS−) was also presented. Each training session (~1 h) consisted of 24 trials (12 reinforced CS+ and 12 non-reinforced CS− trials), each a 20-s presentation of either a 3-kHz tone or white noise (70 dB), presented on a pseudorandom ITI of 90 s with 12-s step increments. For half the mice, the tone was the CS+ and the white noise was the CS−; the remaining mice received the opposite contingency. During each CS+ presentation, sucrose was delivered twice, in two of the first three 5-s epochs, in a pseudorandom sequence determined by the computer. Sucrose was not delivered during the last 5-s epoch, to ensure sufficient time for sucrose consumption during the CS+ presentation. Sucrose delivered during the first 5-s epoch was delayed by 3 s. Trials were not initiated by the mice, and sucrose deliveries were not contingent on their behavior. After 15 conditioning sessions, an extinction test was given that was similar to the conditioning sessions except for the absence of sucrose delivery during the CS+. Licks and head entries into the sucrose port during the CS+, CS−, and ITI periods of the Pavlovian sessions and the extinction test were measured to assess learning.
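The CS+ delivery rule can be made concrete with a small sketch. It reads the rule as one delivery in each of two epochs drawn at random from the first three 5-s epochs of the 20-s CS+, with nothing in epoch 4. Where a delivery falls within its epoch is not stated, so the sketch assumes delivery at epoch onset, plus the 3-s delay for epoch 1; the function and constant names are hypothetical.

```python
import random
from typing import List, Optional

EPOCH_S = 5.0              # the 20-s CS+ is divided into four 5-s epochs
FIRST_EPOCH_DELAY_S = 3.0  # a delivery in epoch 1 is delayed by 3 s


def cs_plus_delivery_times(rng: Optional[random.Random] = None) -> List[float]:
    """Hypothetical sketch: sucrose delivery times (s from CS+ onset) for one CS+ trial.

    ASSUMPTION: deliveries occur at epoch onset, except in epoch 1 (onset + 3 s).
    """
    if rng is None:
        rng = random.Random()
    # Choose two of the first three epochs (zero-based indices 0-2); epoch 4 is never used.
    epochs = sorted(rng.sample([0, 1, 2], k=2))
    times = []
    for e in epochs:
        onset = e * EPOCH_S
        times.append(onset + FIRST_EPOCH_DELAY_S if e == 0 else onset)
    return times
```

Under these assumptions every CS+ trial yields one of three delivery patterns ([3, 5], [3, 10], or [5, 10] s after CS+ onset), which always leaves the final 5 s of the cue free for consumption.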

Accelerating rotarod

Mice were habituated to a static rotarod (Med Associates) and then trained for 7 days, with four trials per day, on an accelerating rotarod (4–40 rpm over 300 s). The ITI was 5 min. The latency to fall was recorded. A trial was terminated at 300 s or when the mouse held onto the rod for one complete rotation.
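Only the end points of the acceleration are given (4 to 40 rpm in 300 s). Assuming a linear ramp (an assumption; the acceleration profile is not stated), rod speed at time t is 4 + 36·t/300 rpm, so the latency to fall also fixes the rod speed the mouse was facing when it fell. A sketch with hypothetical function names:

```python
START_RPM = 4.0   # rod speed at trial onset
END_RPM = 40.0    # rod speed at the end of the ramp
RAMP_S = 300.0    # ramp duration (also the trial cut-off)


def _clamp_time(t_s: float) -> float:
    """Restrict t to the ramp interval [0, 300] s."""
    return min(max(t_s, 0.0), RAMP_S)


def rod_speed_rpm(t_s: float) -> float:
    """Rod speed (rpm) at time t_s, ASSUMING a linear 4 -> 40 rpm ramp over 300 s."""
    t = _clamp_time(t_s)
    return START_RPM + (END_RPM - START_RPM) * t / RAMP_S


def rotations_completed(t_s: float) -> float:
    """Revolutions completed by t_s: the integral of the assumed linear speed (rpm / 60)."""
    t = _clamp_time(t_s)
    accel = (END_RPM - START_RPM) / RAMP_S  # rpm gained per second
    return (START_RPM * t + 0.5 * accel * t * t) / 60.0
```

Under this assumption, a mouse that falls at 150 s was on a rod turning at 22 rpm, and a mouse that stays on for the full 300 s rides through about 110 revolutions.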

Statistical analysis

Statistical tests were performed in GraphPad Prism using repeated-measures ANOVA followed by Sidak’s post hoc test for multiple comparisons, and Student’s t tests where appropriate. Results were considered significant at p < 0.05. Averaged data are presented as mean ± SEM.
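For reference, the summary statistic and the multiple-comparison correction named above reduce to two short formulas: SEM = s/√n, where s is the sample standard deviation, and the Sidak-adjusted p value 1 − (1 − p)^m for one of m comparisons. The sketch below uses only the Python standard library and is illustrative; GraphPad Prism performs these computations internally, and details of its post hoc procedure (for example, how m is counted) are not reproduced here.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, SEM), where SEM = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM requires at least two observations")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def sidak_adjust(p: float, m: int) -> float:
    """Sidak-adjusted p value for one of m comparisons: 1 - (1 - p) ** m."""
    return 1.0 - (1.0 - p) ** m
```

For instance, an unadjusted p = 0.01 within a family of five comparisons corresponds to a Sidak-adjusted p of about 0.049.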

Results

iMSN-D2R deletion “slows” lever-press responding at higher cost and impairs action initiation

Mice carrying a loxP-flanked Drd2 gene were bred with Adora2A-Cre+/− mice to generate iMSN-selective D2R knockout mice (iMSN-Drd2KO) and appropriate littermate controls (Drd2loxP/loxP). Our group and others have shown that this breeding scheme results in an 80–90% reduction in striatal Drd2 mRNA expression [49, 53]. D2R immunoreactivity was decreased in the striatum, as expected for loss of the major cellular source of D2Rs in this region. No reduction in immunoreactivity was observed in the midbrain, an area that expresses high levels of D2R but not Adora2A-Cre [55] (Fig. S1).

We trained iMSN-Drd2KO and littermate Drd2loxP/loxP mice to press a lever for food reward on fixed-ratio schedules (FR1, then FR5). As the cost of reward increased under the FR5 schedule, the iMSN-Drd2KO mice pressed less (Fig. 1a, RM ANOVA, main effect of genotype F(1,11) = 6.78, p = 0.02; FR5 training × genotype interaction F(8,88) = 1.26, p = 0.28) and at a slower rate (Fig. 1b, RM ANOVA, main effect of genotype F(1,11) = 11.24, p = 0.006; FR5 training × genotype interaction F(8,88) = 1.24, p = 0.29). Drd2loxP/loxP mice significantly escalated their lever-press rate later in FR5 training relative to the start of FR5 (t-test, p < 0.05 on Day 8; p > 0.05 on Days 2–7 and 9). This escalation was not seen in the iMSN-Drd2KO mice (t-test, Days 2–9, p > 0.05). To further understand the effects of iMSN-D2R signaling on the initiation of lever pressing, we quantified the latency to the first press. On the FR1 schedule, iMSN-Drd2KO mice showed a small, nonsignificant increase in latency to press relative to Drd2loxP/loxP mice (Fig. 1c, t-test, p > 0.05). The deficit was more pronounced and significant on the FR5 schedule (Fig. 1c; t-test, p < 0.0001). On the FR1 schedule, there was no genotype difference in total lever presses (Fig. 1a, RM ANOVA main effect of genotype F(1,11) = 0.01, p = 0.92; FR1 training × genotype interaction F(4,44) = 0.18, p = 0.95), but there was a small training × genotype interaction for lever-press rate (Fig. 1b, RM ANOVA main effect of genotype F(1,11) = 1.07, p = 0.32; FR1 training × genotype interaction F(4,44) = 3.09, p = 0.03). In Drd2loxP/loxP mice, the latency to initiate lever pressing decreased with training on the FR1 schedule (Fig. 1d, FR1 t-test, p < 0.05) but not on the FR5 schedule (Fig. 1d, FR5 t-test, p > 0.05). Likewise, iMSN-Drd2KO mice decreased their latency to initiate lever pressing on the FR1 schedule (Fig. 1e, t-test, p < 0.01) but not on the FR5 schedule (Fig. 1e, t-test, p > 0.05).
On Day 1 of FR1 training, the iMSN-Drd2KO mice took longer to initiate lever pressing than Drd2loxP/loxP mice (Fig. 1d, e, t-test, p < 0.05). Across FR training, Drd2loxP/loxP mice significantly decreased their latency to first press from the first day of FR1 training to the last day of FR5 training (Fig. 1d, t-test, p < 0.01), suggesting that the contribution of exploratory behavior is minimal and/or substantially reduced after multiple training days. A similar pattern was seen in iMSN-Drd2KO mice, whose latency to initiate lever pressing decreased significantly from the first day of FR1 to the last day of FR5 training (Fig. 1e, t-test, p < 0.01). However, in iMSN-Drd2KO mice the latency to first press remained similar to that observed at the end of FR1 training and did not decrease further during FR5 training (Fig. 1e, t-test, p > 0.05), remaining higher than in Drd2loxP/loxP mice. We also examined the latency to press following reward retrieval (Fig. 1f, g). The iMSN-Drd2KO mice took more time to initiate a new lever-pressing sequence on the FR5 schedule (Fig. 1g, RM ANOVA, main effect of genotype F(1,11) = 11.12, p = 0.0067), and this delay persisted without improvement across training (RM ANOVA main effect of FR5 training F(8,88) = 1.17, p = 0.33). There was no significant genotype difference during FR1 training (Fig. 1f, RM ANOVA main effect of genotype F(1,11) = 2.55, p = 0.14), although latencies were slightly longer in iMSN-Drd2KO mice on the last few days of FR1 training. There was a significant effect of training (RM ANOVA main effect of FR1 training F(4,44) = 4.58, p = 0.004). The iMSN-Drd2KO mice also had longer inter-press intervals on the FR5 (Fig. 1h, t-test, p < 0.0001) but not the FR1 schedule (Fig. 1h, t-test, p > 0.05). The time to complete a five-press sequence on the FR5 schedule was also longer in the iMSN-Drd2KO mice (Fig. 1i, t-test, p < 0.0001).
Also, the iMSN-Drd2KO mice completed fewer FR5 sequences than the Drd2loxP/loxP mice (Fig. 1j, t-test, p < 0.001). On the FR1 schedule, all mice earned the maximum number of reward pellets per session (Fig. 1k, t-test, p > 0.05). However, iMSN-Drd2KO mice earned fewer pellets than Drd2loxP/loxP mice on the higher-cost FR5 schedule (Fig. 1k, t-test, p < 0.0001).
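The measures above (latency to first press, latency to press after reward retrieval, inter-press interval, and time to complete a five-press sequence) can all be derived from event timestamps. The sketch below is hypothetical: it assumes press and retrieval times in seconds from session start, treats each consecutive block of five presses as one FR5 sequence, and computes inter-press intervals within sequences only. The analysis pipeline actually used is not described in the text and may define these measures differently.

```python
from bisect import bisect_right
from statistics import mean
from typing import List, Sequence


def fr_sequence_metrics(press_times: Sequence[float], ratio: int = 5) -> dict:
    """Per-session measures from lever-press timestamps (s from session start).

    Consecutive blocks of `ratio` presses are treated as completed sequences;
    a trailing incomplete block is ignored.
    """
    presses = sorted(press_times)
    n_seq = len(presses) // ratio
    blocks = [presses[i * ratio:(i + 1) * ratio] for i in range(n_seq)]
    # Inter-press intervals within each completed sequence (none exist on FR1).
    ipis = [b[j + 1] - b[j] for b in blocks for j in range(ratio - 1)]
    # Time from the first to the last press of each completed sequence.
    durations = [b[-1] - b[0] for b in blocks]
    return {
        "first_press_latency": presses[0] if presses else None,
        "sequences": n_seq,
        "mean_ipi": mean(ipis) if ipis else None,
        "mean_sequence_duration": mean(durations) if durations else None,
    }


def post_retrieval_latencies(retrieval_times: Sequence[float],
                             press_times: Sequence[float]) -> List[float]:
    """Latency from each reward retrieval to the next lever press, if one occurs."""
    presses = sorted(press_times)
    latencies = []
    for r in sorted(retrieval_times):
        i = bisect_right(presses, r)  # index of the first press strictly after r
        if i < len(presses):
            latencies.append(presses[i] - r)
    return latencies
```

Grouping presses into fixed blocks of five mirrors the FR5 contingency under the same counter-reset assumption used for the session sketch; a definition based on reward delivery times would give the same blocks when every fifth press is rewarded.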

Fig. 1: iMSN-D2R deletion produces deficits in instrumental lever-press performance.

a iMSN-Drd2KO (n = 6; orange circles) mice displayed fewer lever presses than Drd2loxP/loxP (n = 7; gray circles) mice in the FR5 component of an escalating fixed-ratio task. b The rate of responding was also decreased during FR5 learning and performance in the iMSN-Drd2KO mice. c–e iMSN-Drd2KO mice had increased latency to initiate lever-pressing behavior at the beginning of each session and f, g after reward retrieval. h The iMSN-Drd2KO mice had longer inter-press intervals, i took longer to complete five sequential presses, and j completed fewer sequences than controls on an FR5 schedule. k iMSN-Drd2KO mice earned fewer pellets on the FR5 schedule. l However, iMSN-Drd2KO mice showed reward retrieval times after lever press(es) similar to those of controls. Data are expressed as mean ± SEM. ****p < 0.0001, ***p < 0.001, **p < 0.01, *p < 0.05, including post hoc tests.

To ensure that differences in lever-press behavior were not driven by reduced food preference, a food pellet preference test was administered. Both genotypes preferred the food reward over standard chow to a similar degree (Fig. S4, t-test, p > 0.05). As an additional measure of learning, the latencies between lever press and reward retrieval were compared between genotypes. As training progressed and the animals learned the contingency, reward retrieval time would be expected to decrease. Indeed, the latency to retrieve reward following lever press(es) decreased with training (Fig. 1l, RM ANOVA main effect of training F(3,30) = 11.13, p < 0.0001). There was no significant difference between genotypes in the latency to retrieve reward after lever press(es) (Fig. 1l, RM ANOVA training × genotype interaction F(3,30) = 2.78, p = 0.06). These results indicate that iMSN-D2Rs play a critical role in determining the pace of self-initiated responding in an effort-dependent manner.

Action initiation is impaired in a spatial memory task in iMSN-Drd2KO mice

It was previously demonstrated that iMSN-Drd2KO mice can learn specific behaviors, suggesting that iMSN-D2R deletion produces task-specific rather than generalized motor deficits [49]. Those authors also noted that iMSN-Drd2KO mice showed no deficits in the forced swim assay despite profound bradykinesia in the open field. Consistent with this, iMSN-Drd2KO mice showed no deficits in the forced swim test in our hands (Fig. S5). We took advantage of this to avoid confounding tests of learning with the motor impairment, and tested the spatial learning ability of these mice in a water T-maze (Fig. 2a). During the acquisition phase, the iMSN-Drd2KO mice took longer to reach the escape platform across training days (Fig. 2b, c, RM ANOVA main effect of day F(2,26) = 8.71, p = 0.0013; main effect of genotype F(1,13) = 22.28, p = 0.0004; day × genotype interaction F(2,26) = 9.88, p = 0.0006). However, the iMSN-Drd2KO mice did not make more errors than the Drd2loxP/loxP mice (Fig. 2d, RM ANOVA main effect of genotype F(1,13) = 1.08, p = 0.32). Rather, the iMSN-Drd2KO mice took longer to initiate each trial than Drd2loxP/loxP mice (Fig. 2e, RM ANOVA main effect of genotype F(1,13) = 29.42, p = 0.0001). Once swimming was initiated, the iMSN-Drd2KO mice completed the trials in a similar time to Drd2loxP/loxP mice (Fig. S6), indicating that the deficit in task performance was due neither to a motor impairment nor to a learning impairment but was specific to the initiation of swimming at the start of the trial. This delay in action initiation worsened with training (RM ANOVA main effect of day F(2,26) = 20.07, p < 0.0001; day × genotype interaction F(2,26) = 15.08, p < 0.0001). Mice were then trained in a reversal version of the task in which the location of the escape platform was switched to the opposite arm. During reversal, a similar pattern emerged: iMSN-Drd2KO mice took longer to reach the new escape platform than Drd2loxP/loxP mice (Fig. 2f, RM ANOVA main effect of genotype F(1,13) = 32.19, p < 0.0001).
The iMSN-Drd2KO mice did not improve across days (RM ANOVA main effect of day F(1,13) = 1.94, p = 0.1875), yet they made a number of errors similar to that of Drd2loxP/loxP mice (Fig. 2g, RM ANOVA main effect of genotype F(1,13) = 0.02, p = 0.8839), indicating comparable reversal learning. The iMSN-Drd2KO mice displayed the same deficit seen during initial learning, taking longer to initiate swimming toward the new location (Fig. 2h, RM ANOVA main effect of genotype F(1,13) = 32.19, p < 0.0001). Collectively, these results indicate that action initiation is impaired in iMSN-Drd2KO mice, whereas learning and cognitive flexibility are intact.

Fig. 2: iMSN-Drd2KO mice show delayed action initiation in a cognitive task.

a Schematic diagram of the water T-maze. b Representative heat maps across days during acquisition (initial) training for both genotypes. c Average time to complete trials across training days for Drd2loxP/loxP (gray bar; n = 6) and iMSN-Drd2KO (orange bar; n = 9) mice during the acquisition phase. d Total number of errors. e Latency to leave the entry (start) arm. f Average time to complete trials during reversal learning. g Total number of errors and h latency to leave the entry position to find the new escape platform. Data are displayed as mean ± SEM.

Pavlovian acquisition and extinction learning are intact in iMSN-Drd2KO mice

To investigate whether the slowing of action initiation seen in self-initiated tasks depended on task demand or on the conditioning paradigm, we trained iMSN-Drd2KO mice in a Pavlovian paradigm in which cue presentation signaled sucrose availability. Very little movement was required in this task. Mice were trained on an auditory CS task with two 20-s cues, one of which predicted reward delivery (CS+) and one of which did not (CS−) (Fig. 3a). Both genotypes learned to discriminate between the cues. Consumption of CS+-paired sucrose reward proved to be a more reliable measure of learning than head entries (Fig. S7). During Pavlovian acquisition, total consumption during the CS+ increased with training, whereas CS− responding remained low and stable, in both genotypes (Fig. 3b, c; Drd2loxP/loxP, RM ANOVA main effect of training F(3.91, 78.21) = 10.29, p < 0.0001; main effect of cue presentation F(1,20) = 239.5, p < 0.0001; training × cue interaction F(14,280) = 12.11, p < 0.0001; iMSN-Drd2KO, main effect of training F(5.627, 101.3) = 5.12, p = 0.0002; main effect of cue presentation F(1,18) = 124.4, p < 0.0001; training × cue interaction F(14,252) = 6.11, p < 0.0001). Total CS+-associated reward consumption per session and frequency of consumption were higher in Drd2loxP/loxP mice than in iMSN-Drd2KO mice (Fig. 3b, c; total consumption: Drd2loxP/loxP 891 ± 50.74, iMSN-Drd2KO 700.2 ± 38.19; t-test, p < 0.01; RM ANOVA main effect of training F(19,266) = 14.23, p < 0.0001; consumption frequency: see Fig. S8). There was no genotype difference in the time it took mice to approach the food receptacle and initiate licking (Fig. S9, RM ANOVA main effect of genotype F(1,19) = 0.81, p = 0.38), indicating that movement was not slowed in this low-motor-demand task. Distinct patterns of CS+ responding emerged after 15 days of conditioning compared to Day 1 in both genotypes (Fig. 3d, e).
Even on Day 1, both genotypes showed higher lick rates (licks/min) during CS+ than during CS− trials (Drd2loxP/loxP, RM ANOVA main effect of cue presentation F(1,20) = 42.36, p < 0.0001; iMSN-Drd2KO, RM ANOVA main effect of cue presentation F(1,18) = 40.87, p < 0.0001). On Day 1, CS+-associated licks/min did not differ significantly between early (trials 1–5) and late (trials 8–12) trials in either iMSN-Drd2KO (early 154.1 ± 27.93; late 170.5 ± 11.55; t-test, p > 0.05) or Drd2loxP/loxP (early 108.8 ± 27.33; late 185.5 ± 26.33; t-test, p > 0.05) mice. In addition, CS+-associated responding across trials was similar between genotypes (Drd2loxP/loxP, 145.0 ± 18.23; iMSN-Drd2KO, 162.0 ± 12.05; t-test, p > 0.05). On Day 15, consummatory rates during the CS+ (Fig. 3d, e; blue region) were higher than on Day 1 in both iMSN-Drd2KO (234.8 ± 14.23, t-test, p = 0.002) and Drd2loxP/loxP (285.9 ± 11.47, t-test, p < 0.0001) mice. Although these data indicate that iMSN-Drd2KO mice had learned the CS contingencies, their frequency of consummatory responding on Day 15 was lower than that of Drd2loxP/loxP mice (t-test, p < 0.05).
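Lick rates such as those above (licks/min during CS+, CS−, or ITI periods) can be computed by counting lick timestamps inside each period and dividing by the total period duration. The following is a minimal sketch, assuming lick times and period boundaries in seconds and pooling all periods of one type; whether rates were pooled or averaged per trial is not stated, so this is one possible definition, and the function name is hypothetical.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def licks_per_min(lick_times: Sequence[float],
                  windows: Sequence[Tuple[float, float]]) -> float:
    """Pooled lick rate (licks/min) over half-open windows [start, end), in seconds.

    Returns NaN if the windows have zero total duration.
    """
    licks = sorted(lick_times)
    n_licks = 0
    total_s = 0.0
    for start, end in windows:
        # Licks with start <= t < end, found by binary search on the sorted times.
        n_licks += bisect_left(licks, end) - bisect_left(licks, start)
        total_s += end - start
    return 60.0 * n_licks / total_s if total_s > 0 else float("nan")
```

Passing all CS+ windows of a session gives a CS+ rate; passing the matching CS− or ITI windows gives the comparison rates, and restricting the windows to trials 1–5 or 8–12 gives early and late rates.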

Fig. 3: iMSN-D2R deletion does not impair Pavlovian acquisition or extinction learning.

a CS task schematic diagram. CS+ cue presentation signaled sucrose reward availability. During CS+ presentation, sucrose was delivered twice, only within the patterned light blue area (epochs 1–3). Sucrose was not delivered during the last epoch (dark blue area). No sucrose was delivered during CS− cue presentation. Licks were recorded during the CS+, CS−, and ITI trial periods. b, c Total licks across training sessions for CS+ and CS− for the Drd2loxP/loxP (n = 11; gray circles) and iMSN-Drd2KO (n = 10; orange circles) mice. d, e Average licks per minute during CS+ and CS− presentations (blue vertical bars) and ITI trial periods (white vertical bars, black circles) on Days 1 and 15 of Pavlovian conditioning (Pav) and on extinction Day 16 (Ext) in both genotypes. Data shown as mean ± SEM. ITI = intertrial interval.

Since both genotypes discriminated between the CS+ and CS−, we examined extinction learning to differentiate conditioned from habitual responding. During extinction, conditioned sucrose seeking was measured by presenting the CS+ and CS− without reward delivery. Total CS+-associated consumption was significantly reduced during extinction testing on Day 16 compared to the last day of Pavlovian conditioning (Day 15) in both genotypes (Fig. 3d, e; Drd2loxP/loxP, 54.77 ± 10.17, t-test, p < 0.0001; iMSN-Drd2KO, 82.36 ± 15.58, t-test, p < 0.0001), and this reduction was similar across genotypes (t-test, p > 0.05). Overall, both the frequency and the amount of consumption in response to the former CS+ decreased during extinction, suggesting that iMSN-D2Rs may not play a role in Pavlovian extinction.

Skill learning is impaired in iMSN-Drd2KO mice

To determine whether iMSN-Drd2KO mice show altered skill learning, we examined the performance of Drd2loxP/loxP and iMSN-Drd2KO mice on the accelerating rotarod. As previously reported [49], the iMSN-Drd2KO mice performed worse on the rotarod than Drd2loxP/loxP mice (Fig. 4a, RM ANOVA; main effect of trial latency F(27,405) = 5.35, p < 0.0001; main effect of genotype F(1,15) = 41.67, p < 0.0001; trial latency × genotype interaction F(27,405) = 3.440, p < 0.0001). The iMSN-Drd2KO mice showed decreased latency to fall on all trials across training (Fig. 4b, RM ANOVA; main effect of genotype F(1,6) = 40.01, p = 0.0007). These findings suggest that iMSN-D2Rs may be important for action learning, including fine motor control.

Fig. 4: Targeted deletion of iMSN-D2Rs impairs skill learning.

Latency to fall from the accelerating rotarod in Drd2loxP/loxP (n = 8, gray circles) and iMSN-Drd2KO (n = 9, orange circles) mice for individual trials (a) and as daily averages across training days (b). Error bars = SEM.

Discussion

Among the proposed roles for striatal dopamine and dopamine receptors is the control of action learning and performance. We sought to determine whether iMSN-D2Rs contribute to these behaviors using several tasks that involve different actions, with different task demands and outcomes. The tasks were chosen because the striatum and dopamine are known to be involved in instrumental and Pavlovian learning and in variants of the water maze task. Using separate tasks allowed us to determine whether impairments were specific to behaviors driven by positive reinforcement (instrumental and Pavlovian conditioning) or by negative reinforcement (water maze). We specifically chose self-paced instrumental conditioning because this task requires animals to initiate action sequences without the aid of predictive stimuli, and thus might be especially susceptible to impairment of mechanisms involved in controlling effort. Self-initiated actions are also impaired in PD patients and dopamine-depleted animals, so we could also determine whether iMSN-D2Rs might have roles that are disrupted in these depleted states. Our findings and previous studies [49, 50, 52] indicate that the major role of iMSN-D2Rs is in action vigor and initiation, and that this role becomes more prominent with greater task demand.

In the self-paced instrumental task, the observation that lever pressing was altered under the FR5 but not the FR1 schedule indicates that iMSN-D2Rs are not required for learning or performing a new action but are recruited when the effort required to obtain reward is increased. These findings may also indicate that iMSN-D2Rs are necessary for sustaining repetitive actions. The increased latency to initiate lever-press sequences in the instrumental task, and to initiate swimming toward the goal in the water maze task, also supports a role for iMSN-D2Rs in action initiation. This idea aligns with open-field findings in which these mice show reduced movement initiation and reduced overall movement. Thus, these mice show hypokinesis, “slowness,” and decreased action initiation regardless of the motivational aspects of the task.

Our findings may also be consistent with a recent study showing altered responding in a home-cage progressive ratio (PR) task, which indicated that iMSN-D2Rs can override “thrifty” behaviors [50]. However, we did not observe reduced responding or a lower breakpoint in a single-session PR task (Fig. S3). Differences between the PR tasks likely account for the different effects of iMSN-D2R knockout. In the Mourra et al. study, mice obtained all their food via the PR task in their home cage. The PR contingency was reset if mice failed to make the requisite number of responses, allowing them to obtain food at a lower response cost after the reset, and iMSN-Drd2KO mice appear to have made use of this option. In the present study, the single-session PR task ended when mice failed to make the requisite responses within 1 h. Because this PR design eventually demands high effort from both groups of mice, it may have obscured small differences in response rate under very-high-demand conditions. The findings from Mourra et al. and the present study share the feature that responding is slower when animals experience a higher-demand task. In our FR5 paradigm, this experience is likely evident early in training, as mice increase their pressing rate above FR1 levels on Day 1 of FR5 training. In the continuous home-cage PR task used by Mourra et al. [50], the increased demand and the alternative resetting strategy become evident only after breakpoints have been reached. Nonetheless, in both tasks animals decrease their response rate when faced with a greater effort requirement. The view that this represents thriftier behavior makes sense in the context of responding for food reward, although the strategy is less than optimal in the FR5 task, as iMSN-Drd2KO mice never receive as many rewards as controls, even after several days of FR5 training. Thus, loss of signaling by this receptor can result in suboptimal thrift.
Alternatively, receptor loss may simply set an effort (i.e., response rate) barrier beyond which animals cannot work; this would result in suboptimal action production in a task with set criteria, but not when animals have the option to revert to an easier path to reward. It should also be noted that the slowing of responding and action initiation extends beyond food seeking, as receptor loss clearly slows actions driven, at least in part, by negative reinforcement in the water T-maze task. Overall, our findings indicate that iMSN-D2Rs are required to invigorate self-initiated behavior in a variety of settings, consistent with past pharmacological studies that could not target a specific cellular D2R population [16].

The decreased response rate of the iMSN-Drd2KO mice in the instrumental task indicates that the receptor regulates the vigor with which animals carry out a newly learned behavior. The observation that slower action initiation was seen only with increased task demand indicates that iMSN-D2Rs may contribute to neuronal signals involved in anticipating the effort necessary to complete a task [56, 57], as well as to action execution itself. Indeed, the findings of Mourra and coworkers indicate that iMSN-Drd2KO mice restart PR tasks at higher rates than controls (i.e., show a lower breakpoint), and this resetting appears to increase with PR task demand [50]. This finding is consistent with iMSN-D2R involvement in sustaining the effort needed to complete a task once its requirements are understood. In fact, some iMSN-Drd2KO mice showed large increases in the IPI on the last day of FR5 training, which may indicate “giving up” (Fig. S2). The observation that iMSN-Drd2KO mice earn fewer rewards when the cost is increased might suggest reduced motivation to work for food reward. However, we did not observe a reduced breakpoint in a single-session PR task (Fig. S3). Thus, it is the decrease in effort (i.e., response rate) under higher task demand that accounts for the altered performance.

Mice lacking iMSN-D2Rs showed little impairment in either learning or performance in the Pavlovian paradigm, despite the proposed roles of dopamine in reward-based learning [58]. Studies of PD patients and dopamine-depleted animals [22, 59,60,61,62,63,64] indicate that loss of dopaminergic transmission impairs self-initiated actions while sparing actions associated with specific environmental events. The lack of performance deficits in the Pavlovian task is consistent with a role for dopamine in self-initiated, well-learned actions. However, the absence of a learning impairment in the iMSN-Drd2KO mice is somewhat surprising given evidence that iMSNs encode and regulate reward- and aversion-based responses [65,66,67]. In addition, the activity of dopaminergic neurons provides a well-known reward prediction error signal [10], iMSNs respond to outcome presentation in Pavlovian conditioning [67], CS-induced dopamine increases are observed following Pavlovian conditioning [68], and dopaminergic neuron activation produces Pavlovian learning by providing both incentive value and movement invigoration signals [69, 70]. Our findings indicate that iMSN-D2Rs have no role in the former process and only a limited role in the latter for a well-learned CS-driven behavior. This conclusion also agrees with the observations of Kelly et al. [39]. There is evidence that different dopaminergic inputs and striatal subregions play larger roles in different types of conditioning, with the ventral tegmental area-nucleus accumbens (NAc) system playing a prominent role in Pavlovian associations and the dorsal striatum (DS) controlling instrumental action learning and production [5, 58, 71,72,73]. A recent study indicated that D2Rs refine discrimination learning in a Pavlovian task through a role in plasticity at synapses onto iMSNs [73]. There appeared to be no D2R role in acquisition or extinction, consistent with our findings. 
It must be noted, however, that these investigators did not explicitly examine the role of iMSN-D2Rs in this learning deficit, and D2Rs on cholinergic interneurons are known to influence plasticity at iMSN synapses [53, 74]. We emphasize that our findings cannot be extrapolated to indicate no role for iMSN-D2Rs in associative learning. We have simply provided evidence that basic instrumental and Pavlovian associations, as well as Pavlovian extinction, are intact after receptor loss.

The deficits in the rotarod task might suggest a learning-related role for iMSN-D2Rs, in agreement with published findings that mice lacking these receptors do not improve their performance over several days of training in this skill [49]. However, this task requires constant physical exertion, making it difficult to disentangle learning from effort. Notably, we did not observe a rotarod deficit in the iMSN-Drd2KO mice on the first training day, only on subsequent days. This is consistent with the observation in other tasks that the performance of these mice plateaus at some level of task demand. In contrast, a previous study found a deficit in initial rotarod performance [39]. This difference most likely reflects the procedures used. That group gave mice experience at a fixed speed before training on the accelerating version of the task, which gave the wild-type mice an opportunity to improve, whereas we only habituated mice to an immobile rod before training. The rotarod findings support the idea that iMSN-D2Rs regulate the effort that mice will expend in a movement-based task. Nonetheless, it would be informative to examine performance in a skill-learning task in which learning can be more easily distinguished from performance.

The most prominent physiological consequence of iMSN-D2R loss is the removal of D2R-mediated suppression of GABA release at synapses in the external segment of the globus pallidus (GPe) and at local collaterals in striatum. The net effect of this loss on striatal output is expected to be increased indirect pathway output relative to the direct pathway. This shift would arise from enhanced GABAergic inhibition in GPe and increased collateral inhibition of direct pathway MSNs (dMSNs) [49]. These mechanisms have been implicated in the bradykinesia observed in iMSN-Drd2KO mice [49]. However, iMSN-D2Rs have other cellular roles, including inhibition of adenylyl cyclase and cAMP production. Activation of cAMP signaling increases iMSN activity [75], so loss of D2R suppression of this signaling would likely increase iMSN/indirect pathway output. Yet Lemos and coworkers observed decreased activity of both dMSNs and iMSNs in the iMSN-Drd2KO mice [49]. Thus, the changes in striatal activity with iMSN-D2R loss are more complicated than predicted from changes in cAMP signaling alone, and they likely involve greater collateral inhibition of dMSNs. Nonetheless, the decreased GPe neuronal activity in these mice is consistent with increased indirect pathway function [49]. D2R inhibition of cAMP signaling also contributes to short-term modulation and plasticity at glutamatergic synapses that decrease iMSN activation [48, 53, 76,77,78,79]. Losing this iMSN-D2R modulation and plasticity would also enhance indirect pathway output. The iMSN-D2R influence on long-term depression (LTD) at glutamatergic synapses may contribute to action learning and performance. However, our findings indicate that simple associative learning was not altered in the iMSN-Drd2KO mice. This suggests either that this mechanism has minimal overall influence on task learning, or that LTD has a larger influence on action initiation and performance, or on more complex associations such as those studied by Iino et al. [73]. 
iMSN-D2Rs may have important roles in the patterning and specificity of striatal output needed for optimal action performance, both in the short term and with extended experience. More work is needed to assess iMSN activity and GABA release in vivo following iMSN-D2R loss. Even so, the available data indicate that receptor loss increases indirect pathway output, resulting in impaired action initiation, perhaps through an impaired ability to select actions.

We did not examine whether iMSN-D2Rs in different striatal subregions have different roles. In the study by Lemos and coworkers, D2R knockdown in NAc mimicked the open-field locomotion phenotype but not the rotarod deficits [49], suggesting that regional differences are important for the performance of different actions. In addition, NAc dopamine signaling has been shown to be required for effort exertion when behavioral task demands are high [80], but it may not be needed when demands are low [81,82,83]. Accumbens D2Rs may facilitate this willingness to work during high-cost responding [84]. The slowing and initiation deficits seen in the iMSN-Drd2KO mice under higher task demand in our self-paced instrumental task likely involve the DS, given the role of this subregion in instrumental learning [5, 72]. However, we cannot rule out NAc contributions.

Our findings are consistent with a growing literature [85,86,87,88] indicating that a major role of striatal dopamine is to invigorate the performance of self-initiated actions. This stimulation of effort can be adaptive when faster or more concerted effort is advantageous to minimize exposure (e.g., during foraging behavior) or is needed for successful task completion. Interestingly, iMSN-D2Rs appear to be an important molecular component of this action invigoration. However, the deficits in action initiation observed with complete loss of striatal dopamine are more severe than those seen in these mice [89, 90]. Thus, other dopamine-activated processes are important for self-initiated movement. Identifying these mechanisms and how they interact with iMSN-D2Rs will be important for understanding the severe akinesia in disorders such as PD.

Funding and disclosure

This work was supported by the Division of Intramural Clinical and Biological Research of the National Institute on Alcohol Abuse and Alcoholism, ZIA AA000416 (DML); the National Institute of Diabetes and Digestive and Kidney Diseases, DK075096 (AVK); NARSAD, YI 27461 (AVK); and in part by K99 AA027740 (SMA). The authors declare no competing interests.