Everyday learning is often self-regulated. For example, students preparing for an exam decide when, how, and how long to study each topic. Research on metacognition has paid a great deal of attention to the ways in which people allocate study time (ST) during learning. This work has shown a close link between learners’ predictions of their own future memory performance (judgments of learning; JOLs) and time allocation (e.g., Metcalfe & Finn, 2008).

Within a given task, one may expect that allocating more ST to an item would result in higher JOLs. Consistent with this idea, items that are awarded high incentives for recall are studied longer and receive higher JOLs than items awarded low incentives (e.g., Koriat, Ma’ayan, & Nussinson, 2006). Within incentive levels, however, items that are studied longer receive lower JOLs, resulting in inverse JOL-ST relations. Koriat et al. (2006) argued that these inverse relations are data-driven and stem from variability in item difficulty: For each item, ST is determined in a bottom-up manner by the individual learner’s memorizing effort or processing fluency, which people use as a basis for JOLs (see also Undorf & Erdfelder, 2011).

Some findings suggest that data-driven ST allocation alone cannot explain JOL-ST relations. First, whereas data-driven ST allocation predicts strong inverse JOL-ST relations, observed correlations are often only moderate (e.g., −.42, Koriat et al., 2006, Exp. 1; −.30 or less, Undorf & Erdfelder, 2013). Second, whereas data-driven ST allocation predicts that ST decreases linearly with increasing JOLs, a few studies have reported inverted U-shaped JOL-ST relations (Little & McDaniel, 2015; Mazzoni & Cornoldi, 1993; Metcalfe & Kornell, 2005; Pelegrina, Bajo, & Justicia, 2000). Curvilinearity is also apparent in Undorf and Erdfelder's (2013) Experiment 1, in which undergraduates self-paced their study of related and unrelated word pairs for a cued recall test and provided a JOL for each item. When we reanalyzed these data, ST was similar across the lowest JOL levels and decreased with increasing JOLs thereafter (Fig. 1).

Fig. 1

The graph shows means of median study time (in seconds) for ten judgment of learning (JOL) levels (0–10, 11–20, …, 91–100) found by Undorf and Erdfelder (2013), Exp. 1. Numbers near the graph represent mean percentages of items per JOL level. Error bars represent one standard error of the mean. The table below the graph presents means (and standard deviations) of recall performance (% correct)
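The aggregation behind Fig. 1 (means across participants of each participant's median ST per JOL level) can be sketched as follows. This is a minimal Python illustration, not the authors' analysis script: it assumes trial-level records of the hypothetical form (participant, jol, st), with JOLs on the 0–100 scale and ST in seconds, and it assumes that a participant simply contributes no value to a JOL level in which they made no judgments.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev


def jol_level(jol):
    """Map a JOL (0-100) to one of ten levels: 0 = 0-10, 1 = 11-20, ..., 9 = 91-100."""
    return max(0, math.ceil(jol / 10) - 1)


def mean_of_median_st(trials):
    """Mean (across participants) of each participant's median study time per JOL level.

    trials: iterable of (participant, jol, st) tuples (hypothetical format).
    Returns {level: (mean_of_medians, standard_error, n_participants)}.
    """
    # Collect study times per participant and JOL level.
    st_by_participant = defaultdict(lambda: defaultdict(list))
    for participant, jol, st in trials:
        st_by_participant[participant][jol_level(jol)].append(st)

    # One median per participant and level; levels a participant never used are skipped.
    medians = defaultdict(list)
    for levels in st_by_participant.values():
        for level, times in levels.items():
            medians[level].append(median(times))

    summary = {}
    for level in sorted(medians):
        values = medians[level]
        se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
        summary[level] = (mean(values), se, len(values))
    return summary


# Usage with made-up toy data (two participants, JOLs at both ends of the scale).
toy = [("p1", 5, 4.0), ("p1", 8, 6.0), ("p1", 95, 2.0),
       ("p2", 3, 8.0), ("p2", 97, 1.0), ("p2", 99, 3.0)]
print(mean_of_median_st(toy))
```

Taking the median within participants before averaging keeps a few very long study episodes from dominating a participant's value for a JOL level.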

Curvilinear patterns may be artifacts of aggregation: They could occur if each individual's JOL-ST relation were linear and negative but some individuals spent little ST overall and made low JOLs. To test this possibility, we fitted a multilevel regression model to Undorf and Erdfelder's (2013) data (see the ‘Results and discussion’ section of Experiment 1 for details). Results revealed that JOL-ST relations were curvilinear at the individual level.

As explained above, data-driven ST allocation predicts inverse linear JOL-ST relations and cannot account for curvilinearity like that depicted in Fig. 1. Consequently, curvilinear JOL-ST relations may inform theories of top-down, strategic ST allocation. According to the influential Discrepancy Reduction Model (Nelson & Narens, 1990), people study each item until its JOL meets a preset target level. Because every item is studied until it reaches the same target, however much ST this requires, final JOLs should not vary with ST; that is, the model predicts that JOLs and ST are unrelated. Hence, the Discrepancy Reduction Model cannot account for curvilinear JOL-ST relations.

According to the Region of Proximal Learning Model (RPL; Metcalfe & Kornell, 2005), people stop studying an item when the perceived rate of learning falls below a stopping criterion. For extremely difficult items, this happens almost immediately, so learners stop quickly and make very low JOLs (see Fig. 2, participants with short overall ST). The result is an inverted U-shaped JOL-ST relation, with shorter ST for items with the lowest JOLs than for items with moderately low JOLs.
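To make the RPL's stopping rule concrete, the toy simulation below implements it under assumptions that are ours, not Metcalfe and Kornell's (2005): perceived mastery follows the hypothetical learning curve 100 × (1 − e^(−rate·t)), the learner checks the gain in mastery per second after each 0.1-s study step, and stops as soon as that gain drops below a criterion. Item difficulty is represented only by the hypothetical `rate` parameter, and mastery at stopping stands in for the JOL.

```python
import math


def rpl_trial(rate, stop_criterion=2.0, dt=0.1, max_time=600.0):
    """Toy Region of Proximal Learning trial (illustrative assumptions only).

    rate: hypothetical learning rate of the item (small = difficult).
    stop_criterion: minimum perceived gain in mastery (points per second)
        the learner requires to keep studying.
    Returns (study_time, jol), where jol is the perceived mastery at stopping.
    """
    t, mastery = 0.0, 0.0
    while t < max_time:
        t += dt
        new_mastery = 100.0 * (1.0 - math.exp(-rate * t))
        gain_per_s = (new_mastery - mastery) / dt  # perceived rate of learning
        mastery = new_mastery
        if gain_per_s < stop_criterion:
            break  # learning feels too slow: stop studying this item
    return t, mastery


items = {"very hard": 0.01, "hard": 0.03, "easy": 0.3}  # hypothetical rates
for criterion in (2.0, 0.5):  # a lower criterion mimics higher motivation
    results = {name: rpl_trial(r, stop_criterion=criterion) for name, r in items.items()}
    print(criterion, {name: (round(st, 1), round(jol)) for name, (st, jol) in results.items()})
```

With the default criterion, the very hard item is abandoned after a single step, so the item with the lowest JOL receives less ST than the moderately difficult item (an inverted U). With the more lenient criterion, ST decreases monotonically with JOL, mirroring the RPL's claim that higher motivation should reduce curvilinearity.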

Fig. 2

Mean z-transformed study time for six judgment of learning (JOL) levels for participants with short and long overall study time (ST), reported by Metcalfe and Kornell (2005). Participants with short overall ST stopped quickly on the most challenging items. This produced inverted U-shaped JOL-ST relations with shorter ST for items with lowest JOLs than for items with moderately low JOLs. In contrast, participants with long overall ST did not stop quickly, resulting in linear negative JOL-ST relations. From “A region of proximal learning model of study time allocation,” by J. Metcalfe and N. Kornell, 2005, Journal of Memory and Language, 52, Fig. 6, p. 473. Copyright 2005 by Elsevier Inc.

The Diminishing Criterion Model (DCM; Ackerman, 2014) was proposed to explain negative correlations between problem-solving time and confidence (e.g., Koriat et al., 2006). Like the Discrepancy Reduction Model, the DCM assumes that people aim at a target level of confidence. Over time, however, they become willing to compromise and provide responses accompanied by lower confidence. According to the DCM, curvilinear JOL-ST relations arise because people set a time limit for each item (see Fig. 3). When an item hits the time limit, people stop studying and make a rather low JOL. ST thus plateaus at the time limit for items with the lowest and moderately low JOLs, resulting in curvilinear JOL-ST relations.
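The DCM's account can be illustrated with a comparable toy simulation. The functional forms and parameter values below are hypothetical choices made for illustration, not Ackerman's (2014) specification: confidence grows along the curve 100 × (1 − e^(−rate·t)); the stopping criterion starts at a target of 90 and diminishes by 7 points per second down to a floor of 40; and a separate time limit ends study regardless of confidence.

```python
import math


def dcm_trial(rate, target=90.0, decline=7.0, floor=40.0, time_limit=10.0, dt=0.1):
    """Toy Diminishing Criterion Model trial (illustrative assumptions only).

    Confidence grows as 100 * (1 - exp(-rate * t)). The stopping criterion
    starts at `target` and diminishes by `decline` points per second, but never
    below `floor`. Study ends when confidence reaches the criterion or when
    the time limit is hit, whichever comes first. Returns (study_time, jol).
    """
    n_steps = max(1, round(time_limit / dt))
    confidence = 0.0
    for step in range(1, n_steps + 1):
        t = step * dt
        confidence = 100.0 * (1.0 - math.exp(-rate * t))
        criterion = max(floor, target - decline * t)
        if confidence >= criterion:
            return t, confidence  # criterion met before the time limit
    return n_steps * dt, confidence  # time limit hit: stop with a low JOL


rates = [0.01, 0.02, 0.3]  # hypothetical items, hardest to easiest
for limit in (10.0, 20.0, 3.0):  # baseline, raised limit, severe time pressure
    print(limit, [tuple(round(v, 1) for v in dcm_trial(r, time_limit=limit)) for r in rates])
```

With these settings, the two hardest items both stop at the time limit, so they receive equal ST despite different JOLs, which produces the plateau shown in Fig. 3B. Raising `time_limit` from 10 to 20 s lengthens ST equally for both plateau items and leaves the easy item unchanged, whereas a limit of 3 s makes every item stop at the limit.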

Fig. 3

The Diminishing Criterion Model proposed by Ackerman (2014). Panel A was adapted from “The Diminishing Criterion Model for metacognitive regulation of time investment,” by R. Ackerman, 2014, Journal of Experimental Psychology: General, 143(3), 1349–1368, Fig. 2, p. 1351. Copyright 2013 American Psychological Association. Panel B represents the model with switched axes in order to be consistent with the other figures in the present article. Circles represent hypothetical confidence ratings for Items A, B, C, and D. The thick line shows how the stopping criterion diminishes as the time limit approaches. The dashed line represents a time limit—the maximum time the participant is willing to invest in an item, regardless of the level of confidence. Thin lines show how confidence for Items B and C changes over time until the stopping criterion is reached

In sum, both the RPL and the DCM predict curvilinear JOL-ST relations like those in Fig. 1. However, the two models make different predictions about the exact shape of the curve and about the effects of motivation on JOL-ST relations. As explained above, the RPL attributes curvilinearity to quick stopping on the most challenging items. Therefore, ST should be shorter for items with the lowest JOLs (quick stops) than for items with moderately low JOLs (slow stops; Metcalfe, 2009). According to this model, high motivation increases the willingness to invest ST despite slow learning progress. Thus, curvilinearity in JOL-ST relations should decrease with increasing motivation. Indeed, Metcalfe and Kornell (2005; see Fig. 2) found curvilinear JOL-ST relations among participants who worked quickly, but not among those who worked slowly. The authors concluded that only highly motivated learners tried to master the most challenging items despite perceiving slow progress. The DCM, in contrast, attributes curvilinearity to similar ST for items with the lowest and moderately low JOLs, all of which hit the time limit (Fig. 3B). According to the DCM, high motivation raises the time limit, so ST increases equally for items with the lowest and moderately low JOLs. Importantly, this should leave curvilinearity unaffected (see Footnote 1). In the following experiments, we manipulated motivation with three methods and examined which model best described the pattern of JOL-ST relations.

Experiment 1

In Experiment 1, we tested whether high motivation would reduce curvilinearity, as predicted by the RPL, or leave it unaffected, as predicted by the DCM. To this end, we awarded different incentives for the recall of different items (cf. Koriat et al., 2006). Based on Koriat et al.'s (2006) findings, we expected higher JOLs, longer ST, and better recall for high-incentive items than for low-incentive items.

Method

Participants and materials

Participants were 50 University of Mannheim undergraduates. Stimuli were 60 pairs of German nouns of 4 to 10 letters; half of the pairs were related and half were unrelated. Four additional practice pairs were excluded from the analyses.

Procedure

Participants were told that the experiment was about studying topics of different importance. For practice, participants studied four stories that were either more (5 points) or less (1 point) important (cf. Koriat et al., 2006). Participants then studied the word pairs one at a time. They were instructed to allocate their time so that they could recall the target words while spending as little ST as possible. For each participant, 5 points were awarded to 20 randomly chosen pairs (ten related and ten unrelated), and 1 point was awarded to the remaining 40 pairs. At study, related and unrelated 1-point and 5-point pairs were randomly intermixed. On each trial, the incentive value appeared on the screen 2 s before the pair, and both remained on screen until participants clicked a button. Immediately afterwards, the JOL prompt “Chance to recall (0–100%)?” and a percentage scale appeared, and participants indicated their JOL by clicking on the scale. After study, participants worked on a 90-s filler math task and then took a cued recall test in which the cue words were presented in random order and participants typed the corresponding targets.
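The per-participant incentive assignment described above can be sketched as below. This is a hypothetical Python illustration, not the experiment software: the pair identifiers and the `assign_incentives` helper are ours, and seeding the random generator once per participant is an assumption about how a fresh random assignment might be drawn for each participant.

```python
import random


def assign_incentives(related_ids, unrelated_ids, n_high_per_type=10, seed=None):
    """Return a shuffled study list of (pair_id, points) tuples.

    n_high_per_type related and n_high_per_type unrelated pairs receive
    5 points; all remaining pairs receive 1 point. Shuffling intermixes
    related/unrelated and 1-point/5-point pairs in the study order.
    """
    rng = random.Random(seed)
    high = set(rng.sample(related_ids, n_high_per_type))
    high |= set(rng.sample(unrelated_ids, n_high_per_type))
    study_list = [(pair_id, 5 if pair_id in high else 1)
                  for pair_id in list(related_ids) + list(unrelated_ids)]
    rng.shuffle(study_list)
    return study_list


# Hypothetical IDs: 30 related (R00-R29) and 30 unrelated (U00-U29) pairs.
related = [f"R{i:02d}" for i in range(30)]
unrelated = [f"U{i:02d}" for i in range(30)]
study_list = assign_incentives(related, unrelated, seed=7)  # e.g., one seed per participant
```

Sampling the 5-point pairs separately within the related and unrelated sets guarantees the ten-plus-ten split, whereas sampling 20 pairs from all 60 would only achieve it on average.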

Results and discussion

Table 1 presents a summary of the results (correlations between ST, JOL, recall, and item difficulty for all experiments are shown in the Appendix). Replicating Koriat et al. (2006), ST was longer, and JOLs and recall were higher, for 5-point than for 1-point items. Figure 4 reveals curvilinear JOL-ST relations at both incentive levels.

Table 1 Means (and standard deviations) of study time (ST), judgment of learning (JOL), and memory performance (% correct), separately for each condition of Experiments 1–3
Fig. 4

The graphs show means of median study time (in seconds) for 10 judgment of learning (JOL) levels (0–10, 11–20, …, 91–100) for 1-point and 5-point items in Experiment 1. Numbers near the graphs represent mean percentages of items per JOL level. Error bars represent one standard error of the mean. The table below the graphs presents means (and standard deviations) of recall performance (% correct)

To investigate JOL-ST relations, we fitted a multilevel regression model (Level 1: items; Level 2: participants) using the R packages lme4 and lmerTest (Bates, Maechler, & Bolker, 2015; Kuznetsova, Brockhoff, & Christensen, 2015; R Core Team, 2015). Prior to the analyses, we log-transformed ST to reduce the skewness of the ST distribution (see Footnote 2), and we centered JOL at the group mean (see Cohen, Cohen, West, & Aiken, 2003). In two separate regression models, one for each incentive level, we regressed ST on JOL and JOL². The linear predictor JOL captures the overall slope of the JOL-ST relation, with a negative coefficient indicating an inverse linear relation. The quadratic predictor JOL² captures curvilinearity: A negative coefficient indicates an inverted U-shaped relation, a positive coefficient indicates a U-shaped relation, and a nonsignificant coefficient indicates no evidence of curvilinearity. Including both predictors in each model allows slope (JOL) and curvilinearity (JOL²) to be examined separately, each while holding the other constant. Random intercepts were specified for items and for participants, allowing mean ST to vary across items and participants.
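The preprocessing steps described above can be sketched in Python as follows (a sketch, not the authors' R code). Two points are assumptions: we use the natural logarithm, and we read “centered at the group mean” as centering each JOL on the participant's own mean JOL; under grand-mean centering, the participant means would be replaced by a single overall mean. The trial format is hypothetical. In lme4 notation, the model described above would correspond roughly to `lmer(log_st ~ jol_c + jol_c2 + (1 | participant) + (1 | item))`, with lmerTest supplying the t tests for the fixed effects.

```python
import math
from statistics import mean


def add_predictors(trials):
    """Add log study time and centered linear/quadratic JOL predictors.

    trials: list of dicts with (hypothetical) keys 'participant', 'item',
    'jol' (0-100) and 'st' (study time in seconds, > 0).
    Returns new dicts with 'log_st', 'jol_c' (JOL minus the participant's
    mean JOL) and 'jol_c2' (squared centered JOL) added.
    """
    jols = {}
    for tr in trials:
        jols.setdefault(tr["participant"], []).append(tr["jol"])
    participant_mean = {p: mean(values) for p, values in jols.items()}

    out = []
    for tr in trials:
        jol_c = tr["jol"] - participant_mean[tr["participant"]]
        out.append({**tr,
                    "log_st": math.log(tr["st"]),  # reduces right skew of ST
                    "jol_c": jol_c,                # linear term: overall slope
                    "jol_c2": jol_c ** 2})         # quadratic term: curvature
    return out


data = add_predictors([
    {"participant": 1, "item": "A", "jol": 20, "st": 6.0},
    {"participant": 1, "item": "B", "jol": 80, "st": 2.0},
    {"participant": 2, "item": "A", "jol": 40, "st": 9.0},
    {"participant": 2, "item": "B", "jol": 60, "st": 4.0},
])
```

Squaring the centered rather than the raw JOL typically keeps the linear and quadratic predictors much less correlated, which helps separate slope from curvature.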

Table 2 presents the regression results. Negative effects of JOL revealed negative JOL-ST slopes at both incentive levels. Negative effects of JOL² showed that JOL-ST relations were curvilinear (inverted U-shaped) at both incentive levels. Allowing slope and curvilinearity to vary across participants by adding random effects of JOL and JOL² did not improve model fit, both χ²s < 1.

Table 2 Fixed effects regression estimates (and standard errors) for Experiments 1–3

To test whether slope and curvilinearity differed between 5-point and 1-point items, we predicted ST at both incentive levels in a single regression model that included condition and the interactions of condition with JOL and JOL². Nonsignificant interactions of incentive level with JOL, t(2906) = 1.49, p = .137, and with JOL², t(2901) = 1.90, p = .057, revealed that neither slope nor curvilinearity differed across incentive levels. Notably, the marginal difference in curvilinearity was in the direction opposite to that predicted by the RPL. To further examine the absence of a curvature difference across incentive levels, we compared this full model with a null model without interaction terms using Bayesian analyses (R package BayesFactor; Morey & Rouder, 2015). The posterior probability of the null model was > .99, indicating very strong evidence for equal curvilinearity across incentive levels (cf. Raftery, 1995).
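The link between Bayes factors and the posterior probabilities reported in this article can be made explicit. The sketch below assumes equal prior odds for the two models (an assumption on our part) and maps posterior probabilities onto the verbal categories of Raftery (1995): weak (.50–.75), positive (.75–.95), strong (.95–.99), and very strong (> .99).

```python
def posterior_probability(bayes_factor, prior_odds=1.0):
    """P(model A | data) from BF_AB = p(data | A) / p(data | B) and the prior odds A:B."""
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def raftery_category(p):
    """Verbal evidence category for a posterior probability p (Raftery, 1995)."""
    if p > 0.99:
        return "very strong"
    if p > 0.95:
        return "strong"
    if p > 0.75:
        return "positive"
    if p > 0.50:
        return "weak"
    return "favors the competing model"


# A Bayes factor of 150 in favor of the null model corresponds to p = 150/151.
print(round(posterior_probability(150), 4), raftery_category(posterior_probability(150)))
```

Under equal prior odds, a posterior probability above .99 therefore requires a Bayes factor above roughly 99 in favor of the preferred model.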

To explore ST for items with the lowest and moderately low JOLs, we grouped JOLs into bins of 20 percentage points and compared ST for items from the two lowest bins (JOLs of 20 or below vs. JOLs from 21 to 40; Table 3). A 2 (incentive level) × 2 (JOL bin) within-participants ANOVA revealed that 5-point items were studied longer than 1-point items, F(1, 27) = 7.69, p = .010, ηp² = .22. Neither the main effect of JOL bin, F(1, 27) = 2.59, p = .119, nor the interaction, F < 1, was significant. Thus, ST was similar for items with the lowest and moderately low JOLs.
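Partial eta squared can be recovered from each reported F statistic and its degrees of freedom, which offers a quick consistency check on the effect sizes reported in this article. The helper below is a minimal sketch of that standard identity, ηp² = (F · df_effect) / (F · df_effect + df_error).

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F statistic: (F * df1) / (F * df1 + df2)."""
    return f * df_effect / (f * df_effect + df_error)


# Incentive-level main effect in Experiment 1: F(1, 27) = 7.69.
print(round(partial_eta_squared(7.69, 1, 27), 2))
```

Applied to F(1, 27) = 7.69, the identity yields .22, matching the value reported above.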

Table 3 Means (and standard deviations) of study time (in seconds) for items from judgment of learning (JOL) bins 1 (JOLs of 20 or below) and 2 (JOLs from 21 to 40)

In sum, equally curvilinear JOL-ST relations and similar ST across items with low JOLs in both incentive levels supported the DCM but not the RPL.

Experiment 2

Experiment 1 revealed that awarding high incentives to a subset of items did not affect curvilinearity. Experiment 2 tested whether high motivation would affect curvilinearity in a between-participants design. Such a design prevents carryover effects between incentive levels and approximates the motivational differences across participants observed by Metcalfe and Kornell (2005).

In the standard condition, we emphasized both speed and accuracy of learning, which is common in self-paced learning studies (e.g., Koriat et al., 2006; Mazzoni & Cornoldi, 1993). In the accuracy condition, participants were instructed to attain complete mastery of all items and were allowed unlimited time to do so (e.g., Nelson & Leonesio, 1988). Nelson and Leonesio (1988) found that accuracy instructions increased ST but affected neither metamemory judgments nor recall. We therefore expected longer ST in the accuracy group than in the standard group.

Method

Participants and materials

Sixty-two University of Mannheim undergraduates were randomly assigned to the standard group (n = 30) or to the accuracy group (n = 32). Stimuli were the same as in Experiment 1.

Procedure

For the standard group, the procedure was identical to that of Experiment 1, except that there was neither a practice phase nor incentive values. The accuracy group was instructed to study hard and to spend as much time as needed to master each pair. Participants in this group were also informed that they would receive five points for each correctly recalled target and that prizes (€30.00, €20.00, and €10.00) would be awarded to the three best participants.

Results and discussion

The accuracy group studied longer than the standard group, while JOLs and recall did not differ across groups (Table 1). Figure 5 reveals curvilinear JOL-ST relations in both groups.

Fig. 5

The graphs show means of median study time (in seconds) for ten judgment of learning (JOL) levels (0–10, 11–20, …, 91–100) for the standard and accuracy groups in Experiment 2. Numbers near the graphs represent mean percentages of items per JOL level. Error bars represent one standard error of the mean. The table below the graphs presents means (and standard deviations) of recall performance (% correct)

As in Experiment 1, we regressed ST in each group on JOL and JOL². Results revealed negative JOL-ST slopes and curvilinear JOL-ST relations in both groups (Table 2). Allowing slope and curvilinearity to vary across participants did not improve model fit, χ² < 1 in the standard group and χ²(2) = 1.80, p = .407, in the accuracy group. When testing for differences across groups, a significant condition × JOL interaction, t(3622) = 4.76, p < .001, revealed a steeper JOL-ST slope in the accuracy group. A nonsignificant condition × JOL² interaction, t(3619) = 1.49, p = .137, indicated that curvilinearity did not differ across groups. Bayesian analyses indicated very strong evidence for equal curvilinearity across groups: The posterior probability of a null model without condition interaction terms was > .99.
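The model-comparison p values follow from the likelihood ratio statistic, twice the difference in log-likelihoods between the models with and without the added random effects, referred to a χ² distribution. For an even number of degrees of freedom, the χ² upper-tail probability has a closed form, so the reported p = .407 for χ²(2) = 1.80 can be checked without a statistics library. This sketch covers the arithmetic only; the log-likelihood values in the example are invented.

```python
import math


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio test statistic: 2 * (logLik_full - logLik_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Accuracy group, Experiment 2: chi2(2) = 1.80.
print(round(chi2_upper_tail_even_df(1.80, 2), 3))

# Invented log-likelihoods, just to show how the statistic is formed.
print(round(lr_statistic(-1000.0, -999.1), 2))
```

For two degrees of freedom, the formula reduces to exp(−x/2), which gives .407 for x = 1.80.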

ST for the two lowest JOL bins (Table 3) was submitted to a 2 (condition) × 2 (JOL bin) mixed ANOVA. ST was longer in the accuracy group than in the standard group, F(1, 45) = 6.92, p = .012, ηp² = .13. Neither the main effect of JOL bin, F(1, 52) = 1.63, p = .208, nor the interaction, F < 1, was significant.

In sum, the between-participants manipulation of motivation increased ST but affected neither JOLs nor recall. Replicating Experiment 1, equally curvilinear JOL-ST relations and similar ST across items with low JOLs in both groups supported the DCM but not the RPL.

Experiment 3

It may be premature to conclude that the DCM best explains curvilinear JOL-ST relations. In the previous experiments, differences in item difficulty may have been too subtle for participants to quickly identify the most challenging items. Moreover, prior research suggests that strategy changes are particularly likely when motivation is lowered relative to a standard condition (Ackerman, 2014) and that time pressure decreases motivation (e.g., Thiede & Dunlosky, 1999). Therefore, in Experiment 3, we used materials with a wider range of difficulty and examined the effects of time pressure on JOL-ST relations. Finally, we tested a different population to examine the generalizability of our results.

In the time pressure condition, we imposed a severe constraint on the total study time available. The RPL predicts that time pressure promotes stopping quickly when perceived learning progress is slow (i.e., raises the stopping criterion) and thus increases curvilinearity in JOL-ST relations. In contrast, the DCM predicts that time pressure lowers the time limit. If the time limit remains well above the time invested in high-JOL items, curvilinearity should remain unchanged. If, however, the time limit approaches the time invested in high-JOL items, ST variability and curvilinearity should decrease.

Method

Participants and materials

Sixty Technion undergraduates were randomly assigned to the standard group or to the time pressure group. Stimuli were 60 item-category pairs with very large differences in difficulty, previously used by Ackerman (2014, Experiment 1). The easiest pairs were known concepts and categories with known associations (e.g., CUBA–ISLAND). The most difficult items were unknown concepts associated with unknown categories (e.g., ERUCARIA–CRUCIFERAE). Two additional pairs were used for practice.

Procedure

The procedure was identical to that of Experiment 2 with the following exceptions. The standard group was instructed to allocate their time (30 min) as they wished. The time pressure group was told that they had only 6 min for study. Instructions explained that they must manage their time efficiently and study about 12 items per minute. The 6-min time frame corresponded to 75% of mean overall ST in Ackerman’s (2014) Experiment 1. The experimenter indicated when 3 min and 1 min remained. Before the cued recall test, participants worked on a 60-s filler task (numeric puzzle).

Results and discussion

The time pressure group studied for less time than the standard group, whereas JOLs and recall did not differ between groups (Table 1). Figure 6 reveals curvilinear JOL-ST relations in both groups. Notably, JOL-ST relations in the standard groups of Experiments 3 and 2 (see Fig. 5) were very similar despite differences in materials and populations.

Fig. 6

The graphs show means of median study time (in seconds) for ten judgment of learning (JOL) levels (0–10, 11–20, …, 91–100) for the standard and time pressure groups in Experiment 3. Numbers near the graphs represent mean percentages of items per JOL level. Error bars represent one standard error of the mean. The table below the graphs presents means (and standard deviations) of recall performance (% correct)

As before, we regressed ST in each group on JOL and JOL². Results revealed negative JOL-ST slopes and curvilinear JOL-ST relations in both groups (Table 2). Allowing slope and curvilinearity to vary across participants did not improve model fit, χ² < 1 in the standard group and χ²(2) = 3.14, p = .208, in the time pressure group. When testing for differences across groups, a significant condition × JOL interaction, t(3527) = 4.83, p < .001, revealed a shallower JOL-ST slope in the time pressure group. A significant condition × JOL² interaction, t(3543) = 2.83, p = .005, revealed attenuated curvilinearity in the time pressure group. Bayesian analyses indicated positive evidence for attenuated curvilinearity under time pressure: The posterior probability of the full model with condition interaction terms was .86.

ST for the two lowest JOL bins (Table 3) was shorter in the time pressure group than in the standard group, F(1, 52) = 18.42, p < .001, ηp² = .26. Neither the main effect of JOL bin, F < 1, nor the interaction, F(1, 52) = 2.56, p = .115, was significant.

In sum, time pressure reduced ST but not JOLs or recall. Decreased curvilinearity under time pressure and similar ST across items with low JOLs in both groups support the DCM but not the RPL.

General discussion

In this study, we addressed the puzzle of time allocation for the most challenging items in learning tasks. Our starting points were the moderately negative correlations between JOL and ST that have often been reported in the literature (e.g., Ackerman, 2014; Koriat et al., 2006; Undorf & Erdfelder, 2013) and the less common finding of inverted U-shaped relations between JOL and ST (e.g., Metcalfe & Kornell, 2005).

We compared two models (RPL and DCM) that may account for curvilinear JOL-ST relations and that make different predictions about how motivation affects curvilinearity. Three experiments revealed robust curvilinear JOL-ST relations. Neither increasing motivation to master a subset of items (Experiment 1) nor manipulating motivation between participants (Experiment 2) affected curvilinearity. Severe time pressure decreased but did not eliminate curvilinearity (Experiment 3). These findings are inconsistent with the predictions of the RPL but compatible with those of the DCM. Importantly, this was true both when the DCM predicted unchanged curvilinearity (Experiments 1 and 2) and when it predicted a change in curvilinearity (Experiment 3).

Much research has shown that intrinsic cues such as item difficulty have stronger effects on metacognition than extrinsic cues such as motivation and available time (e.g., Koriat, 1997; Rabinowitz, Ackerman, Craik, & Hinchley, 1982). Our finding of robust curvilinear JOL-ST relations suggests that item difficulty also determines how ST is allocated: Easy items trigger bottom-up, data-driven ST allocation that is governed by processing fluency, whereas challenging items trigger top-down, strategic ST allocation that takes extrinsic cues into account.

Finally, our results converge with findings on the metacognition of problem solving (Ackerman, 2014) and reveal previously unrecognized parallels between time allocation in memorizing and in more complex problem-solving tasks. It remains to be seen whether curvilinearity in judgment-time relations can also inform theorizing about metacognition in other domains, such as decision making or reasoning (meta-reasoning; Ackerman & Thompson, 2015).