Introduction

Working memory (WM) capacity is defined as the ability to maintain and manipulate a limited number of relevant items in memory over a short period of time (Baddeley & Hitch, 1974; Cowan, 2001; Daneman & Carpenter, 1980). It varies widely among individuals and correlates with many cognitive processes, such as general fluid intelligence, problem solving, planning, or language comprehension (Engle, Tuholski, Laughlin, & Conway, 1999; Kane et al., 2004; Unsworth, Redick, Heitz, Broadway, & Engle, 2009). Recently, a few studies from the emerging field of gesture research have also related WM to the production of co-speech gestures and have shown a gesture effect on working memory—that is, an increase in WM accuracy due to gesturing (Cook, Yip, & Goldin-Meadow, 2011; Goldin-Meadow, Nusbaum, Kelly, & Wagner, 2001; Ping & Goldin-Meadow, 2010; Wagner, Nusbaum, & Goldin-Meadow, 2004). Co-speech gestures are meaningful hand movements that are semantically and temporally integrated with the speech they accompany. Importantly, they differ from sign language and other communicative hand signs in that they cannot replace speech and show a low degree of conventionalization; that is, the relationship between their meaning and form is highly dependent on the context (Kendon, 2004; McNeill, 1992).

Studies investigating the relation between co-speech gestures and WM typically use a dual-task paradigm in which participants concurrently perform a memory task and an explanation task. For instance, Goldin-Meadow et al. (2001) measured the number of letters that participants remembered while giving an explanation of a math problem. The results showed that participants recalled more letters correctly when they were allowed to gesture during the explanation (gesture condition) than when they were not gesturing (no-gesture condition). In a similar paradigm, Wagner et al. (2004) showed improved WM accuracy using spatial patterns rather than letters as to-be-remembered items. Replacing a mathematical problem with Piagetian conservation as the explanation task, Ping and Goldin-Meadow (2010) found that the WM effect can also be observed without a visual representation of the explanation problem, suggesting that gestures may be related to internal, rather than external, processes. Finally, the results of Cook et al.’s (2011) recent study confirmed that improved WM in the dual-task paradigm is specifically related to co-speech gestures, not meaningless hand movements. In sum, these empirical findings suggest that co-speech gestures have a facilitative effect on WM performance.

In a dual-task paradigm, the execution of two tasks results in competition for a shared and limited resource (that is, WM), and the resulting trade-offs are reflected in the accuracy on the WM task (Wagner et al., 2004). Because individual differences in WM capacity relate to such trade-offs, they should also be expected in the effect gesturing has on WM. However, up to now, no study has addressed the question of whether the gesture effect depends on the speaker’s WM capacity. The purpose of this experiment was to test the gesture effect with a different task and to examine whether it is related to individual differences in WM. To this end, we designed a novel gesture task and combined it with a standard operation span task that measured individual WM capacity.

Method

We tested 58 students enrolled in a first-year undergraduate psychology course in exchange for partial course credit (48 females, 2 left-handed; mean age = 20.5 years, age range = 17–46 years). Participants completed two independent tasks: first, an operation span task to assess WM capacity, and then a gesture task.

Operation span task

The task was based on that of Unsworth, Heitz, Schrock, and Engle (2005). On each trial, participants were presented with a series of simple mathematical equations and were asked to solve them. Each equation was followed by the presentation of a possible solution. Participants were asked to indicate whether the proposed solution was correct or not by pressing one of two buttons. Each solution was followed by a single letter, which the participants were asked to retain for later recall. After five, six, or seven sequences of equation, possible solution, and letter, participants were asked to verbally recall as many letters as possible in the correct order. Each participant completed 3 practice trials and 30 experimental trials (10 for each WM load of five, six, or seven letters).

Gesture task

Participants first saw an equation in the form of “3 4→7 12” and were asked to judge whether it was correct or not (see Fig. 1). An equation was correct if the sum of the two numbers on the left equaled the first number on the right and the product of the two numbers on the left equaled the second number on the right. On each trial, an equation was randomly selected from a set containing 15 correct and 15 incorrect equations. Participants were given feedback about the accuracy of their decision. Then a series of letters was presented at the rate of one letter per second, and participants were asked to remember the letters. The number of letters was randomized across trials and ranged from five to seven. Participants were then presented with the same equation as at the beginning of the trial and were asked to explain their previous decision about whether the equation was correct or incorrect. Participants were instructed to use a formulaic explanation and say, for example, “Three plus four is seven, and three times four is twelve” or “Three plus four is not eight, but three times four is twelve.” At the beginning of the experiment, participants were randomly assigned to one of two groups: gesture or no gesture. Participants in the gesture group were asked to point to the numbers during their explanation, whereas participants in the no-gesture group were asked to keep their hands stationary on the response pad. Since the effect of gesturing on WM depends not on the instruction not to gesture but on whether participants actually gesture (Goldin-Meadow et al., 2001), we felt comfortable in directly instructing our participants not to gesture.
Our reasons for choosing pointing as the prototypical gesture in this experiment were threefold: first, to reduce the heterogeneity of gesturing across individuals, since different gesture types and their features are processed differently and might thus yield different effects; second, because the majority of gestures produced spontaneously in a similar task have been pointing gestures (Wagner et al., 2004); and finally, because pointing was the predominant type of gesture in a pilot study we conducted with 47 individuals, using the paradigm described in Goldin-Meadow et al. (2001), discussed below. After having explained their decision, participants pressed a button and were asked to verbally recall the letters in the correct order. Participants performed 3 practice trials and 30 experimental trials (10 for each set size of five, six, or seven letters).
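
The correctness rule for the equations in the gesture task can be written as a small check. The following Python sketch is illustrative only: the function name `equation_is_correct` and its parameter names are assumptions made for this example and are not taken from the experiment software.

```python
def equation_is_correct(left_a: int, left_b: int,
                        right_sum: int, right_product: int) -> bool:
    """Return True only if both parts of the equation hold.

    An equation such as "3 4 -> 7 12" counts as correct when the first
    right-hand number is the sum of the two left-hand numbers AND the
    second right-hand number is their product.
    """
    return (left_a + left_b == right_sum) and (left_a * left_b == right_product)


# The two explanation examples from the task instructions:
print(equation_is_correct(3, 4, 7, 12))  # sum and product both hold
print(equation_is_correct(3, 4, 8, 12))  # sum is wrong, product holds
```

Under this rule, an equation with a single wrong component (as in the second example) is classified as incorrect.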

Fig. 1. Gesture task. Participants judge an equation. After a short feedback, they are shown five, six, or seven letters. They explain their prior judgment and either point at the numbers (gesture) or not (no gesture). Finally, they verbally recall the letters.

We chose an independent measures design as a consequence of a pilot study, which we conducted with 47 undergraduate students in an attempt to replicate Goldin-Meadow et al.’s (2001) findings. Using an identical 2 × 2 repeated measures design with gesturing (gesture, no gesture) and WM load (low = 2 letters, high = 6 letters) as variables, we failed to replicate the original findings but found that a large number of participants showed reduced accuracy and even confusion as a direct effect of switching between the gesture and no-gesture conditions. This is problematic, since task-switching costs have been shown to reduce WM accuracy (Liefooghe, Barrouillet, Vandierendonck, & Camos, 2008). Since our experiment was intended to determine whether there are individual differences in the interplay of gesturing and WM, we decided to reduce the potential for switching costs at the expense of statistical power by using an independent measures design. To minimize the possibility of introducing confounding variables, we randomly assigned each participant to a group and ensured that both groups were comparable with respect to education, age, gender, and handedness, as well as performance on the operation span task.

Scoring

Each individual score was calculated as the proportion of the maximum possible score in that trial, that is, the number of correctly recalled letters divided by the total number of letters presented in the trial. A letter was considered correctly recalled if it was recalled in the correct position (e.g., in a set of R–T–K–P, we considered the response R–L–K–S to include the two correctly recalled letters R and K; Conway et al., 2005). This method yields a number between 0 and 1 that can be read as the proportion of correctly recalled letters. As a measure of an individual’s WM capacity, we chose the highest number of consecutive letters correctly recalled by each participant in the operation span task (e.g., in a set of R–T–K–P–L, we considered the response R–T–K–S–L to consist of three correctly recalled consecutive letters R, T, and K).
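
The two scoring rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scoring script used in the study: the function names are hypothetical, a run of consecutive correct letters is assumed to be allowed to start at any position (the example in the text starts at the first letter and does not settle this), and capacity is assumed to be the maximum run over all of a participant's operation span trials.

```python
def trial_score(presented: str, recalled: str) -> float:
    """Proportion of presented letters recalled in the correct serial position."""
    correct = sum(1 for p, r in zip(presented, recalled) if p == r)
    return correct / len(presented)


def longest_correct_run(presented: str, recalled: str) -> int:
    """Length of the longest run of consecutive positions recalled correctly."""
    best = run = 0
    for p, r in zip(presented, recalled):
        run = run + 1 if p == r else 0  # extend the run or reset it
        best = max(best, run)
    return best


def wm_capacity(trials: list[tuple[str, str]]) -> int:
    """Highest consecutive-correct run over all (presented, recalled) trials.

    Assumption: capacity is the maximum over a participant's trials.
    """
    return max(longest_correct_run(p, r) for p, r in trials)


# Worked examples from the text:
print(trial_score("RTKP", "RLKS"))            # 0.5 (R and K are in position)
print(longest_correct_run("RTKPL", "RTKSL"))  # 3 (R, T, K)
```

Letters omitted at the end of a response simply count as incorrect, because the score is always divided by the number of letters presented.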

Results

Operation span task

Data from one participant were excluded from all further analyses because they fell three standard deviations below the group mean. Across all three WM loads, participants scored .62 (SD = .17) on average; that is, they recalled 62% of all letters. The average score was .77 (SD = .19) for a set size of five letters, .63 (SD = .22) for six letters, and .51 (SD = .19) for seven letters. As a measure of individual WM capacity, participants remembered a mean of 6.26 (SD = 0.77) consecutive letters in the correct order across all set sizes, resulting in an average WM capacity of six letters.

Gesture task

Our results did not show a significant difference in recall accuracy between the gesture and no-gesture groups (gesture, M = .33, SD = .15, N = 28; no gesture, M = .27, SD = .12, N = 29), F(1, 55) = 2.91, p = .09. Explanation times did not differ significantly between the gesture (M = 6.89 s, SD = 1.01 s) and no-gesture (M = 7.11 s, SD = 1.4 s) conditions, F(1, 55) = 0.5, p = .48. Similarly, recall times did not differ significantly between the gesture (M = 12.18 s, SD = 2.96 s) and no-gesture (M = 12.06 s, SD = 2.04 s) conditions, F(1, 55) = 0.03, p = .86.

Gesture task and WM capacity

The relationship between individual differences in WM capacity and the gesture effect on WM was investigated using a generalized linear model. Both WM capacity (four, five, six, or seven letters) and gesture condition (gesture, no gesture) were modeled as predictors of performance on the gesture task, as measured by the continuous dependent variable gesture score, that is, the proportion of the maximum possible number of remembered letters across conditions. Our results show significant main effects of WM capacity, χ²(3) = 13.56, p < .01, and gesture condition, χ²(1) = 4.125, p = .04. To further investigate the relation between load and capacity in our data, we distinguished between individuals with different WM capacities, using a categorical measure because it showed broad differences between groups. We used the median WM capacity to divide the participants into two groups and categorized participants who remembered six or fewer consecutive letters as low-capacity individuals (N = 32) and participants who remembered seven letters as high-capacity individuals (N = 25). A comparison between the gesture and no-gesture conditions for high- and low-capacity individuals showed a significant difference only for the low-capacity individuals. Low-capacity individuals who gestured remembered significantly more items (M = .31, SD = .15, N = 16) than did low-capacity individuals who did not gesture (M = .21, SD = .07, N = 16), t(30) = 2.318, p = .03. On the other hand, high-capacity individuals who gestured (M = .35, SD = .15, N = 12) did not outperform those who did not gesture (M = .34, SD = .12, N = 13), t(23) = 0.302, p = .77 (see Fig. 2).

Fig. 2. Differences between gesture and no-gesture conditions for low and high working memory capacity groups. Gesture score refers to the average proportion of the maximum possible remembered letters across conditions.

To test whether the difference between the two groups was an effect of WM capacity, we also investigated whether the accuracy in the gesture condition differed between high- and low-capacity individuals. Our comparison did not show a significant difference between the low- and high-capacity individuals in the gesture condition, t(26) = −0.782, p = .44. In the no-gesture condition, however, we found a significant difference between high- and low-capacity participants, t(27) = −3.216, p < .01.
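
For readers who wish to check the group comparisons against the reported descriptive statistics, the following Python sketch computes an independent-samples t statistic from means, standard deviations, and group sizes. It assumes a pooled-variance (Student's) t test, which is consistent with the integer degrees of freedom reported but is not stated explicitly in the analysis; the function name `pooled_t` is hypothetical. Because the inputs are rounded to two decimals, the result can only approximate the reported value.

```python
import math


def pooled_t(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> tuple[float, int]:
    """Student's t for two independent groups, from summary statistics.

    Assumes equal variances (pooled estimate); returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Low-capacity individuals: gesture vs. no gesture (rounded values from the text)
t, df = pooled_t(0.31, 0.15, 16, 0.21, 0.07, 16)
print(f"t({df}) = {t:.2f}")
```

With these rounded inputs the statistic comes out at roughly 2.4 on 30 degrees of freedom, close to but not identical to the reported t(30) = 2.318, as would be expected when working from rounded means and standard deviations.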

Discussion

The objective of this experiment was to test for individual differences in the interaction between gesturing and WM that has been observed in several studies (Goldin-Meadow et al., 2001; Wagner et al., 2004). Our results show that individual differences in WM capacity determine whether gestures have an effect on WM. First, by applying a generalized linear model, we found significant main effects of WM capacity and of whether individuals gestured or not. Second, using a median split, we found that only low-capacity, but not high-capacity, individuals showed a significant difference between the gesture and no-gesture conditions; that is, only low-capacity participants who were instructed not to gesture performed worse than low-capacity participants who were allowed to gesture. There was no significant difference between low- and high-capacity participants when they were allowed to gesture. These data show that gestures have an effect on WM, but only for low WM capacity individuals. An alternative view of the findings is that they reflect an interaction between capacity and load, such that low-capacity individuals are affected by lower loads than are high-capacity individuals. According to this view, the maximum load in our experiment was at or even below the capacity of the high-capacity individuals, and hence, no gesture effect was observed for that group. This view assumes that such an effect could have been observed under a higher maximum load and that the effect of gestures is sensitive not to capacity but to a “load capacity differential,” that is, the load in relation to an individual’s capacity. However, without further testing to support this interpretation, it is more parsimonious to assume an effect of gestures on WM for low-capacity individuals.

The nature of the effect of gestures on WM is still unclear. Two accounts exist, which attribute the effect either to the encoding and maintenance of information in WM or to the executive control of attention and the access to information in WM. Cook and colleagues (2011) argued that the production of gestures reduces WM load by either restructuring or externalizing information. In this view, WM capacity is increased by adding gestures to speaking, because they provide additional resources for “offloading” of information. In contrast, Hostetter and Alibali (2008) argued that the inhibition of gestures diverts attention away from rehearsal toward cognitive control of movement and, thereby, increases WM load. According to this interpretation, WM capacity is reduced because the inhibition of a prepotent response such as gesturing interferes with attention and the management of the access to information.

The available evidence seems to support the view of Hostetter and Alibali (2008). Following the original finding of Goldin-Meadow et al. (2001) that participants remembered more letters when they were gesturing than when they were not gesturing during the explanation, further studies have shown that this effect is independent of the type of to-be-remembered items (spatial patterns vs. letters; Wagner et al., 2004) and of the presence of a visual representation of the problem (Ping & Goldin-Meadow, 2010). This suggests that the effect is general and does not depend on a specific form of encoding or maintenance of information. Finally, Cook et al.’s (2011) recent study showed that the effect is specifically related to co-speech gestures, and not to meaningless hand movements. Co-speech gestures, but not meaningless hand movements, can be seen as prepotent responses, that is, behavior that is typically executed, or more likely to be executed, in relation to a cue. Numerous studies have shown that WM is related to the ability to inhibit behavioral responses. Low-capacity individuals are less able to inhibit a contextually inappropriate prepotent behavioral response than are high-capacity individuals (Kane, Bleckley, Conway, & Engle, 2001; Redick, Calvo, Gay, & Engle, 2011; Unsworth, Schrock, & Engle, 2004). In addition, the ability to inhibit responses is reduced when WM load is increased (Grandjean & Collette, 2011; Hester & Garavan, 2005; Lawrence, Myerson, & Abrams, 2004; Mitchell, Macrae, & Gilchrist, 2002; Theeuwes, Belopolsky, & Olivers, 2009). Taken together, these findings suggest that gestures interact with the executive control of attention, rather than the encoding or maintenance of information. This view receives further support from studies showing that the kind of gesture that our participants used, pointing, is used to direct attention (Bangerter, 2004; Gregory & Hodgson, 2012; Tomasello, Carpenter, & Liszkowski, 2007).
However, WM capacity might be the result of a complex interaction between information retention and attention control (Unsworth & Engle, 2007). Consequently, further research is necessary to assess the mechanism that underlies the interaction of co-speech gesturing and WM.

In conclusion, our results show that hand movements and WM are not controlled by separate systems and that memory and attention are intimately related to sensorimotor interaction. More generally, our findings provide evidence for an important role of the body in higher cognition.