Introduction

Studies on multisensory temporal perception have demonstrated that the brain corrects for small temporal asynchronies between the senses that may arise naturally from differences in transmission and processing time (Harris et al. 2009; Keetels and Vroomen 2009). Corrections may occur either immediately, while a multisensory stimulus is being processed, as demonstrated in ‘temporal ventriloquism’, where an abrupt sound or touch ‘attracts’ the temporal occurrence of a visual flash (Scheier et al. 1999; Morein-Zamir et al. 2003; Vroomen and de Gelder 2004; Vroomen and Keetels 2006, 2009; Keetels et al. 2007; Keetels and Vroomen 2008a), or on a longer time scale, reflecting adaptive changes in synchrony perception (i.e., ‘temporal recalibration’; Fujisaki et al. 2004; Vroomen et al. 2004). Temporal recalibration was originally demonstrated between vision and audition, but it has since been reported for other modality pairings as well (visuo-tactile or visuo-motor; Sugita and Suzuki 2003; Navarra et al. 2005; Miyazaki et al. 2006; Keetels and Vroomen 2007, 2008b; Hanson et al. 2008; Takahashi et al. 2008; Vatakis et al. 2008; Haggard and Tsakiris 2009). As an example, Vroomen and Keetels exposed participants for 3 min to sound-first or light-first stimulus pairs (a tone and a flash) presented at lags of ~100–200 ms. After this exposure phase, participants performed either a temporal order judgment task (TOJ; “which came first, sound or light?”) or a simultaneity judgement task (“simultaneous or successive?”) on a sound/light test stimulus. The results showed that the point of subjective simultaneity (PSS; the relative timing at which the two stimuli are perceived as maximally simultaneous) was shifted towards the adapted lag.
Thus, after light-first exposure, sound/light stimuli in which the light came slightly before the sound were perceived as synchronous, whereas after sound-first exposure, stimuli in which the sound came slightly first were perceived as synchronous.

The mechanism underlying temporal recalibration, though, remains elusive. One option is that only the criterion for simultaneity between the adapted modalities is adjusted. For example, after exposure to light-first sound/light pairings, participants may change their criterion for audiovisual simultaneity in such a way that light-first stimuli are taken to be simultaneous. On this view, other modality pairings (e.g., vision/touch) would be unaffected, and the change in criterion should also not affect unimodal processing of visual and auditory stimuli. Alternatively, one modality (vision, audition, or touch) may be ‘shifted’ towards the other, possibly because the sensory threshold for stimulus detection in the adapted modality is changed. For example, in an attempt to perceive simultaneity during light-first exposure, participants might delay processing in the visual modality by adopting a more stringent criterion for sensory detection. After exposure to light-first audiovisual stimuli, one would then expect slower processing of visual stimuli in general, and other modality pairings that involve vision, say vision/touch, should also be affected. Since for the audiovisual case it is commonly believed that the auditory system codes temporal information more precisely than the visual system (Welch 1978), one might expect audiovisual lag adaptation to shift vision towards audition. In line with this prediction, Harrar and Harris (2008) observed that simple reaction times to a light were increased after exposure to light-first audiovisual pairings, whereas simple reaction times to a sound or touch were unaffected by this exposure regime. Possibly, then, participants adopted a more stringent criterion for visual detection after light-first exposure. Others, though, observed that the threshold was adjusted not for visual stimuli but for sounds.
For example, Navarra et al. (2009) exposed participants to vision-first audiovisual asynchronies and reported that participants’ simple reaction times to sounds but, critically, not to visual stimuli were changed, possibly because here the criterion for auditory detection was adjusted.

In an attempt to further examine the mechanism underlying temporal recalibration, Hanson et al. (2008) explored whether a ‘supramodal’ (general and modality-nonspecific) mechanism underlies temporal recalibration by examining lag adaptation to audiovisual, audio-tactile and tactile-visual asynchronies. The data showed that a brief period of repeated exposure to ±90 ms asynchrony in any of these pairings resulted in PSS shifts of about 70 ms in subsequent TOJ tasks, and that the size of the shifts was similar across the three pairings. This led the authors to conclude that a single mechanism underlies temporal recalibration. Different results, though, were reported by Harrar and Harris (2008). They exposed participants for 5 min to ~100 ms lags of light-first stimuli for the audiovisual case, and of touch-first stimuli for the audio-tactile and visual-tactile cases. The expected shift of the PSS in the direction of the exposed lag was found only for audiovisual exposure with audiovisual test stimuli; no shifts, or shifts in the opposite direction, were found for test stimuli presented in other modalities or after audio-tactile and visual-tactile exposure. These results might lead one to conclude that only the criterion for audiovisual simultaneity changed, as other modality pairings were not affected in the predicted direction. Conflicting results, though, were obtained by Di Luca et al. (2007). They exposed participants to asynchronous audiovisual pairs (~200 ms lags, sound-first and light-first) and measured the PSS for audiovisual, audio-tactile and visual-tactile test stimuli. Besides a shift in the PSS for audiovisual pairs, the effect generalized to audio-tactile but not to visual-tactile test pairs, a pattern that led the authors to conclude that adaptation resulted in a phenomenal shift of the auditory event.
Taken together, some studies have obtained results compatible with a shift in the criterion for audiovisual simultaneity, while others have obtained results that can be accounted for by delays in either the auditory or the visual modality. Clearly, then, more research is needed to understand the full pattern of results and how temporal recalibration generalizes across specific exposure stimuli.

Here, we further examined the mechanism underlying temporal recalibration using a motor task (i.e., tapping) rather than a purely sensory one. A motor task is interesting because an active, self-initiated tap involves not only sensory feedback from the finger touching a key or pad, but also a motor plan that is converted into a series of muscle activations that carry out the movement. A copy of that motor command, the so-called efference copy, is available to many parts of the brain long before the actual movement occurs (~250 ms; Libet et al. 1983), and this efference copy might be used to predict the timing of an action and its sensory feedback (Winter et al. 2008). As a first approximation, one might expect the timing of motor actions and their sensory feedback to be rather rigid, because extra information is available about the timing of the motor component and because sensory feedback is normally expected to occur only after a motor action is initiated. In line with this, some have argued that lag adaptation occurs only for the audiovisual case, because the relative arrival times of sound and light vary with distance, but not for somatosensory stimuli (Miyazaki et al. 2006). Nevertheless, the ability to correctly judge motor-sensory temporal order has been shown to be flexible as well (see also Cunningham et al. 2001; Stetson et al. 2006). For example, Stetson et al. adapted participants to short delays between self-initiated key presses and subsequently delivered light flashes. After a short exposure phase to delayed flashes, participants performed a TOJ task on a tap and flash test stimulus (tap-first or flash-first?). The results showed that the PSS was shifted towards the adapted lag, consistent with previous reports on audiovisual temporal recalibration (Fujisaki et al. 2004; Vroomen et al. 2004; Keetels and Vroomen 2007; Hanson et al. 2008).
In fact, in the most dramatic case, a visual flash presented at an unexpectedly short delay after a finger tap was actually perceived as occurring before the tap, an experience that runs against the law of causality. At present, though, it is still unclear whether the criterion for simultaneity between the two specific stimuli was adjusted, or whether it is the visual or motor component that was shifted towards the other one.

In the present study, we adopted a motor-sensory task to examine the generalization of temporal recalibration across modalities. Participants tapped their finger on a touch pad during an exposure phase of about 3 min. Following each tap, either a tone pip or a flash was presented after a delay of either 50 or 150 ms. After exposure to these motor-auditory or motor-visual lags, a motor-visual or motor-auditory test stimulus was presented, and participants judged whether the stimulus had occurred before or after the tap. If lag adaptation affects the criterion for a specific combination of two modalities (i.e., the criterion for motor-visual or motor-auditory simultaneity), there should be no transfer to the other modality. Alternatively, lag adaptation might shift a specific modality (audition, vision, or the motor component). If the auditory modality were shifted (when did the sound occur?), one would expect a shift of the PSS in the motor-auditory test after motor-auditory adaptation, but not in the other combinations. Likewise, if only the visual modality were shifted (when did the flash occur?), one would expect a shift of the PSS in the motor-visual test after motor-visual adaptation, but not in the other combinations. If the motor system adapts (when did I move my finger or touch the pad?), one would expect uniform transfer of adaptation across motor-auditory and motor-visual test stimuli, because both involve the motor component.
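The four predictions above can be made explicit as a lookup of which adapt/test combinations should show a PSS shift. The sketch below is a hypothetical encoding: the account names, the "V"/"A" labels for the motor-visual and motor-auditory pairs, and the function names are illustrative labels introduced here, not the authors'. It checks which account's predicted pattern matches a given observed pattern, here the uniform shift reported in the Results.

```python
from itertools import product

# "V" = motor-visual pair, "A" = motor-auditory pair (illustrative labels).
PAIRS = ("V", "A")
ACCOUNTS = ("pair-specific criterion", "auditory shift", "visual shift", "motor shift")


def predicts_tre(account: str, adapted: str, tested: str) -> bool:
    """True if the account predicts a PSS shift for this adapt/test combination."""
    if account == "pair-specific criterion":
        return adapted == tested          # no transfer to the other pair
    if account == "auditory shift":
        return adapted == tested == "A"   # only motor-auditory adaptation and test
    if account == "visual shift":
        return adapted == tested == "V"   # only motor-visual adaptation and test
    if account == "motor shift":
        return True                       # both test pairs contain the motor component
    raise ValueError(f"unknown account: {account}")


def consistent_accounts(observed: dict) -> list:
    """Accounts whose predictions match an observed {(adapted, tested): shifted} map."""
    return [
        acc
        for acc in ACCOUNTS
        if all(predicts_tre(acc, a, t) == observed[(a, t)] for a, t in product(PAIRS, PAIRS))
    ]


# Pattern described in the Results: a shift in all four adapt/test combinations.
observed_uniform = {(a, t): True for a, t in product(PAIRS, PAIRS)}
print(consistent_accounts(observed_uniform))  # ['motor shift']
```

A within-modality-only pattern would instead single out the pair-specific criterion account, which is why the cross-modality conditions are the diagnostic ones.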

Method

Participants

The three authors and two skilled participants (four male; mean age 34.6 years) from Tilburg University participated. All had normal hearing and normal or corrected-to-normal vision. Four of them were right-handed.

Stimuli and apparatus

Participants sat at a desk in a dimly lit and soundproof booth looking at a CRT display at about 65 cm viewing distance. The visual stimulus consisted of a 1-cm white square (9 cd/m²) flashed for 30 ms on a black background (0 cd/m²). The auditory stimulus consisted of a 2,000 Hz pure tone pip (30 ms duration, 2 ms rise/fall slope) presented via headphones (Sony MDR-XD100) at 70 dB(A). White noise was continuously presented via headphones at 59 dB(A) to mask the sound of the taps. A custom-made touch pad was used for detecting the precise timing of the finger taps. The temporal resolution of the response device was about 1 ms as verified on a multiple-trace oscilloscope.

Design

There were four within-subjects factors: the adapted modality (motor-visual, motor-auditory), the exposure lag during the adaptation phase (50 ms, 150 ms), the modality of the test stimuli (same as or different from the adapted modality), and the stimulus-onset asynchrony (SOA) between the tap and the test stimulus (0, 50, 100, 150, and 200 ms). These SOA values were chosen because they covered the range from ‘stimulus clearly before the tap’ to ‘stimulus clearly after the tap’. The whole test consisted of 1,000 trials, with 25 repetitions for each of the 40 conditions. The adapted modality, exposure lag, and modality of the test were all blocked, while the SOA varied randomly within each block of 125 trials. The two exposure lags were run on two consecutive days, with order counterbalanced across participants.
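The factorial structure and trial counts above (2 × 2 × 2 blocked factors × 5 SOAs = 40 conditions; 8 blocks × 125 trials = 1,000 trials) can be checked with a short sketch. It is illustrative only: the variable names, the fixed random seed, and the order in which blocks are generated are assumptions, and the assignment of the two lags to separate days and their counterbalancing are not modeled.

```python
import itertools
import random

ADAPTED_PAIRS = ("motor-visual", "motor-auditory")
EXPOSURE_LAGS_MS = (50, 150)           # run on separate days in the study (not modeled)
TEST_PAIRS = ("same", "different")     # test pair relative to the adapted pair
SOAS_MS = (0, 50, 100, 150, 200)
REPS_PER_SOA = 25


def build_blocks(seed: int = 0) -> list:
    """One block per (lag, adapted pair, test pair); SOAs shuffled within each block."""
    rng = random.Random(seed)
    blocks = []
    for lag, adapted, test in itertools.product(EXPOSURE_LAGS_MS, ADAPTED_PAIRS, TEST_PAIRS):
        soas = [soa for soa in SOAS_MS for _ in range(REPS_PER_SOA)]
        rng.shuffle(soas)  # SOA varies randomly within the block
        blocks.append({"lag": lag, "adapted": adapted, "test": test, "soas": soas})
    return blocks


blocks = build_blocks()
n_conditions = len(blocks) * len(SOAS_MS)
n_trials = sum(len(b["soas"]) for b in blocks)
print(len(blocks), n_conditions, n_trials)  # 8 40 1000
```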

Procedure

An adaptation-test paradigm with ‘top-up’ adaptation was used (see Fig. 1). During adaptation, participants tapped the index finger of their dominant hand on a touch pad 240 times, trying to maintain a constant inter-tap interval of ~750 ms (total duration ~3 min). After each tap, a feedback stimulus (a flash or a tone) was presented at a constant lag of either 50 or 150 ms. These values were chosen because the tap-flash and tap-tone pairings were still perceived as a single event, while being expected to elicit quantifiable adaptive shifts. To ensure that participants attended to the feedback stimulus, they had to count the occasional occurrences (1–5 times) of a deviant stimulus (a red square during visual adaptation, and a high tone of 2,250 Hz during auditory adaptation). At the end of the adaptation phase, participants were asked how many deviant stimuli they had counted.
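A minimal sketch of the adaptation-phase feedback schedule, assuming tap times (in ms) are read from the touch pad and that the 1–5 deviants may fall on any tap (the source does not say how deviant positions were chosen). The function name and data layout are hypothetical; in an actual experiment each feedback stimulus would be triggered tap by tap in real time rather than computed from a finished list.

```python
import random


def adaptation_schedule(tap_times_ms, lag_ms, rng, min_dev=1, max_dev=5):
    """Return (feedback onset in ms, is_deviant) for each detected tap.

    Feedback follows every tap at a constant lag; a random number of taps
    (between min_dev and max_dev, inclusive) get the deviant stimulus that
    participants count (hypothetical placement rule).
    """
    n_dev = rng.randint(min_dev, max_dev)  # randint bounds are inclusive
    deviant_idx = set(rng.sample(range(len(tap_times_ms)), n_dev))
    return [(t + lag_ms, i in deviant_idx) for i, t in enumerate(tap_times_ms)]


# Hypothetical, perfectly regular taps at the trained 750-ms pace (not recorded data).
taps = [i * 750 for i in range(240)]
schedule = adaptation_schedule(taps, lag_ms=150, rng=random.Random(1))
n_deviants = sum(dev for _, dev in schedule)
print(taps[-1] / 1000)  # 179.25
```

With perfectly regular 750-ms taps the 240th tap falls at 179.25 s, in line with the ~3-min exposure duration; the participants' actual, somewhat faster pace shortens this accordingly.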

Fig. 1 Adaptation-test paradigm. Each block consisted of an adaptation phase and a test phase. In the adaptation phase, participants were exposed to a constant time lag (50 or 150 ms) between their voluntary tap and its feedback stimulus (flash or tone pip). Immediately afterwards, they performed the TOJ task on two tap-feedback pairs, preceded by five “top-up” adaptation pairs. a Within-modality adaptation: adapted to the motor-visual or motor-auditory pair, then tested with the same pair (Visual–Visual or Auditory–Auditory, respectively). b Cross-modality adaptation: adapted to the motor-auditory or motor-visual pair, then tested with the other pair (Auditory–Visual or Visual–Auditory, respectively).

Testing started immediately after adaptation. A test trial consisted of five “top-up” tap-feedback pairs using the same lag as in the adaptation phase. Then, after a short delay varying between 850 and 1,250 ms, the fixation cross became bright, signaling participants to make two taps (at an inter-tap interval of ~750 ms), each accompanied by a critical flash (or, depending on condition, a tone) presented at one of the five SOAs relative to the tap. Participants then judged whether the two final sounds or lights had occurred before or after the two taps. The unspeeded response was made by pressing one of two buttons on a special keyboard with the non-dominant hand. We used two taps rather than a single one as the test stimulus because the two ‘shots’ increase sensitivity for temporal order, thus lowering JNDs and reducing noise (Morein-Zamir et al. 2004). After the response, the next top-up/test sequence was presented. Each block of 125 trials took about 20 min, with a short break after 65 trials.
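Each test trial yields one binary judgment at one SOA. The sketch below, with a hypothetical function name and data layout, tallies such judgments into the proportion of ‘tap-first’ responses per SOA (a judgment that the lights or sounds came after the taps counts as ‘tap-first’), the quantity to which the psychometric function is fitted in the Results.

```python
from collections import defaultdict


def proportion_tap_first(trials):
    """trials: iterable of (soa_ms, response), response in {'tap-first', 'stimulus-first'}.

    Returns {soa_ms: proportion of 'tap-first' responses}, ordered by SOA.
    """
    counts = defaultdict(lambda: [0, 0])  # soa -> [n tap-first, n total]
    for soa, response in trials:
        if response not in ("tap-first", "stimulus-first"):
            raise ValueError(f"unexpected response: {response!r}")
        counts[soa][0] += response == "tap-first"  # bool adds as 0 or 1
        counts[soa][1] += 1
    return {soa: k / n for soa, (k, n) in sorted(counts.items())}


# Hypothetical responses from six trials, not data from the study.
demo = [(0, "stimulus-first"), (0, "stimulus-first"), (100, "tap-first"),
        (100, "stimulus-first"), (200, "tap-first"), (200, "tap-first")]
print(proportion_tap_first(demo))  # {0: 0.0, 100: 0.5, 200: 1.0}
```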

To familiarize participants with the procedure, the experimental trials were preceded by a practice session in which they learned to tap at a constant pace of ~750 ms. Participants were trained for ~5 min to maintain a constant inter-tap interval set by an auditory pacer signal. The interval between two consecutive taps was also shown continuously on the screen, and participants tried to keep it at 750 ms. Practice then continued with TOJ trials in which only the extreme SOAs (0 and 200 ms) were presented.

Results

Trials of the training session were excluded from further analysis. Performance on the catch trials in the adaptation phase was flawless, except for one participant who missed a single catch trial, indicating that participants were indeed looking at the light or listening to the sound during the exposure phase. The average inter-tap interval in the adaptation phase was 672 ms, somewhat faster than the pace participants had been trained on, but there was no correlation between tapping speed and the amount of temporal recalibration (r = −0.408, p = 0.50); tapping speed was therefore not analyzed further.
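As a consistency check, and assuming the correlation was computed across the five participants and tested with the usual t statistic for Pearson's r (the source states neither), the reported values agree:

\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{-0.408\,\sqrt{5-2}}{\sqrt{1-(-0.408)^{2}}} \approx \frac{-0.707}{0.913} \approx -0.77, \qquad df = n-2 = 3
\]

With 3 degrees of freedom, |t| ≈ 0.77 lies just above the 75th percentile of the t distribution (≈ 0.765), giving a two-tailed p of about 0.495, in line with the reported p = 0.50.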

For each participant and condition, the proportion of ‘tap-first’ responses was calculated as a function of SOA, and the sigmoid (logistic) function \( F(x) = 1/(1 + \exp [ - (x - \mu )/\sigma ]) \) was fitted. The mean of the resulting distribution (μ, the interpolated 50% crossover point) was taken as the point of subjective simultaneity (PSS), and its scale parameter (σ) as the just noticeable difference (JND), a standard measure corresponding to the distance between the PSS and the SOAs at which 27 and 73% ‘tap-first’ responses were given. The group-averaged data are shown in Fig. 2 and Table 1. The JNDs and PSSs of the authors were compared with those of the non-authors. The non-authors tended to have slightly better JNDs, but none of the comparisons was significant (all p’s > 0.08). Temporal recalibration was expected to manifest itself as a shift of the PSS in the direction of the exposure lag, and the temporal recalibration effect (TRE) was computed as the PSS following exposure to the 150-ms lag minus the PSS following exposure to the 50-ms lag.
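The source does not state how the fit was carried out. The sketch below uses a simple stand-in, an ordinary least-squares line through logit-transformed proportions, which works because logit(F(x)) = (x − μ)/σ is linear in x; a maximum-likelihood or nonlinear least-squares fit would be the more common choice with real, noisy data. The clipping rule for proportions of 0 or 1 and all function names are assumptions, and the input proportions are hypothetical noiseless values, not the study's data. The TRE sign convention (150-ms minus 50-ms PSS) follows the Fig. 2 caption.

```python
import math


def logistic(x, mu, sigma):
    """F(x) = 1 / (1 + exp(-(x - mu) / sigma)): proportion of 'tap-first' responses."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / sigma))


def fit_logistic(soas_ms, p_tap_first, n_per_soa=25):
    """Estimate (PSS, JND) = (mu, sigma) by least squares on logit-transformed proportions.

    Slope of the fitted line = 1/sigma, intercept = -mu/sigma. Proportions of
    0 or 1 have no finite logit and are clipped to [0.5/n, 1 - 0.5/n] (assumed).
    """
    lo, hi = 0.5 / n_per_soa, 1.0 - 0.5 / n_per_soa
    xs = [float(x) for x in soas_ms]
    ys = [math.log(q / (1.0 - q)) for q in (min(max(p, lo), hi) for p in p_tap_first)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    sigma = 1.0 / slope
    return -intercept * sigma, sigma  # (PSS, JND)


SOAS_MS = [0, 50, 100, 150, 200]
# Hypothetical noiseless proportions for the two exposure lags (NOT the study's data).
p_lag50 = [logistic(x, mu=70.0, sigma=40.0) for x in SOAS_MS]
p_lag150 = [logistic(x, mu=100.0, sigma=40.0) for x in SOAS_MS]

pss50, jnd50 = fit_logistic(SOAS_MS, p_lag50)
pss150, jnd150 = fit_logistic(SOAS_MS, p_lag150)
tre = pss150 - pss50  # positive when the PSS moves toward the longer exposure lag
print(round(pss50, 3), round(jnd50, 3), round(tre, 3))  # 70.0 40.0 30.0

# sigma is the distance from the PSS to the ~73% (and, symmetrically, ~27%) point.
print(round(logistic(pss50 + jnd50, pss50, jnd50), 3))  # 0.731
```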

Fig. 2 Group-averaged psychometric functions (N = 5) of “tap-first” responses for each combination of adapted modality, test modality, and exposure lag (50 ms: solid line; 150 ms: dashed line). a Within-modality adaptation: adapted to the motor-visual or motor-auditory pair, then tested with the same pair (V–V or A–A, respectively). b Cross-modality adaptation: adapted to the motor-auditory or motor-visual pair, then tested with the other pair (A–V or V–A, respectively). Mean observed proportions of “tap-first” responses are also shown (50 ms: filled circles; 150 ms: open circles). The PSS shift produced by lag exposure (from the 50-ms to the 150-ms lag) is labeled “TRE” (temporal recalibration effect).

Table 1 Mean points of subjective simultaneity (PSSs) and just noticeable differences (JNDs) in ms

As is clearly visible, exposure to the 150-ms lag shifted the PSS in the predicted direction compared with the 50-ms lag and, most importantly, this shift was uniform across conditions. This generalization was confirmed by ANOVAs on the PSSs and JNDs with adapted modality, exposure lag, and modality of test as within-subjects factors. In the ANOVA on the PSSs, only the main effect of exposure lag was significant, F(1, 4) = 14.21, p = 0.02, indicating that the PSS was shifted by 29 ms in the direction of the lag (i.e., 29% of the 100-ms difference between the two exposure lags). The effects of adapted modality, F(1, 4) = 1.68, p = 0.27, and modality of the test, F(1, 4) = 1.01, p = 0.37, and all interactions were non-significant.

In the ANOVA on the JNDs, none of the main effects was significant: adapted modality, F(1, 4) = 2.04, p = 0.23; modality of test, F(1, 4) = 1.68, p = 0.27; exposure lag, F(1, 4) = 0.04, p = 0.85. JNDs tended to be slightly worse after motor-visual adaptation followed by a motor-visual test, possibly reflecting lower temporal accuracy of the visual system, but the interaction between adapted modality and modality of test was non-significant, F(1, 4) = 1.84, p = 0.25. All other interactions were also non-significant.

Discussion

Here we demonstrate that exposure to a voluntary action (a finger tap) and a delayed auditory or visual feedback stimulus associated with this action induces a shift in the subjective temporal order of both auditory and visual events. Presumably, temporal delays were adjusted during recalibration so that the two signals moved toward simultaneity, because events appearing at a consistent delay after motor actions are interpreted as consequences of those actions. The brain then recalibrates timing judgments to make them consistent with a prior expectation that sensory feedback will follow motor actions without delay. As reported before, flashes at unexpectedly short delays after a finger tap were consistently perceived as occurring before the tap (Stetson et al. 2006). In isolation, this finding might be explained by assuming that participants had adjusted their criterion for motor-visual simultaneity. However, our study demonstrates that the same phenomenon occurs with tones and, most importantly, that the effect generalizes across modalities: equivalent shifts were obtained when participants were tested in the same modality as the adapted one or in a different one. This pattern of results is most easily explained by assuming that it is the motor system that has been shifted, rather than that specific criteria for simultaneity were adjusted or that the visual and auditory modalities were shifted in time. Most likely, then, participants shifted their interpretation of when they moved their finger or when they touched the pad.

At first sight, this may seem quite remarkable if one considers that we experience a strong sense of conscious control when generating self-paced motor actions. Yet, several authors have demonstrated that this sense may be illusory, and that the timing of perceived intentions and actions is quite flexible (Lau et al. 2007; Haggard and Tsakiris 2009). Together with the previously mentioned studies on purely sensory temporal recalibration, it thus seems that visual, auditory, and motor events are all flexible in their timing.

It is of interest to note that JNDs in the present study were relatively small compared with previous reports using crossmodal temporal order judgements, where JNDs are usually on the order of 40–80 ms (Keetels and Vroomen in press). Possibly, JNDs were small here because participants were trained and because they made two taps (with two accompanying tones or flashes) rather than a single one, which usually improves sensitivity and reduces noise (Morein-Zamir et al. 2004). More importantly, JNDs were also invariant across modalities and adapted lags, so each of the conditions remained equally difficult after lag adaptation. This finding contrasts with studies reporting that exposure to asynchronous pairs increases the JND rather than shifting the PSS (Winter et al. 2008; Navarra et al. 2009). It has been argued that this increase in JND is the first stage of temporal recalibration, which may later be followed by a shift in the PSS if the adaptation regime is maintained (Navarra et al. 2009). Our results, though, suggest that the nervous system can adaptively recalibrate sensory temporal relationships without a discernible loss of sensitivity. This agrees with informal reports from observers that, during adaptation, the physically asynchronous stimulus pairs felt close to perceptually synchronous. The JND data also suggest that this phenomenon is not a product of a loss in sensitivity, but rather that the signals are re-aligned relative to one another.

Further research will be needed to gain a fuller understanding of the mechanisms underlying temporal recalibration. A critical question for future work is how motor-sensory adaptation relates to purely sensory temporal recalibration. One possibility is that motor-sensory recalibration is in fact a purely sensory phenomenon, because what was adjusted was proprioception (when did I move my finger?) or touch (when did my finger hit the pad?) rather than the timing of the intention behind the self-initiated motor command. Another question is the extent to which motor-sensory recalibration depends on the task involved. One possibility is that attention during the exposure phase plays a role. For example, recalibration may become even larger if participants attend to the intersensory delay rather than to a unimodal aspect of the stimulus (such as detecting a visual or auditory deviant, as in the present case). Previous experience with intersensory timing variability may also be important. For example, a delayed feedback signal after a finger tap may in fact be quite natural, because humans are exposed to response keys that vary in responsiveness (e.g., it takes about 25 ms before a keystroke is visible as a letter on a computer screen, and other buttons, such as those of a remote control, are even slower). In other cases, though, such as hearing oneself speak or seeing oneself move in a mirror, there is virtually no real-life variability between a movement and its perceptual consequences. It remains for future research to examine whether the system is flexible in these cases as well.