Introduction

Almost all functional magnetic resonance imaging (fMRI) studies that have examined the human neural substrates of object processing have utilized 2-dimensional (2D) pictures of objects. Although pictures are ubiquitous in everyday life, we interact with real 3-dimensional (3D) objects far more often than with 2D representations. Moreover, we have little difficulty in distinguishing between the two. Numerous cortical areas have been implicated in the perception of object shape, but the neural mechanisms involved in the perception of real 3D objects have received scant investigation with fMRI. In this study we ‘bring the real world into the scanner’ to examine whether the large body of evidence pertaining to human neural processing of pictorial stimuli is applicable also to real-world objects.

The processing of object shape in humans is broadly distributed across a number of cortical areas spanning both the dorsal and ventral visual pathways. Most notably, object-selective neural populations have been identified within the ventral stream along a swathe of inferior temporal cortex known as the lateral occipital complex (LOC)1,2. The LOC is dedicated to processing object shape independently of the low-level image features that define the shape. Area LOC produces robust responses to objects depicted in a range of formats, including greyscale images, line drawings, silhouettes and shapes defined by motion or texture, as well as when the percept of form is induced by an illusory contour3. Additional object-selective regions have also been identified within the ‘dorsal’ processing stream, particularly along the intraparietal sulcus (IPS)3,4,5,6.

Beyond simple fMRI subtraction designs, neural coding within object-selective cortex has been further investigated by comparing responses to repeated vs. unrepeated objects7,8,9,10,11. The characteristic reduction in haemodynamic response with stimulus repetition has been variously referred to as ‘fMR adaptation’ (fMR-A)7,12,13, or ‘repetition suppression’14,15. fMR-A is a robust effect and a putative analogue of a similar effect seen in nonhuman primates, in which neurons within infero-temporal cortex show reduced firing rates as a result of stimulus repetition16,17. Repetition designs have become a popular methodological approach because, unlike standard mapping techniques, they can probe neural selectivity in higher-order visual areas at a sub-voxel scale12,15,18. In the field of object perception, repetition designs have perhaps most commonly been used to determine whether object-selective neural populations respond invariantly to image transformations such as changes in viewpoint, size or illumination4,7.

Repetition effects have been observed in human object-selective cortex with a variety of 2D image types. These include simplified monochrome shapes4, silhouettes19 and line drawings that convey object structure via contours5,7 or integrated elements20,21. Repetition effects have also been demonstrated with ‘richer’ stimuli such as greyscale photographs or other detailed images that provide more information about an object's 3D characteristics via shading and texture4,7,10,22,23,24, or that induce the percept of depth so that they appear to lie in front of the fixation plane19,25.

While this approach has been highly fruitful, we wondered how well this large body of results would generalize to realistic 3D objects. The choice of 2D stimuli to study object recognition has been largely one of convenience and experimental control. The presentation of 2D images simply requires projection of the images onto a flat screen viewed through a mirror by the participant who can lie comfortably in the supine position; moreover, the control of image parameters (e.g., size, depth, timing) is straightforward. Many additional challenges arise in the presentation of real-world 3D stimuli; however, many of these problems have been solved in fMRI research on grasping and reaching where 3D objects are required to elicit normal object-directed actions26,27,28. Such approaches involve tilting of the head and head coil to enable direct viewing of real 3D objects within reachable space ( Figure 1a ). These configurations offer realistic presentations of objects in which (a) all binocular and monocular depth cues are consistent, (b) retinal size, viewing distance and expected size are consistent and (c) the location within reachable space means that objects may afford real actions such as manipulation29. Given these differences, we investigated whether the effects obtained with 2D images would be corroborated in a richer, more realistic context.

Figure 1

Experimental setup, stimulus items and fMRI trial sequence.

(a) Participants lay supine in the MR scanner with the head supported within the lower portion (6 elements) of an inclined 12-channel head coil. A 4-channel flex coil was positioned over the front of the head. The head coil was tilted forward by 30° to enable direct viewing of stimuli. 2D pictures and 3D object stimuli (illustrated above) were mounted by the experimenter on a turntable positioned over the waist. The experiment was conducted in complete darkness and all trials recorded using an infra-red camera. Stimulus presentation duration was controlled by an LED (illuminator). Participants were asked to identify the objects presented on each trial while maintaining fixation on a single red LED positioned just above the stimulus plane. (b) Six sets of 5 stimulus exemplar objects were used in the fMR-A experiment (30 in total). A different set of stimulus items was used in each run to prevent cross-adaptation. 3D stimuli from ‘Set 1’ are depicted in (a) and (c). (c) Example trial sequence. Each stimulus item was presented for 500 ms within a 3 sec inter-stimulus interval. Stimuli for each upcoming trial were positioned on the turntable during the 20 sec inter-trial interval.

Here we used an fMR repetition paradigm to examine both the overall level of activation and repetition-based effects in the context of real-world 3D objects compared to 2D pictures. We expected clear activation and repetition effects within the ventral and dorsal stream areas identified across prior studies for both stimulus classes. However, the main question was how similar these effects would be for 3D objects. We anticipated that the overall level of activation as well as the strength of repetition effects for the richer, real-world 3D objects would be at least equal to, if not greater than, those for 2D pictures, particularly within the dorsal stream30. Neurophysiology research has characterized several areas within the macaque dorsal stream with 3D object-selective responses, including the anterior intraparietal area (AIP)31,32,33,34, lateral intraparietal area (LIP)35 and caudal intraparietal sulcus (cIPS)36, areas for which human homologues have been proposed37. These areas are postulated to be involved in the extraction of 3D shape for visuomotor transformations associated with the control of action38. Given that human dorsal stream areas show fMR-A with repeated 2D object images4,5 and respond strongly to 3D objects39, such areas may be expected to show larger responses and stronger repetition effects in the context of real-world objects.

Results

We investigated neural object representations associated with 2D pictures and real 3D stimuli within known object-selective areas of human cortex. Previous fMR-A paradigms have reported robust repetition effects within the LOC when comparing repeated versus different 2D object images. Here we asked whether real 3D objects elicit a similar pattern. A slow event-related fMR-adaptation design ( Figure 1c ) was employed in which two objects appeared sequentially on each trial. Blood-oxygen-level dependent (BOLD) responses were compared across trials in which paired objects had the same identity (‘Repeat’ condition) versus trials in which they differed (‘Different’ condition). Repetition effects were measured across two classes of stimuli: real-world 3D objects and 2D colour photographs of the same objects ( Figure 1a,b ) that were matched in all possible respects for size, distance, viewpoint and illumination. We examined repetition effects across the whole brain and within independently defined sub-regions of object-selective LOC.

Region of interest (ROI) analyses

Because of the wealth of past studies showing object selectivity and fMR repetition effects for object images in LOC, our initial analyses utilized a region of interest (ROI) approach to identify LOC within individuals based on an independent localizer run and then extract its pattern of activation from separate experimental runs. LOC was localized by contrasting epochs containing pictures of objects and shapes with those of their scrambled counterparts (see Methods). In accordance with early studies that reported fMR-A effects using 2D stimuli7,19, we searched within two sub-divisions of LOC: an anterior-ventral portion in the posterior fusiform sulcus (pFS) and a posterior-dorsal portion of LOC (LO). Based on previous findings, we anticipated that on Different trials, where object identity changes, BOLD responses should be maximal, whereas on Repeat trials, where paired objects shared the same identity, the BOLD response should be comparatively attenuated. Importantly, we anticipated that the pattern of repetition effects would be similar for 2D and 3D stimuli (if not greater in magnitude for real 3D objects).

To validate our design and procedure, fMRI signals were first compared on event-related trials involving 2D pictures. Time courses of fMRI signals on Different versus Repeat trials involving pictures are displayed in Figure 2 , for LO and pFS (left upper and lower panels, respectively). To quantify repetition effects and compare them across the different stimulus types, we used an adaptation index (AI) which expresses the response difference between Repeat and Different conditions relative to the overall fMRI response to a given stimulus4. Positive index values reflect higher responses on Different than Repeat trials; negative values indicate the reverse pattern and values around zero indicate a lack of repetition effects. AIs were calculated from mean activation (β coefficients) in the Different and Repeat conditions for each stimulus type; the magnitude of repetition effects was tested against zero with one-sample t-tests and compared across stimulus types with paired-samples t-tests.
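The adaptation index and the two kinds of t-test described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the normalization AI = (Different − Repeat) / (Different + Repeat) is an assumption (one common way of scaling the Repeat/Different difference by the overall response), and the β values in the usage block are hypothetical.

```python
import math
from statistics import mean, stdev


def adaptation_index(beta_different, beta_repeat):
    """Repetition effect scaled by the overall response.

    ASSUMPTION: AI = (Different - Repeat) / (Different + Repeat).
    Positive values mean Different > Repeat; values near zero mean no effect.
    """
    total = beta_different + beta_repeat
    if total == 0:
        raise ValueError("Different + Repeat sums to zero; AI is undefined")
    return (beta_different - beta_repeat) / total


def one_sample_t(values, popmean=0.0):
    """Return (t, df) for H0: mean(values) == popmean."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)
    return (mean(values) - popmean) / standard_error, n - 1


def paired_t(a, b):
    """Paired-samples t: a one-sample t-test on per-subject differences a - b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    return one_sample_t([x - y for x, y in zip(a, b)])


if __name__ == "__main__":
    # Hypothetical per-subject (Different, Repeat) betas: illustration only
    betas_2d = [(1.2, 0.8), (1.0, 0.7), (1.4, 1.0), (0.9, 0.7)]
    betas_3d = [(1.1, 1.0), (0.9, 1.0), (1.3, 1.2), (1.0, 0.9)]
    ai_2d = [adaptation_index(d, r) for d, r in betas_2d]
    ai_3d = [adaptation_index(d, r) for d, r in betas_3d]
    t, df = one_sample_t(ai_2d)
    print(f"2D AI vs 0: t({df}) = {t:.2f}")
    t, df = paired_t(ai_2d, ai_3d)
    print(f"2D vs 3D AI: t({df}) = {t:.2f}")
```

Expressing the paired test as a one-sample test on per-subject differences makes explicit that both comparisons rest on the same within-subject logic.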

Figure 2

Timecourse of fMRI signals within LO and pFS for 2D-pictures and 3D-objects.

Data are group results (n = 13). (a–b) Upper panels show responses within LO for Different versus Repeat 2D-pictures (left) or real 3D-objects (right). (c–d) Lower panels show responses within pFS for Different versus Repeat 2D-pictures (left) or real 3D-objects (right). (—) Trials in which the two paired stimuli differed (Different condition). (---) Trials in which the two paired stimuli were identical (Repeat condition).

Figure 3 plots the AIs for 2D pictures and 3D objects in LO and pFS. To provide meaningful data interpretation in a within-subjects design40,41, error bars in Figure 3 represent the 95% confidence interval (CI) of the difference from zero. Robust repetition effects for 2D pictures were observed within both LO (t(12) = 3.68, p = 0.003) and pFS (t(12) = 5.38, p < 0.0001) sub-regions of LOC. These findings replicate those of previous studies4,5,7,10,12 and confirm that our design and stimuli were sufficiently sensitive to demonstrate repetition effects.
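The correspondence between 95% CI error bars of the difference from zero and the one-sample t-test can be illustrated with a short sketch. It assumes n = 13 subjects (df = 12), for which the two-tailed critical value of t at α = 0.05 is about 2.179; the AI values in the usage block are hypothetical. An interval that excludes zero corresponds to |t| exceeding that critical value.

```python
import math
from statistics import mean, stdev

# Two-tailed critical t for alpha = 0.05 with df = 12 (i.e., n = 13 subjects)
T_CRIT_DF12 = 2.179


def ci95_of_mean(values, t_crit=T_CRIT_DF12):
    """95% CI of the mean; t_crit must correspond to df = len(values) - 1."""
    n = len(values)
    m = mean(values)
    half_width = t_crit * stdev(values) / math.sqrt(n)
    return m - half_width, m + half_width


def excludes_zero(ci):
    """True when the interval lies entirely above or entirely below zero."""
    low, high = ci
    return low > 0 or high < 0


if __name__ == "__main__":
    # Hypothetical adaptation indices for 13 subjects (not study data)
    ais = [0.12, 0.08, 0.15, 0.05, 0.10, 0.02, 0.09,
           0.11, -0.01, 0.07, 0.13, 0.06, 0.04]
    ci = ci95_of_mean(ais)
    t = mean(ais) / (stdev(ais) / math.sqrt(len(ais)))
    print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}); excludes zero: {excludes_zero(ci)}")
    # Same decision as the one-sample t-test: |t| > T_CRIT_DF12
    print(f"t(12) = {t:.2f}; significant: {abs(t) > T_CRIT_DF12}")
```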

Figure 3

Repetition effects for 2D pictures and 3D objects within LO and pFS.

The magnitude of repetition effects for each stimulus type within each region was quantified using an Adaptation Index (AI)4,93. The AI represents differences in responses between Repeat and Different conditions relative to the overall fMRI response, thereby providing a measure of repetition effects scaled according to activation levels for each stimulus in each ROI. Positive index values reflect higher responses on Different than Repeat trials; negative values indicate the reverse pattern and values around zero indicate a lack of repetition effects. To provide meaningful data interpretation in a within-subjects design, error bars for the difference scores are based on the 95% confidence intervals, which indicate whether or not the average difference was significantly greater than zero (with probabilities equal to those from the t-test).

Next we examined whether similar effects would be observed on 3D object trials that were randomly intermixed with 2D picture trials. Time courses of fMRI signals for 3D objects on Different versus Repeat trials within LO and pFS are displayed in Figure 2 (right upper and lower panels, respectively). Although a small change in BOLD signal was evident in the time courses of the Repeat condition relative to the Different condition, the magnitude of this effect was visibly attenuated compared to that observed for 2D pictures. Planned comparisons confirmed that for 3D objects, repetition effects did not reach statistical significance in LO (t(12) = 0.88, p = 0.392). In pFS, repetition effects also did not reach statistical significance (t(12) = 1.99, p = 0.057), although there was a trend in this direction in this more anterior sub-portion of the LO complex. Finally, paired-samples t-tests contrasting the AIs for 2D versus 3D stimuli in each ROI revealed a trend toward a difference in LO (t(12) = 2.04, p = 0.06), but no significant difference in pFS (t(12) = 0.05, p = 0.29).

As an index of between-subject consistency, the proportion of observers who showed a greater fMRI BOLD response on Different versus Repeat trials was calculated for each stimulus type and ROI. The observed direction of β coefficients (i.e., a binary score reflecting Different > Repeat, or Repeat > Different) across all participants was compared to the distribution of scores expected by chance alone (i.e., a test of the null hypothesis that Different > Repeat in 50% of subjects) using Pearson's chi-square test. For 2D picture trials, 12/13 subjects showed effects in the expected direction (i.e., Different > Repeat) within LO (χ2 = 9.31, p < 0.005) and all subjects showed this pattern within pFS, indicating that the frequency of the pattern was not attributable to chance alone. Conversely, for 3D object trials fewer subjects showed effects in the expected direction. The observed proportions were not significantly above chance levels in LO (8/13 subjects; χ2 = 0.69, p > 0.40) or pFS (10/13 subjects; χ2 = 3.77, p > 0.05), although there was a trend toward significance in pFS.
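A minimal sketch of the between-subject consistency test, assuming a Pearson goodness-of-fit test with one degree of freedom against an expected 50/50 split of subjects. With the subject counts given above it reproduces the reported statistics (χ2 = 9.31, 0.69 and 3.77). The helper name is hypothetical.

```python
import math


def direction_chi_square(n_expected_direction, n_total):
    """Pearson chi-square (df = 1) of a binary split against a 50/50 null.

    Returns (statistic, p). For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    expected = n_total / 2
    observed = (n_expected_direction, n_total - n_expected_direction)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Subjects showing Different > Repeat, as reported in the text (n = 13)
    for label, k in [("LO, 2D", 12), ("LO, 3D", 8), ("pFS, 3D", 10)]:
        stat, p = direction_chi_square(k, 13)
        print(f"{label}: {k}/13, chi2 = {stat:.2f}, p = {p:.3f}")
```

The closed form for the 1-df survival function avoids any dependency beyond the standard library; with expected counts of 6.5 per cell, the usual minimum-expected-count guideline for the chi-square approximation is met.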

In summary, we found robust repetition effects for repeated 2D pictures within both LO and pFS sub-regions of LOC and this pattern was highly consistent across individuals. Surprisingly, however, repetition effects were attenuated for trials involving real 3D objects; we did not observe significant repetition effects within LO or pFS sub-regions of object-selective cortex. Furthermore, the direction of effects in Different versus Repeat conditions varied across subjects in both ROIs suggesting that changes in 3D object identity did not have a reliable influence on the BOLD response.

Voxel-wise group analyses

Group-based voxel-wise GLM analyses were subsequently performed to explore repetition effects at the whole-brain level and specifically to determine whether there was evidence for repetition-based BOLD changes on 3D object trials outside of LOC. We first ran the contrast [+2D Different −2D Repeat] to identify regions showing significant repetition effects for 2D pictures (p < 0.005, corrected using a cluster-size threshold). Figure 4 illustrates the group results displayed on the cortical surface of a representative participant. As expected, significant areas of activation were observed within established regions of object-selective cortex. Large bilateral clusters were observed along lateral and ventral occipito-temporal cortex, including fusiform, lingual, lateral occipital and inferior temporal regions. Similar activation was also evident within ‘dorsal stream object areas’, extending from the expected location of anterior V3, dorsally into the intraparietal sulcus (IPS) anterior to the expected location of IPS-042. In sharp contrast, an analogous comparison for 3D stimuli (using the contrast [+3D Different −3D Repeat] at the same p-value threshold) revealed no significant areas of positive activation, either cortically or sub-cortically ( Table 1 ). In fact, the reverse contrast [+3D Repeat −3D Different] revealed several clusters of significant activation consistent with a pattern of ‘repetition enhancement’ (i.e., greater BOLD response on Repeat than Different trials).

Table 1 Voxelwise Group Results. Talairach coordinates and cluster size for identified regions.
Figure 4

Group functional activation for the contrast [2D-Different > 2D-Repeat] in the whole-brain voxel-wise analysis.

Activation is overlaid on the inflated cortical surface of a representative observer. Widespread repetition-based changes in activation for pictures of objects were observed across temporal and parietal cortex. Conversely, no such activation changes were identified for real 3D objects [i.e., using the contrast 3D-Different > 3D-Repeat] at the same threshold. Dorsal surface (far left), Right Hemisphere (top middle), Left Hemisphere (lower middle) and Ventral Surface (far right). Sulci are represented in dark grey and gyri in light grey.

We then searched for areas in which activation was significantly different for 2D than 3D stimuli (collapsed across Repeat and Different trials) using the contrasts [+2D−3D] and [+3D−2D] ( Table 1 ). The comparison [+2D−3D] revealed two small clusters of positive activation: one centered at the occipital pole (V1) in the RH calcarine sulcus and another in the RH inferior temporal gyrus. The comparison [+3D−2D] revealed no positive activation. The representations of our 2D pictures and of real-world 3D instances of the same objects therefore largely shared the same anatomical loci. Finally, interactions between Stimulus Type and Repetition were examined using the contrasts (a) [+3D Different −3D Repeat −2D Different +2D Repeat] (i.e., greater repetition effects for 3D than 2D stimuli) and (b) [+2D Different −2D Repeat −3D Different +3D Repeat] (i.e., greater repetition effects for 2D than 3D stimuli). Brain areas showing greater repetition effects for 2D than 3D stimuli again included largely bilateral swathes of activation around the lingual and fusiform gyri and superior temporal sulci, as well as clusters in the left parieto-occipital fissure and the right middle frontal gyrus. The reverse interaction contrast (i.e., greater repetition effects for 3D than 2D stimuli) revealed no positive activation clusters.
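The voxel-wise contrasts can be written as weight vectors over the four conditions. In this sketch the condition ordering, the constant names and the β values are assumptions for illustration. Each contrast's weights sum to zero; the 2D > 3D interaction is the 2D repetition contrast minus the 3D repetition contrast, and the 3D > 2D interaction is its sign reversal.

```python
# Condition order is an assumption for illustration
CONDITIONS = ("2D Different", "2D Repeat", "3D Different", "3D Repeat")

REPETITION_2D = (+1, -1, 0, 0)    # [+2D Different -2D Repeat]
REPETITION_3D = (0, 0, +1, -1)    # [+3D Different -3D Repeat]
MAIN_2D_VS_3D = (+1, +1, -1, -1)  # [+2D -3D], collapsed over Repeat/Different

# Greater repetition effect for 2D than 3D: 2D repetition minus 3D repetition
INTERACTION_2D_GT_3D = tuple(a - b for a, b in zip(REPETITION_2D, REPETITION_3D))
# Greater repetition effect for 3D than 2D: the sign-reversed vector
INTERACTION_3D_GT_2D = tuple(-w for w in INTERACTION_2D_GT_3D)


def apply_contrast(weights, betas):
    """Weighted sum of condition betas; betas follow the CONDITIONS order."""
    if len(weights) != len(betas):
        raise ValueError("weights and betas must match CONDITIONS")
    return sum(w * b for w, b in zip(weights, betas))


if __name__ == "__main__":
    for name, w in [("2D repetition", REPETITION_2D),
                    ("3D repetition", REPETITION_3D),
                    ("2D > 3D", MAIN_2D_VS_3D),
                    ("3D > 2D repetition", INTERACTION_3D_GT_2D),
                    ("2D > 3D repetition", INTERACTION_2D_GT_3D)]:
        assert sum(w) == 0, name  # every contrast is balanced
        print(name, dict(zip(CONDITIONS, w)))
    # Hypothetical voxel betas (not study data)
    betas = (1.0, 0.6, 0.9, 0.85)
    print("2D > 3D repetition:", apply_contrast(INTERACTION_2D_GT_3D, betas))
```

Deriving the interaction vectors from the simple repetition contrasts, rather than typing them out, guarantees the signs stay consistent.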

Comparisons with Foci from Prior Studies

Finally, we sampled group activation within a number of additional ROIs that correspond to areas previously implicated in 3D form processing4,39,43 (see Figure 5 ). Across a total of 14 ROIs spanning early visual, temporal and parietal cortex, we found significant 3D repetition effects in just two areas: one roughly corresponding to V3A and another within left-sided ‘LOtv’, a putative visuo-tactile ‘multimodal’ sub-component of the LO complex situated along the ventro-lateral bank of the temporal lobe43. In contrast, significant (or close to significant) 2D repetition effects were found in almost all of the additional ROIs (see Supplementary Table 1 ).

Figure 5

Loci of additional group-based region of interest (ROI) analyses displayed on the inflated cortex of a representative subject.

The cortex is illustrated from a posterior-ventral viewpoint. Group-based region of interest analyses were conducted at the marked loci in each hemisphere (see Supplementary Table 1). (A–I): Sites within occipital cortex, intraparietal sulcus, inferior temporal gyrus and premotor cortex in which second-order disparity-selective neurons are thought to extract and process 3D depth structure from stereo39. Points (H–I) lie anterior to the central sulcus and are not visible from the above viewpoint. (J–M): Topographically organized areas IPS1–4 within the intraparietal sulcus (IPS), as reported by Konen & Kastner4. Using a variety of 2D greyscale picture stimuli, these authors report significant adaptation effects within IPS1 and IPS2, but not more dorsally within IPS3 and IPS4. (N): Loci corresponding to LOtv, located along the ventro-lateral bank of the temporal lobe43. Area LOtv is selective for both visual and haptic object properties and is argued to support abstract 3D shape representations. (A) V3A complex 1; (B) V3A complex 2; (C) ITG; (D) VIPS/V7; (E) POIPS; (F) DIPSM; (G) DIPSA; (H) dPrCS; (I) vPrCS; (J) IPS1; (K) IPS2; (L) IPS3; (M) IPS4; (N) LOtv. (See also Table S1.)

Discussion

Here we used slow event-related fMRI to contrast repetition-related changes in fMRI responses to 2D pictures of objects with real-world 3D exemplars. Whereas presentation of 2D pictures elicited strong repetition-related changes in the BOLD response, the same effect was surprisingly weak, if not absent, in the context of real-world 3D objects. We searched for repetition effects within discrete regions of object-selective cortex and across the whole brain. Contrary to our expectations, manipulating 3D object identity (using Repeated versus Different objects) did not produce a significant change in BOLD response within LOC. Further, within this area there was marked variability across participants in the relative magnitude of the BOLD response in Repeat versus Different 3D object conditions. Indeed, within area LO individual participants were just as likely to show a stronger BOLD response on Repeat than on Different 3D object trials, a pattern sometimes labeled as ‘repetition enhancement’2,7,22,24,44,45,46,47. In line with these results, an analysis of group effects at the whole brain level also revealed no evidence of fMR-repetition effects on 3D object trials.

The results for real-world 3D objects contrast sharply with those for 2D object images. In line with previous reports, participants in our study showed robust fMRI repetition-based changes on randomly interleaved trials that involved 2D pictures. In the ROI analyses, significant 2D repetition effects were observed within both LO and pFS sub-regions of LOC and BOLD response patterns were highly consistent across observers. Accordingly, whole-brain analyses revealed robust repetition effects for 2D objects that spread anteriorly and bilaterally along classical ventral stream object-selective cortex and dorsally along putative object-selective cortex in the vicinity of the IPS. Finally, we found evidence for 2D repetition effects within a number of additional ROIs that correspond to areas previously implicated in 3D form processing4,39,43. The same pattern was not observed for 3D stimuli.

Whole brain analyses showed that activation patterns were strikingly similar for our 2D picture and 3D object trials, consistent with our stimulus sets being well matched for low-level properties (including illumination, size, colour and viewpoint). We further quantified repetition effects using an adaptation index to account for possible underlying differences in responsivity across different brain areas to our paired 2D and 3D stimulus events4. The effect we observed for 2D vs. 3D stimulus classes is unlikely to be attributable to differences in eye movement patterns or shifts of attention. Our tilted-head setup precluded the use of an eye-tracker; however, all participants reported that they were able to easily discriminate all stimuli while maintaining their gaze on the fixation point. Moreover, no activation differences between 2D and 3D objects were found in eye-movement- and attention-related areas, such as the frontal eye fields or parietal cortex48,49,50. Further, given that participants merely passively viewed the stimuli, differences in task-related attentional demands were also unlikely. It is possible that observers found the 3D objects “more interesting” than their 2D counterparts. If that were the case, however, then one would have expected to see greater activation in LOC and other object-related areas for 3D as opposed to 2D stimuli, and amplified repetition effects for 3D compared to 2D stimuli51,52. But we found exactly the opposite.

Given that explanations based on attention or eye-movements are unlikely, our results may reflect differences in the way real world 3D objects are processed as compared to 2D pictures. Real objects differ from pictures in several important respects: (a) they possess additional shape information from stereoscopic cues such as vergence and disparity, (b) both monocular and binocular cues to object shape are consistent for real objects and (c) 3D objects are tangible substances that exist in the environment. The possible contribution of each of these differences between pictures and real objects to our observed findings is considered in turn below.

Because real objects possess additional shape information from stereoscopic cues compared to pictures, the question arises whether the same pattern observed for real objects would emerge with objects defined by stereopsis alone (i.e., stereograms), where the percept of 3-dimensionality arises entirely from binocular disparity. Neurophysiological studies have identified neurons that are sensitive to shapes defined by binocular disparity within early visual areas53,54,55,56,57,58,59, dorsal areas such as MT and parietal cortex60,61,62,63,64,65 and in the inferior temporal cortex66,67,68,69,70,71,72,73. To our knowledge, no human fMRI studies to date have directly compared repetition effects for stereo versus real-world 3D objects, or stereo displays involving objects with 3D structure. Kourtzi and Kanwisher25 used stereo displays involving planar shapes to show that responses within LOC were identical despite changes in the stereoscopic depth of the shape. Similarly, Kourtzi et al.19 found equivalent BOLD responses on trials depicting identical silhouette shapes and trials where a 2D silhouette was followed by a stereo silhouette image (so that the shape appeared to lie in front of the fixation plane). These findings imply that object shape is processed similarly within LOC, whether the shape is depicted in a purely 2D format or with additional stereo cues. Importantly, however, the stimulus objects in these studies had no 3D structure; the stimuli simply defined figure from ground and provided information about the outer contours of the shape (i.e., first-order stereo). Unlike real objects, they contained no information about intrinsic curvature or shape (i.e., second-order stereo). Therefore, it remains an open question whether the effects observed here for real-world objects would also emerge with stereo displays of objects that possess different second-order shape cues.

Another important difference between pictures and real objects is that the binocular and monocular cues to object shape are completely consistent for 3D objects but are in conflict for 2D pictures. Looking at a picture, binocular cues indicate that it is completely flat, whereas monocular cues such as shading, texture gradients, occlusion, specular highlights and other pictorial cues signify a 3D representation. It is possible that the classical repetition and release effects typically observed in picture viewing are attributable to processes associated with resolving such depth-cue conflict. For example, the additional processing required to decipher object identity from 2D pictures as a result of cue conflict could result in a higher fMR response (release from adaptation) on ‘Different’ 2D trials. Further, because all pictures convey the same (flat) stereo information, stereo cues may be discounted in the analysis of object shape and other pictorial cues weighted more highly. Given that some pictorial cues can be more effective than others in conveying object shape for particular objects, these differences in the cues that are used across trials would result in greater release from adaptation on ‘different’ trials, because different sets of neurons, each tuned to particular pictorial cues, would be engaged in each case. In contrast, because binocular cues like stereo are such powerful indicators of object shape in the case of 3D objects (and may therefore be weighted more highly in the analysis of object shape), the same set of stereo-sensitive neurons that analyse object shape would be engaged, even for different objects.

Finally, our preliminary fMRI results raise the provocative suggestion that the presence of real-world objects (i.e., as indicated initially via stereoscopic cues) invokes qualitatively different computations to those elicited by 2D images. Researchers in the field of behavioral psychophysics have expressed long-standing concern about the extent to which pictures of objects capture the properties of their real-world counterparts (i.e., their ecological validity), with reservations as to their appropriateness as stimuli with which to examine the nature of human object perception74,75. Indeed, there are clear differences between pictures and objects that suggest some degree of caution in assuming equal neuronal response patterns between the two stimulus classes. Whereas images consist merely of patterns of light arising from a 2D projection surface, real objects are tangible substances that exist in 3D space with a definite texture, reflectance, colour and shape. Real objects, unlike pictures, have an unambiguous size, distance and location relative to the observer – factors that are known to alter single unit responses in macaque inferior temporal cortex76. Moreover, as discussed earlier all the cues to depth structure, both binocular and monocular, are congruent for 3D objects. Finally, real objects have properties that relate specifically to the motives and needs of the observer – that is, they provide affordances74. An object placed within arm's length affords reaching, grasping and manipulation. Indeed, fMRI studies demonstrate that information about 3D form is critical for the visual control of grasping and manipulation26,29.

Although far fewer research studies have been carried out with real-world objects than with 2D images in humans, numerous findings point to the possibility that real objects are cognitively distinct from their 2D counterparts. For example, patients with visual agnosia often show a ‘real object advantage’ in which identification of objects depicted as line-drawings or silhouettes is impaired while recognition performance for real objects remains intact77,78,79,80,81. Similarly, in healthy observers, the value applied to objects is affected by the format in which they are viewed. For example, Bushong et al.82 gave university students a small monetary endowment that could be used to purchase a range of test objects (i.e., food or trinkets). The test items were depicted in one of three formats: text displays, high-resolution images, or actual real-world objects. Surprisingly, students were willing to pay between 40–61% more for objects they viewed as real-world exemplars than for the same items depicted in text format or image displays. Moreover, this effect disappeared when the objects were placed behind a transparent barrier, suggesting that the effect was driven by the potential for interaction with the objects.

In summary, relative to previous research using 2D pictures5,25,83, our findings indicate that the neural analysis of 3D objects may not fit within the classically defined pattern and that adaptation and corresponding release effects may not be an obligatory consequence of object repetition manipulations13. Our results further suggest that the analysis and/or representation of object structure does not proceed independently of the cues that define the object – in this case, when the term ‘object’ is extended to include actual real-world exemplars. The neural mechanisms involved in the perception of real-world 3D objects may therefore be distinct from those that arise when we encounter a 2D planar representation of the very same items. Furthermore, such processes may also change with environmental context – such as whether an object is located within reachable space29. We have highlighted a number of possible routes for future investigation to further elucidate the cognitive and neural mechanisms responsible for the pattern of repetition effects reported here for 2D versus 3D objects. As we have argued, many of the simpler explanations seem unlikely (eye movements, attention), leaving the possibility of inherent differences in the processing of real objects vs. photographs. Whether the invariant neural response we observed for real-world 3D objects is attributable to the additional depth cues provided by binocular vision or the physical presence of the objects, the important finding here is that the underlying response pattern is different from that observed in the context of 2D planar images. Although many fMRI studies have used repetition designs to probe neural sensitivity to different types of stimuli, the computational mechanisms that underlie this effect are not fully understood84,85,86,87,88,89. 
Regardless of which particular mechanisms account for repetition effects, however, there is no doubt that differential adaptation effects for 2D pictures and 3D objects reflect differences in neuronal processing and interactions.

Because of the technical challenges associated with presenting real-world objects within the scanner, we used a slow event-related design. It is possible that the different patterns of repetition effects reported here for 2D versus 3D stimuli are specific to the temporal dynamics of our stimulus presentation. Similarly, the paired adaptation paradigm used in the present study may have a small dynamic range, and in the presence of noise, small but nevertheless significant repetition effects may be missed. An important question for future investigation, therefore, is whether the patterns observed here also emerge with different stimulus durations or with alternative fMRI designs, such as blocked or rapid event-related designs with more repetitions, which yield stronger repetition effects. In any case, if the statistical power of the present design were increased, the differences that we have already observed between 2D and 3D stimuli would likely be amplified rather than reduced.

Our ability to perceive real 3D objects from patterns of light that project onto the retina remains one of the most remarkable, yet perplexing, aspects of human vision. However, our understanding of the neural substrates of perception is largely based upon studies that have used 2D images. The conventional use of 2D images in fMRI research, in particular, may impose underestimated limits on our understanding of the neural underpinnings of human vision. The human visual system has largely evolved to perceive and interact with a 3-dimensional environment rather than with pictures. Surprisingly, however, there is a paucity of controlled published studies involving real objects, and fewer still that directly contrast behavioral or fMRI measures across objects and images. We argue here that pictures might represent a limited class of stimuli with which to characterize the neural computations associated with human object recognition74. Our findings for real 3D objects suggest some caution in extrapolating experimental results based upon the presentation of abstract or simplified stimuli, or findings drawn from artificial or constrained environments. Nevertheless, these results provide an important first step in understanding how real-world stimuli are coded by the human brain and complement a growing body of research90,91 emphasizing the importance of studying behavior in ecologically valid contexts.

Methods

Subjects

Sixteen healthy observers with normal or corrected-to-normal vision participated in two scanning sessions, one for the fMR-A experiment and one for localizing the LOC. Data from two subjects were excluded because of excessive head movement (between 2 and >4 mm of translation or between 2 and >4 degrees of rotation), and data from an additional participant were excluded because of technical problems with the LED illuminators, leaving 13 participants in the final analyses. Informed consent was obtained in accordance with procedures approved by the University of Western Ontario's Health Sciences Review Ethics Board and the Queen's University Human Research Ethics Board. All participants were naive with respect to the experimental hypothesis.

Visual stimuli

Stimuli for the fMR-A experiment consisted of a set of 30 easily recognizable real 3D objects and a corresponding set of 30 2D coloured photographs of the same objects (see Figure 1(b)). Although we did not intend to compare 2D and 3D stimulus presentations directly within a given trial, the 2D photographs were nonetheless closely matched to the 3D objects in luminance, shading, position and orientation. Stimulus position and orientation were controlled using mountings beneath each stimulus that attached to the viewing platform: the rear side of each stimulus was fitted with a wooden pedestal block, which fit into a concave holder attached to the viewing platform. On each trial, stimuli were mounted on a black turntable placed over the participant's waist and fixed to the scanner bed (see Figure 1(a)). The turntable had a central divider, yielding two semicircular platforms for stimulus presentation. The pedestal holders, one fixed to the midline of each semicircular platform, held the stimuli firmly in place and ensured identical viewing conditions within and between trials. The 2D stimuli were constructed by photographing each 3D object with a Sony Alpha DSLR-A100 camera (with flash) mounted on a tripod. Each object was photographed while mounted on the viewing platform, with the platform fixed at an angle and viewing distance comparable to those used in the scanner. High-resolution 2D colour images of each object were printed on matte paper and mounted on card backing cut to match the outline of the turntable divider. The paradigm and all object stimuli were pilot-tested in the scanner with an inert phantom to ensure that they did not produce any artifacts (e.g., from turntable movement or object transitions).

Procedure and design

The main fMR-A experiment had a 2 × 2 design with the factors Repetition (Repeat versus Different objects) and Stimulus Type (2D pictures versus 3D objects). In the 2D-Repeat condition, both pictures within a trial depicted the same object, whereas in the 2D-Different condition the two pictures depicted objects with different identities. In the 3D-Repeat condition, both stimuli within a trial were the same real 3D object, whereas in the 3D-Different condition the two objects had different identities. Each scan consisted of 20 trials, five for each of the four conditions. The order of conditions was counterbalanced so that trials from a given condition were preceded equally often by trials from each of the other conditions. The 60 stimuli were divided into six sets of 10 items (five 3D objects plus the five matching 2D pictures), one set per scan. Each object exemplar appeared equally often in each of the four conditions, ensuring that activation differences were due to the relationship between the paired stimuli rather than to differences in the objects used in each condition. A new stimulus set was used for each scan to prevent long-term adaptation. Participants each completed 5–6 scans (depending on time constraints), and the order of scans (object sets) was counterbalanced across subjects.
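The balancing constraint above (every object appearing equally often in each condition) can be met in several ways. The Python sketch below shows one hypothetical pairing scheme for a single five-object set, assuming cyclic pairing for Different trials; the text does not say how the pairs were actually built, and the object labels are placeholders. It also includes a small helper that tallies which condition precedes which, one way to check the first-order counterbalancing of trial order.

```python
from collections import Counter


def build_pairs(objects):
    """Hypothetical pairing scheme for one stimulus set and one format.

    Repeat trials pair each object with itself; Different trials pair each
    object with the next one in cyclic order (A-B, B-C, ..., E-A). With five
    objects this gives five trials per condition, and every object appears
    exactly twice in each condition.
    """
    n = len(objects)
    repeat = [(obj, obj) for obj in objects]
    different = [(objects[i], objects[(i + 1) % n]) for i in range(n)]
    return repeat, different


def appearances(pairs):
    """How often each object appears across a list of (first, second) pairs."""
    return Counter(obj for pair in pairs for obj in pair)


def preceding_counts(condition_order):
    """Tally (previous, current) condition transitions in a trial sequence."""
    return Counter(zip(condition_order, condition_order[1:]))


objects = ["A", "B", "C", "D", "E"]  # placeholder labels for one set's five objects
trials = {}
for fmt in ("2D", "3D"):
    repeat, different = build_pairs(objects)
    trials[f"{fmt}-Repeat"] = repeat
    trials[f"{fmt}-Different"] = different

for condition, pairs in trials.items():
    print(condition, dict(appearances(pairs)))
```

Cyclic pairing is used here only because it guarantees the equal-appearance property by construction for any set of at least two objects; a randomized assignment verified with `appearances` would serve equally well.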

The setup (see Figure 1(a)) enabled participants to view the stimuli directly, without a mirror. The experiment was conducted in complete darkness, except for a small red LED fixation light positioned in front of the stimulus plane. The fixation light remained on throughout the entire scan but was too dim to illuminate the scene. Each trial lasted 24 s. Picture or object stimuli were presented for 500 ms each, separated by a 3 s inter-stimulus interval (Figure 1c). Stimulus duration was controlled by a white LED ‘illuminator light’ positioned just above and in front of the turntable. A 20 s inter-trial interval (fixation only) followed each stimulus pair and served as the baseline against which trial-related neural activity was compared. An additional 10 s of fixation baseline was collected at the start of each scan and 20 s at the end. The timing of stimulus illumination, fixation and auditory events was controlled using E-Prime software.
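The trial timing described above fixes every stimulus onset within a scan. The following Python sketch derives those onsets and the implied run length, assuming that each trial begins with its first illumination and that the 20 s end-of-scan fixation is added after the final trial's own 20 s inter-trial interval. The run length and volume count are derived under that assumption; they are not reported in the text.

```python
# Timing constants taken from the Procedure section (seconds).
TR_S = 2.0
LEAD_IN_S = 10.0   # fixation before the first trial
LEAD_OUT_S = 20.0  # additional fixation after the last trial
STIM_S = 0.5       # each stimulus is illuminated for 500 ms
ISI_S = 3.0        # dark interval between the two stimuli of a pair
ITI_S = 20.0       # fixation-only interval after each pair
N_TRIALS = 20

# 0.5 + 3 + 0.5 + 20 = 24 s, matching the stated trial length.
TRIAL_S = STIM_S + ISI_S + STIM_S + ITI_S


def stimulus_onsets(n_trials=N_TRIALS):
    """Return (first, second) illuminator onsets, in seconds, for each trial.

    Assumes trial i starts with its first stimulus at LEAD_IN_S + i * TRIAL_S.
    """
    onsets = []
    for i in range(n_trials):
        t0 = LEAD_IN_S + i * TRIAL_S
        onsets.append((t0, t0 + STIM_S + ISI_S))
    return onsets


# Assumes the lead-out fixation follows the last trial's own ITI.
RUN_S = LEAD_IN_S + N_TRIALS * TRIAL_S + LEAD_OUT_S
N_VOLUMES = int(RUN_S // TR_S)

print(stimulus_onsets()[:2])  # [(10.0, 13.5), (34.0, 37.5)]
print(RUN_S, N_VOLUMES)       # 510.0 255
```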

On each trial, stimuli were manually positioned on the turntable by the experimenter, who received an auditory cue via headphones indicating which objects or pictures to mount on upcoming trials. Small glow-in-the-dark shapes attached to the base of each pedestal block enabled the experimenter to locate the relevant stimulus items. An infrared MR-compatible bore camera (MRC Systems GmbH) positioned just behind the participant's head was used to record the experimenter's stimulus presentations and verify their accuracy. Participants were instructed to observe and identify the objects presented on each trial while maintaining their gaze at fixation throughout the entire experiment, including during stimulus events.

All participants completed a separate LOC localizer scan (2 runs) in which visual stimuli were presented using a video projector connected to a laptop computer. Stimuli for the LOC localizer consisted of 300 × 300 pixel greyscale images and line drawings of familiar and novel objects, together with scrambled versions of each set, each with overlapping grid lines, as described in numerous previous studies5,19,25. The images were back-projected onto a screen viewed via a mirror attached to the top of the head coil. The localizer used a blocked design with sixteen stimulus epochs interleaved with fixation periods, each lasting 16 s. Twenty images were presented within each epoch, each for 250 ms followed by a 550 ms blank interval. Participants were instructed to view the images passively while fixating.
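The localizer timing is internally consistent: 20 images at an 800 ms onset-to-onset interval fill exactly the 16 s epoch. A minimal Python sketch of that arithmetic follows; the constant and function names are chosen here for illustration.

```python
# Localizer epoch timing from the text (milliseconds).
IMAGES_PER_EPOCH = 20
IMAGE_MS = 250
BLANK_MS = 550
FIXATION_MS = 16_000

SOA_MS = IMAGE_MS + BLANK_MS          # 800 ms from one image onset to the next
EPOCH_MS = IMAGES_PER_EPOCH * SOA_MS  # 16 000 ms


def epoch_image_onsets_ms(epoch_start_ms=0):
    """Onset of each image within one stimulus epoch, in milliseconds."""
    return [epoch_start_ms + k * SOA_MS for k in range(IMAGES_PER_EPOCH)]


# Stimulus epochs and the interleaved fixation periods have equal length.
assert EPOCH_MS == FIXATION_MS
print(epoch_image_onsets_ms()[:3], EPOCH_MS)  # [0, 800, 1600] 16000
```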

MRI acquisition

Scanning was carried out on a 3 Tesla Siemens Magnetom Tim Trio imaging system. Of the participants whose data were used in the analysis, ten were scanned at Queen's University (Kingston, Ontario, Canada) and three were scanned on an identical machine at the Robarts Research Institute at The University of Western Ontario (London, Ontario, Canada), using identical scanning parameters. For all participants in the fMR-A experiment, functional data were acquired with a T2*-weighted single-shot gradient-echo echo-planar imaging sequence with interleaved slice acquisition. Rather than using a standard head coil configuration, we positioned subjects within the tilted posterior half (6 channels) of a 12-channel (Siemens) receive-only head coil to enable direct viewing of the stimuli. Participants scanned in London also had an additional 4-channel flex coil suspended over the front of the head to enhance the signal-to-noise ratio in anterior regions. Foam padding was used to reduce head motion.

For the main experiment, the parameters for obtaining functional data were: field of view (FOV) = 211 mm × 211 mm; in-plane resolution = 3.3 mm × 3.3 mm; slice thickness = 3.3 mm; 32 axial slices; echo time (TE) = 30 ms; repetition time (TR) = 2000 ms; flip angle (FA) = 78°. For the LOC localizer, subjects were scanned using a 12-channel Siemens head coil (un-tilted). Scanning parameters for the localizer were identical to those of the main experiment except for the number of slices (33). Functional data were aligned to high-resolution anatomical images obtained using a 3D T1-weighted MPRAGE sequence (TE = 2.98 ms; TR = 2300 ms; TI (inversion time) = 900 ms; FA = 9°; 192 contiguous slices of 1 mm thickness; FOV = 240 mm × 250 mm).
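The reported EPI parameters imply a few derived quantities. The Python sketch below computes them, assuming a 64 × 64 acquisition matrix (inferred from 211 mm / 3.3 mm ≈ 64, not stated in the text) and contiguous slices with no inter-slice gap (also not reported).

```python
# Acquisition arithmetic for the EPI protocol.
FOV_MM = 211.0
IN_PLANE_MM = 3.3
SLICE_MM = 3.3
N_SLICES_MAIN = 32
N_SLICES_LOCALIZER = 33

# The acquisition matrix is not reported; 211 / 3.3 is close to 64, so a
# 64 x 64 matrix is assumed here.
matrix = round(FOV_MM / IN_PLANE_MM)
exact_in_plane_mm = FOV_MM / matrix
voxel_volume_mm3 = exact_in_plane_mm ** 2 * SLICE_MM

# Slab coverage assumes contiguous slices (no inter-slice gap is reported).
coverage_main_mm = N_SLICES_MAIN * SLICE_MM
coverage_localizer_mm = N_SLICES_LOCALIZER * SLICE_MM

print(matrix, round(exact_in_plane_mm, 3), round(voxel_volume_mm3, 1))  # 64 3.297 35.9
print(round(coverage_main_mm, 1), round(coverage_localizer_mm, 1))      # 105.6 108.9
```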

Data Preprocessing and Analysis

Data were preprocessed and analyzed using Brain Voyager QX (Version 1.10.2, Brain Innovation, Maastricht, Netherlands). Functional data were assessed for head motion and/or magnet artifacts by viewing cine-loop animations and examining motion-detection parameter plots after 3D motion correction of the untransformed two-dimensional data, aligned to the functional volume closest in time to the anatomical scan. Runs in which head motion exceeded 1 mm of translation and/or 1 degree of rotation were excluded from the analyses (5 runs in total across all 13 subjects). Functional data were high-pass temporally filtered to remove frequencies below 3 cycles/run. Functional volumes were then superimposed on anatomical brain images transformed into Talairach space92.
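The run-exclusion rule and the temporal high-pass filter can be sketched in Python as below. This is not Brain Voyager's implementation: the motion-parameter layout is an assumption, and the filter is written as least-squares removal of low-frequency sine/cosine regressors, with "below 3 cycles/run" interpreted as removing the 1- and 2-cycle components.

```python
import numpy as np


def exceeds_motion_limit(params, max_trans_mm=1.0, max_rot_deg=1.0):
    """Return True if a run should be excluded for head motion.

    params: (n_volumes, 6) array of motion estimates relative to the reference
    volume; columns 0-2 are translations (mm) and columns 3-5 rotations
    (degrees). The column layout is an assumption of this sketch.
    """
    params = np.asarray(params, dtype=float)
    max_trans = np.abs(params[:, :3]).max()
    max_rot = np.abs(params[:, 3:]).max()
    return bool(max_trans > max_trans_mm or max_rot > max_rot_deg)


def fourier_highpass(ts, cutoff_cycles=3):
    """Remove drifts slower than cutoff_cycles per run, keeping the run mean.

    Sine/cosine pairs at 1 .. cutoff_cycles - 1 cycles per run are fitted by
    least squares together with a constant, and only the drift terms are
    subtracted.
    """
    ts = np.asarray(ts, dtype=float)
    n = ts.size
    t = np.arange(n) / n
    columns = [np.ones(n)]
    for k in range(1, cutoff_cycles):
        columns += [np.sin(2 * np.pi * k * t), np.cos(2 * np.pi * k * t)]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, ts, rcond=None)
    return ts - X[:, 1:] @ beta[1:]  # subtract drift terms only, not the constant
```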

Region of interest (ROI) analyses

We first performed ROI analyses to determine whether neural populations within the LOC respond similarly to repetitions of 2D and 3D objects. As in previous object fMR-A studies7,19, two subregions of the LOC were identified: LO (lateral occipital), located at the posterior end of the inferior temporal sulcus, and pFS (posterior fusiform sulcus). For each individual, ROIs were identified by selecting voxels within these anatomically defined regions of ventral occipitotemporal cortex that were activated more strongly by intact than by scrambled images of objects in the localizer scans. Each ROI was isolated by first locating the peak voxel of activation within the region; ROI size was then constrained by applying a minimum threshold (t > 3.0) and selecting a volume of interest of up to (10 mm)³ around the peak voxel. All single-subject analyses were performed on unsmoothed data. fMRI signal time courses and β weights were extracted for each scan and hemisphere and averaged to produce means for each condition in the two ROIs; these means were then averaged across subjects to yield group results. Repetition effects were quantified using an adaptation index (AI), defined from the responses elicited in the Different and Repeat conditions as $\mathrm{AI} = (R_{\mathrm{different}} - R_{\mathrm{repeat}}) / (R_{\mathrm{different}} + R_{\mathrm{repeat}})$, where $R_{\mathrm{repeat}}$ is the mean fMRI signal on Repeat trials and $R_{\mathrm{different}}$ is the mean fMRI signal on Different trials4,93. β weights were positive for all subjects in both ROIs, so no negative values entered the denominator of the AI. Statistical significance was assessed using single-sample t-tests against zero and paired-samples t-tests.
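A minimal Python sketch of the adaptation index and the two t statistics follows. The function names are illustrative; the t values are computed directly with NumPy, and `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` would return the same statistics together with p-values. The guard against non-positive responses mirrors the observation that all β weights were positive, which keeps the denominator well behaved.

```python
import numpy as np


def adaptation_index(r_different, r_repeat):
    """AI = (R_different - R_repeat) / (R_different + R_repeat), element-wise."""
    r_different = np.asarray(r_different, dtype=float)
    r_repeat = np.asarray(r_repeat, dtype=float)
    if np.any(r_different <= 0) or np.any(r_repeat <= 0):
        raise ValueError("AI assumes positive responses in both conditions")
    return (r_different - r_repeat) / (r_different + r_repeat)


def one_sample_t(values, popmean=0.0):
    """t statistic for a single-sample test against popmean (df = n - 1)."""
    x = np.asarray(values, dtype=float)
    return (x.mean() - popmean) / (x.std(ddof=1) / np.sqrt(x.size))


def paired_t(a, b):
    """Paired-samples t statistic, computed as a one-sample test on differences."""
    return one_sample_t(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
```

In use, one AI per subject and stimulus type would be tested against zero with `one_sample_t`, and the 2D and 3D AIs compared with `paired_t`.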

Voxel-wise group analyses

We subsequently performed a whole-volume voxel-wise analysis of the group data to determine the extent to which repetition effects occurred for 2D and 3D objects at the whole-brain level. Data for each subject were spatially smoothed (6 mm full-width at half-maximum Gaussian kernel) and transformed to percent signal change. Separate predictors were generated for the four experimental conditions by convolving a rectangular wave function with a standard haemodynamic response function, and the group data were then analyzed using a random-effects (RFX) general linear model (GLM).
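The predictor construction and GLM fit can be illustrated with a single-voxel sketch in Python. It assumes an SPM-style two-gamma haemodynamic response (Brain Voyager's default HRF parameters may differ), models each trial as one 4 s rectangular box spanning both stimuli, and uses a hypothetical trial order in which conditions simply cycle. The second-level RFX step (testing per-subject betas across subjects) is not shown.

```python
import math

import numpy as np


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] ** (shape - 1) * np.exp(-t[pos]) / math.gamma(shape)
    return out


def double_gamma_hrf(dt, length_s=32.0, shape_peak=6.0, shape_under=16.0, ratio=1 / 6):
    """Two-gamma HRF sampled every dt seconds (SPM-style shape values, assumed)."""
    t = np.arange(0.0, length_s, dt)
    h = gamma_pdf(t, shape_peak) - ratio * gamma_pdf(t, shape_under)
    return h / h.max()


def condition_predictor(onsets_s, duration_s, n_vols, tr, dt=0.1):
    """Rectangular wave for one condition convolved with the HRF, one value per volume."""
    n_fine = int(round(n_vols * tr / dt))
    box = np.zeros(n_fine)
    for onset in onsets_s:
        box[int(round(onset / dt)):int(round((onset + duration_s) / dt))] = 1.0
    fine = np.convolve(box, double_gamma_hrf(dt))[:n_fine]
    sampled = fine[:: int(round(tr / dt))]
    return sampled / sampled.max()


def percent_signal_change(ts):
    """Express a time course as percent deviation from its mean."""
    ts = np.asarray(ts, dtype=float)
    return 100.0 * (ts / ts.mean() - 1.0)


def fit_glm(y, predictors):
    """Least-squares betas for the predictors plus a trailing constant term."""
    X = np.column_stack(list(predictors) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


CONDITIONS = ["2D-Different", "2D-Repeat", "3D-Different", "3D-Repeat"]
TR, N_VOLS, TRIAL_S, LEAD_IN_S = 2.0, 255, 24.0, 10.0

# Hypothetical trial order: conditions simply cycle across the 20 trials.
onsets = {c: [] for c in CONDITIONS}
for i in range(20):
    onsets[CONDITIONS[i % 4]].append(LEAD_IN_S + i * TRIAL_S)

# Each trial is modelled as one 4 s box spanning both stimuli (an assumption).
predictors = [condition_predictor(onsets[c], 4.0, N_VOLS, TR) for c in CONDITIONS]
```

Because each predictor is built from a different subset of trials, the design matrix stays full rank with the constant term, so ordinary least squares returns one β per condition; with percent-signal-change data those βs are in percent units.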

Repetition effects were examined separately for each stimulus type (2D, 3D) by contrasting activation on Different trials with that on Repeat trials (+Different −Repeat). For each contrast, the resulting group activation maps were thresholded at p < 0.005 with a minimum cluster size of 5 functional voxels of (3 mm)³ each, i.e., 135 mm³ or greater (based on Brain Voyager's cluster threshold estimation plug-in). In addition, we examined whether there was a main effect of Stimulus Type by searching for areas in which activation was significantly greater for 2D than for 3D stimuli (and vice versa) using the contrast (+2D −3D). Finally, the interaction between Stimulus Type and Repetition was examined using the contrasts (a) +3D Different −3D Repeat −2D Different +2D Repeat (i.e., greater adaptation for 3D than 2D stimuli) and (b) +2D Different −2D Repeat −3D Different +3D Repeat (i.e., greater adaptation for 2D than 3D stimuli).
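The contrasts listed above can be written as weight vectors over the four condition betas, and the cluster-size criterion as a connected-components filter. In this Python sketch the beta order is assumed, clusters are defined by face (6-neighbour) connectivity, and the minimum size of 5 voxels is applied as a fixed value; Brain Voyager's plug-in instead estimates the threshold, and its connectivity rule may differ.

```python
from collections import deque

import numpy as np

# Beta order assumed: [2D-Different, 2D-Repeat, 3D-Different, 3D-Repeat]
CONTRASTS = {
    "repetition_2D": np.array([1, -1, 0, 0]),   # +2D Different -2D Repeat
    "repetition_3D": np.array([0, 0, 1, -1]),   # +3D Different -3D Repeat
    "stimulus_type": np.array([1, 1, -1, -1]),  # +2D -3D
    "interaction_a": np.array([-1, 1, 1, -1]),  # more adaptation for 3D than 2D
    "interaction_b": np.array([1, -1, -1, 1]),  # more adaptation for 2D than 3D
}

VOXEL_MM = 3.0
MIN_CLUSTER_VOXELS = 5  # 5 voxels x 27 mm^3 = 135 mm^3
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_filter(mask, min_voxels=MIN_CLUSTER_VOXELS):
    """Keep face-connected clusters of suprathreshold voxels with >= min_voxels."""
    mask = np.asarray(mask, dtype=bool)
    keep = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    for seed in zip(*np.nonzero(mask)):
        seed = tuple(int(i) for i in seed)
        if seen[seed]:
            continue
        seen[seed] = True
        cluster, queue = [], deque([seed])
        while queue:  # breadth-first search over one connected cluster
            voxel = queue.popleft()
            cluster.append(voxel)
            for step in NEIGHBOURS:
                nb = tuple(v + s for v, s in zip(voxel, step))
                inside = all(0 <= c < n for c, n in zip(nb, mask.shape))
                if inside and mask[nb] and not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        if len(cluster) >= min_voxels:
            for voxel in cluster:
                keep[voxel] = True
    return keep
```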

Comparisons with Foci from Prior Studies

In addition to the individual-subject ROI analyses, we compared BOLD responses for our 2D and 3D conditions across the group as a whole within brain areas identified in previous fMRI studies of ‘3D object perception’4,39,43 (see Figure 5 and Supplementary Table 1). Group activation for all four conditions of the main experiment was contrasted with Fixation (i.e., +2D Different +2D Repeat +3D Different +3D Repeat). The resulting activation map was thresholded at t > 3.0 and displayed on the anatomical surface of a representative observer. Each ROI comprised the suprathreshold voxels (t > 3.0) within a volume of up to (10 mm)³ around the selected voxel, except for IPS points 1–4, for which a (5 mm)³ volume was used to prevent overlap with neighboring ROIs. MNI coordinates of the nine regions involved in processing 3D depth structure from stereo identified by Georgieva et al.39 were converted to Talairach coordinates using the MNI to Talairach Coordinate Converter (http://www.bioimagesuite.org/Mni2Tal/index.html).
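The cube-shaped ROI selection described above can be sketched in Python as follows. The seed is assumed to be a voxel index already converted from Talairach coordinates (the MNI-to-Talairach conversion itself is not reproduced), and the grid spacing defaults to 1 mm as an assumption. Voxels whose centres lie within half the cube side of the seed on every axis are included, so a 10 mm side on a 1 mm grid spans 11 voxels per axis. The overlap helper shows one way to check that neighbouring ROIs stay disjoint.

```python
import numpy as np


def cube_roi(t_map, seed, side_mm=10.0, voxel_mm=1.0, t_min=3.0):
    """Voxels inside a cube centred on `seed` whose t value exceeds t_min.

    `seed` is a voxel index (i, j, k) already mapped from Talairach
    coordinates. A voxel belongs to the cube if its centre lies within
    side_mm / 2 of the seed along every axis; the cube is clipped at the
    volume border.
    """
    t_map = np.asarray(t_map, dtype=float)
    half = int((side_mm / 2) // voxel_mm)
    lo = [max(s - half, 0) for s in seed]
    hi = [min(s + half + 1, n) for s, n in zip(seed, t_map.shape)]
    block = t_map[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
    hits = np.argwhere(block > t_min) + np.array(lo)
    return {tuple(int(c) for c in row) for row in hits}


def rois_overlap(roi_a, roi_b):
    """True if two ROIs (sets of voxel indices) share any voxel."""
    return bool(roi_a & roi_b)
```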