
Open Access 2023 | OriginalPaper | Chapter

10. Audio in Multisensory Interactions: From Experiments to Experiences

Author: Stefania Serafin

Published in: Sonic Interactions in Virtual Environments

Publisher: Springer International Publishing


Abstract

In the real and virtual world, we usually experience sounds in combination with at least one additional modality, such as vision, touch or proprioception. Understanding how sound enhances, substitutes or modifies the way we perceive and interact with the world is an important element when designing interactive multimodal experiences. In this chapter, we present an overview of sound in a multimodal context, ranging from basic experiments in multimodal perception to more advanced interactive experiences in virtual reality.

10.1 Introduction

This book examines the role of sound in virtual environments (VEs). However, most of our interactions with both the physical and virtual worlds occur through a combination of different sensory modalities. Auditory feedback is often the consequence of an action performed by touch, and it arrives as part of a combination of auditory, haptic and visual feedback. Consider, for example, the simple action of walking: the auditory feedback is the sound produced by the shoes interacting with the floor, the visual feedback is the surrounding environment, and the haptic feedback is the feeling of the surface one is stepping on. It is important that these different sensory modalities are perceived in synchrony, in order to experience a coherent action.
Since sound can be perceived from all directions, it is ideal for providing information when the eyes are otherwise occupied. This could be a situation where someone’s visual attention should be entirely devoted to a specific task, such as driving or a surgeon operating on a patient [46]. Another notable property of the human auditory system is its sensitivity to the temporal aspects of sound [3]. In many instances, response times for auditory stimuli are faster than those for visual stimuli [55]. Furthermore, given the higher temporal resolution of the auditory system compared to the visual system, people can resolve subtle temporal dynamics in sounds more readily than in visual stimuli; thus the rendering of data into sound may reveal periodic or other temporal information that is not easily perceivable in visualizations [17]. Moreover, the ears are capable of decomposing complex auditory scenes [3] and selectively attending to certain sources, as seen, for example, in the cocktail party problem [7]. Audition, then, may be the most appropriate modality for simple and intuitive information display (see [9, 37]) when data have complex patterns, express meaningful changes in time, or require immediate action.
In this chapter, an overview is presented of how knowledge of human perception and cognition can be helpful in the design of multimodal systems where interactive sonic feedback plays an important role. Table 10.1 presents a typology of different kinds of cross-modal interactions, adapted from [2].
Table 10.1 Typology of different kinds of cross-modal interactions

| Cross-modal interaction | Description | Example |
| --- | --- | --- |
| Amodal mapping | Use of VEs or other representational systems to map abstract or amodal information (e.g., time, amount, etc.) to some continuous or discrete sensory cue | The use of colour mapping and relative size in graphics and scientific visualization (e.g., colour, size, depth, etc.) |
| Cross-modal mapping | Use of a VE to map one or more dimensions of a sensory stimulus to another sensory channel | An oscilloscope |
| Intersensory biases | Stimuli from two or more sensory channels may represent discrepant/conflicting information | Ventriloquism effect [24] |
| Cross-modal enhancement | Stimuli from one sensory channel enhance or alter the perceptual interpretation of stimulation from another sensory channel | Increased perceived visual fidelity of a display as a result of increased auditory fidelity |
| Cross-modal transfers or illusions | Stimulation in one sensory channel leads to the illusion of stimulation in another sensory channel | Synesthesia |
Sonic feedback can interact with visual or haptic feedback in different ways. As an example, cross-modal mapping represents the situation where one or more dimensions of a sound are mapped to visual or haptic feedback, for instance a beeping sound combined with a flashing light. In cross-modal mapping, there is no specific interaction between the two modalities, but simply a function that connects some parameters of one modality to the parameters of another.
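As a minimal sketch of such a mapping (not taken from the chapter; all names and parameter values are illustrative), one dimension of the sound, here its amplitude envelope, is connected by a simple function to a visual parameter, the brightness of a flashing light:

```python
import numpy as np

def amplitude_envelope(signal: np.ndarray, frame: int = 512) -> np.ndarray:
    """RMS amplitude per frame of a mono audio signal."""
    n_frames = len(signal) // frame
    frames = signal[: n_frames * frame].reshape(n_frames, frame)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def to_brightness(envelope: np.ndarray) -> np.ndarray:
    """Map the audio envelope linearly onto a 0-1 brightness value."""
    peak = envelope.max()
    return envelope / peak if peak > 0 else envelope

# Example: a decaying 440 Hz beep drives the brightness of a flashing light.
sr = 44100
t = np.linspace(0, 1, sr, endpoint=False)
beep = np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)
brightness = to_brightness(amplitude_envelope(beep))
```

The point of the sketch is that the mapping is a one-way function from audio parameters to visual parameters; the two modalities do not otherwise interact.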
Intersensory biases become important when audition and a second modality provide conflicting cues. In the following section, several examples of intersensory biases will be provided. In most of these situations, the user tries to perceptually integrate the conflicting information, and this conflict might lead to a bias towards the stronger modality. One classic example is the ventriloquism effect [24], in which spatially discrepant auditory and visual cues are experienced as co-localized at the position of the visual cue, illustrating the dominance of visual over auditory information.
The name clearly derives from ventriloquists, who are able to give the impression that their voice originates from the dummy they are holding rather than from themselves. This effect is commonly exploited in cinemas and home theatres where, although the sound physically originates at the loudspeakers, it appears to come from the moving images on screen. The ventriloquism effect occurs because visual estimates of location are typically more accurate than auditory estimates, and therefore the overall perception of location is largely determined by vision. This phenomenon is also known as visual capture [64]. Another classic example is the Colavita effect [8]. In the original experiment, Colavita presented participants with an auditory (tone) or visual (light) stimulus and instructed them to respond by pressing the tone key or the light key, respectively. When bimodal stimuli were presented, participants responded more often to the visual component; this phenomenon is referred to as visual dominance.
Vision is indeed the dominant sense in many circumstances. Visual dominance over hearing and other sensory modalities has been demonstrated frequently (e.g., [45]), and a neural basis for visual dominance in processing audiovisual objects has been proposed (e.g., [48]).

Cross-modal enhancement refers to stimuli from one sensory channel enhancing or altering the perceptual interpretation of stimuli from another sensory channel. As an example, three studies presented in [57] show that high-quality auditory displays coupled with high-quality visual displays increase the perceived quality of the visual displays relative to the evaluation of the visual display alone. The same study also shows that low-quality auditory displays coupled with high-quality visual displays decrease the perceived quality of the auditory displays relative to the evaluation of the auditory display alone. These studies were performed by manipulating the pixel resolution and Gaussian white noise level of the visual display, and the sampling frequency and Gaussian white noise level of the auditory display. These findings strongly suggest that the perceived realism of an audiovisual display is a function of both auditory and visual display fidelities and their interaction. Cross-modal enhancements can occur even when the extra-modal input does not provide information directly meaningful for the task. An early study by Stein and colleagues asked subjects to rate the intensity of a beam of light; subjects judged the light to be brighter when it was accompanied by a brief, broadband auditory stimulus than when it was presented alone. The auditory stimulus produced more enhancement at lower visual intensities, regardless of the relative location of the auditory cue source.
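The audio-side stimulus manipulations used in [57] can be approximated with very simple operations; the sketch below (parameter values are assumptions, not taken from the study) degrades a signal by lowering its effective sampling frequency and adding Gaussian white noise, the two dimensions varied for the auditory display in those experiments.

```python
import numpy as np

def degrade_audio(x: np.ndarray, sr: int, target_sr: int = 8000,
                  noise_rms: float = 0.01) -> np.ndarray:
    """Crudely lower the effective sample rate and add Gaussian white noise."""
    step = max(1, sr // target_sr)
    held = np.repeat(x[::step], step)[: len(x)]   # naive sample-and-hold resampling
    return held + np.random.normal(0.0, noise_rms, size=len(held))

# Example: degrade one second of a 1 kHz tone sampled at 44.1 kHz.
sr = 44100
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
low_quality = degrade_audio(tone, sr)
```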
Cross-modal transfers or illusions are situations where stimulation in one sensory channel leads to the illusion of stimulation in another sensory channel. An example is synesthesia, which in the audio-visual domain is expressed as the ability to see a colour while hearing a sound. When considering inter-sensory discrepancies, Welch and Warren propose a modality appropriateness hypothesis [64], which suggests that the various sensory modalities are differentially well suited to the perception of different events. Generally, vision is considered more appropriate than audition for the perception of spatial location, with touch situated somewhere in between. Audition is most appropriate for the perception of temporally structured events. Touch is more appropriate than audition for the perception of texture, while vision and touch may be about equally appropriate. This appropriateness is a consequence of the different temporal and spatial resolutions of the auditory, haptic and visual systems. Moreover, especially when combined with touch stimulation, sound increases the sense of immersion [63].
Apart from the ways in which the different senses can interact, the auditory channel also presents some advantages compared to other modalities. For example, humans have a complete sphere of receptivity around the head, whereas visual feedback covers only a limited spatial region in terms of field of view and field of regard. Because auditory information is primarily temporal, the temporal resolution of the auditory system is very precise: we can discriminate between a single click and a pair of clicks when the gap is only a few tens of microseconds [30]. Perception of temporal changes in the visual modality is much poorer, and the fastest visible flicker rate in normal conditions is about 40–50 Hz [4]. In multisensory interaction, therefore, audio tends to elicit the shortest response time [33].
In contrast, the maximum spatial resolution (acuity) of the human eye is approximately 1/30 of a degree, a much finer resolution than that of the auditory system, which is approximately 1 degree. Humans are sensitive to sounds arriving from anywhere in the environment, whereas the visual field is limited to the frontal hemisphere, with good resolution limited specifically to the foveal region. Therefore, although the spatial resolution of the auditory modality is cruder, it can serve as a cue to events occurring outside the visual field of view.
In the rest of this chapter, we provide an overview of the interactions between audition and vision and between audition and touch, together with guidelines on how such knowledge can be used in the design of interactive sonic systems. By understanding how we naturally interact in a world where several sensory stimuli are provided, we can apply this understanding to the design of sonic interactive systems. Research on multisensory perception and cognition can provide important guidelines for the design of virtual environments where interactive sound plays an important role. Through technical advances such as mobile technologies and 3D interfaces, it has become possible to design systems with natural multimodal properties similar to those of the physical world. Such future interfaces would understand human multimodal communication and could actively anticipate and act in line with human capabilities and limitations. A major challenge for the near future is the development of such natural multimodal interfaces, which requires the active participation of industry, technology, and the human sciences.

10.2 Audio-Visual Interactions

Research into multimodal interaction between audition and other modalities has primarily focused on the interaction between audition and vision. This focus is natural, since audition and vision are the most dominant modalities in the human perceptual system [29]. A well-known multimodal phenomenon is the McGurk effect [38], an example of how vision alters speech perception: the sound “ba” is perceived as “da” when viewed with the lip movements for “ga”. Notice that in this case the percept differs from both the visual and the auditory stimulus, so this is an example of intersensory bias, as described in the previous section.
The experiments described so far show a dominance of vision over audition when conflicting cues are provided. However, this is not always the case. As an example, a visual illusion induced by sound is described in [53, 54]: when a single visual flash is accompanied by multiple auditory beeps, the single flash is perceived as multiple flashes. These results were obtained by flashing a uniform white disk on a black background a variable number of times, 50 milliseconds apart. Flashes were accompanied by a variable number of beeps, spaced 57 milliseconds apart. Observers were asked to judge how many visual flashes were presented on each trial. Trials were randomized, and each stimulus combination was run five times on eight naive observers. Surprisingly, observers consistently and incorrectly reported seeing multiple flashes whenever a single flash was accompanied by more than one beep [53]. This is known as the sound-induced flash illusion. A follow-up experiment investigated whether the illusory flashes could be perceived independently at different spatial locations [26]. Two bars were displayed at two locations, creating apparent motion. All subjects reported that an illusory bar was perceived with the second beep at a location between the real bars. This is analogous to the cutaneous rabbit illusion, where trains of successive cutaneous pulses delivered at a few widely separated locations produce sensations at many in-between points [19]. Indeed, the perception of time, for which auditory estimates are typically more accurate, is dominated by hearing.
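A minimal sketch of the kind of trial schedule described above is given below; the exact stimulus ranges of the original study are not reproduced here, so the flash and beep counts are assumptions, while the inter-stimulus intervals and the five repetitions per combination follow the description in the text.

```python
import itertools
import random

FLASH_COUNTS = [1, 2, 3, 4]     # assumed range of flashes per trial
BEEP_COUNTS = [0, 1, 2, 3, 4]   # assumed range of beeps per trial (0 = no sound)
REPETITIONS = 5                  # each combination presented five times
FLASH_SOA = 0.050                # flashes 50 ms apart
BEEP_SOA = 0.057                 # beeps 57 ms apart

def build_schedule(seed: int = 0):
    """Return a randomized list of (n_flashes, n_beeps) trials."""
    trials = [combo for combo in itertools.product(FLASH_COUNTS, BEEP_COUNTS)
              for _ in range(REPETITIONS)]
    random.Random(seed).shuffle(trials)
    return trials

if __name__ == "__main__":
    schedule = build_schedule()
    print(f"{len(schedule)} trials, e.g. {schedule[:3]}")
```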
Another experiment explored whether two objects appear to bounce off each other or simply cross when observers hear a beep at the moment the objects could be in contact. In this case, a desktop computer displayed two identical objects moving towards each other. The display was ambiguous, allowing two different interpretations after the objects met: they could either bounce off each other or cross. Since collisions usually produce a characteristic impact sound, introducing such a sound when the objects met promoted the perception of bouncing rather than crossing. This experiment is usually known as the motion-bounce illusion [51]. In a subsequent study, Sekuler and Sekuler found that any transient event temporally aligned with the would-be collision increased the likelihood of a bounce percept [50], including a pause, a flash of light on the screen, or a sudden disappearance of the discs. Auditory dominance has also been found in other time-based abilities such as precise temporal processing [47], temporal localization [5], and the estimation of durations [43]. Lipscomb and Kendall [34] provide another example of auditory dominance in a multimedia (film) context: they found that variation in participants’ semantic differential ratings was influenced more by the musical component than by the visual element. Particularly interesting in its implications for processing multisensory experiences is [22], which points to the disappearance of visual dominance when a visual signal is presented simultaneously with an auditory and a haptic signal (i.e., as a tri-sensory combination). The authors concluded that while vision can dominate both the auditory and the haptic sensory modalities, this is limited to bi-sensory combinations in which the visual signal is combined with a single other stimulus.
More recent investigations have examined the role of ecological auditory feedback in the multimodal perception of visual content. As an example, a study presented in [15] investigated the combined effect of visual and auditory information on the perception of a moving object’s trajectory. Inspired by the experimental paradigm presented in [27], the visual stimuli consisted of a perspective rendering of a ball moving in a three-dimensional box. Each video was paired with one of three sound conditions: silence, the sound of a ball rolling, or the sound of a ball hitting the ground. The sound condition influenced whether observers were more likely to perceive the ball as rolling back in depth on the floor of the box or jumping in the frontal plane.
Another interesting study on the role of auditory cues in the perception of visual stimuli is presented in [60]. Two psychophysical studies were conducted to test whether visual sensitivity to point-light depictions of human gait reflects the action-specific co-occurrence of visual and auditory cues typically produced by walking people. Visual walking patterns were captured with a motion capture system, and a between-subjects experimental procedure was adopted. Subjects were randomly exposed to one of three experimental conditions: no sound, footstep sounds, or a 1000 Hz pure tone, which served as a control. Visual sensitivity to coherent human gait was measured by asking subjects whether they could detect a person walking. Sensitivity was greatest in the presence of temporally coincident and action-consistent sounds, in this case the sound of footsteps. Visual sensitivity to human gait with coincident sounds that were not action-consistent, in this case the pure tone, was significantly lower and did not differ significantly from visual sensitivity to gaits presented without sound.
As an additional interaction between audition and vision, sound can help the user search for an object within a cluttered, continuously changing environment. It has been shown that a simple auditory pip drastically decreases search times for a synchronized visual object that is normally very difficult to find; this is known as the pip-and-pop effect [62]. Visual feedback can also affect several aspects of a musical performance, although affective and emotional aspects of musical performance are not considered in this chapter. As an example, Schutz and Lipscomb report an audio-visual illusion in which an expert musician’s gestures affect the perceived duration of a note without changing its acoustic length [49]. To demonstrate this, they recorded a world-renowned marimba player performing single notes on a marimba using long and short gestures. They paired both types of sounds with both types of gestures, resulting in a combination of natural (i.e., congruent) and hybrid (i.e., incongruent) gesture-note pairs. They informed participants that some auditory and visual components had been mismatched and asked them to judge tone duration based on the auditory component alone. Despite these instructions, the participants’ duration ratings were strongly influenced by the visual gesture information: notes were rated as longer when paired with long gestures than when paired with short gestures. These results are somewhat puzzling, since they contradict the view that judgments of tone duration are relatively immune to visual influence [64], that is, that in temporal tasks visual influence on audition is negligible. However, the effect appears to rest not on information quality but on perceived causality, given that the visual influence in this paradigm depends on the presence of an ecologically plausible audiovisual relationship.
Indeed, the characteristics of vision and audition can also be used to predict which modality will prevail when conflicting information is provided. In this direction, [31] introduced the notion of auditory and visual objects. They describe the different characteristics of audition and vision, claiming that the primary source of information for vision is a surface, with the location and colour of sources secondary, whereas for audition the primary source of information is a source and a surface is secondary.
In [16], a theory is suggested of how our brain merges the information coming from the different modalities, specifically audition, vision and touch, using two strategies. The first is called sensory combination, meaning the maximization of information delivered by the different sensory modalities. The second is called sensory integration, meaning the reduction of variance in the sensory estimate in order to increase its reliability. Sensory combination describes interactions between sensory signals that are not redundant; by contrast, sensory integration describes interactions between redundant signals. Ernst and coworkers [16] describe the integration of sensory information as a bottom-up process.
The modality precision (or modality appropriateness) hypothesis of [64] is often cited when trying to explain which modality dominates under which circumstances. This hypothesis states that discrepancies are always resolved in favour of the more precise or more appropriate modality. In spatial tasks, for example, the visual modality usually dominates, because it is the most precise at determining spatial information. According to [16], however, this terminology is misleading, because it is not the modality itself or the stimulus that dominates: the dominance is determined by the estimate and by how reliably it can be derived within a specific modality from a given stimulus.
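The notion of reliability can be made precise with the standard maximum-likelihood model of cue integration, commonly associated with [16] although not spelled out in this chapter: if the auditory and visual estimates $\hat{s}_A$ and $\hat{s}_V$ of the same quantity are unbiased with variances $\sigma_A^2$ and $\sigma_V^2$, the integrated estimate weights each cue by its reliability (inverse variance),

$$
\hat{s} = w_A \hat{s}_A + w_V \hat{s}_V, \qquad
w_A = \frac{1/\sigma_A^2}{1/\sigma_A^2 + 1/\sigma_V^2}, \qquad
w_V = \frac{1/\sigma_V^2}{1/\sigma_A^2 + 1/\sigma_V^2},
$$

and the variance of the combined estimate, $\sigma^2 = \sigma_A^2 \sigma_V^2 / (\sigma_A^2 + \sigma_V^2)$, is never larger than that of either cue alone. On this account, vision “dominates” spatial tasks simply because $\sigma_V \ll \sigma_A$ for location estimates, while the ordering, and hence the dominance, reverses for temporal tasks.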
A major design dilemma involves the extent to which audio interfaces should maintain the conventions of visual interfaces [40]. Indeed, most attempts at auditory display seek to emulate or translate elements of visual interfaces to the auditory modality. While retrofitting visual interfaces with sound can offer some consistencies across modalities, the constraints of this approach may hinder the design of auditory interfaces. While visual objects exist primarily in space, auditory stimuli occur in time. A more appropriate approach to auditory interface design, therefore, may require designers to focus more strictly on auditory capabilities. Such interfaces may present the items and objects of the interface in a fast, linear fashion over time rather than attempting to provide auditory versions of the spatial relationships found in visual interfaces.

10.3 Embodied Interactions

The experiments described so far assume a passive observer, in the sense that a subject is exposed to a fixed sequence of audiovisual stimuli and asked to report on the resulting perceptual experience. When a subject interacts with the stimuli provided, a tight sensorimotor coupling is enabled, which is an important characteristic of embodied perception. According to embodiment theory, a person and the environment form a pair in which the two parts are coupled and determine each other. The term embodied highlights two points: first, cognition depends upon the kinds of experience that are generated by specific sensorimotor capacities; second, these individual sensorimotor capacities are themselves embedded in a biological, psychological, and cultural context [14].
The notion of embodied interaction is based on the view that meanings are present in the actions that people engage in while interacting with objects, with other people, and with the environment in general. Embodied interfaces try to exploit the phenomenological attitude of looking at the direct experience, and let the meanings and structures emerge as experienced phenomena. Embodiment is not a property of artefacts but rather a property of how actions are performed with or through the artefacts.
The central role of our body in perception, cognition and interaction has previously been addressed by philosophers (e.g., [39]), psychologists (e.g., [41]) and neuroscientists (e.g., [10]). A rather recent approach to understanding the design process, especially in its early stages, has been to focus on the role of multimodality and the contribution of non-verbal channels as key means of communication, kinaesthetic thinking and, more generally, of doing design [59]. Audio-haptic interactions, described in the following section, also require a continuous action-feedback loop between a person and the environment, an important characteristic of embodied perception. Another approach, called embodied sound design, proposes to place the bodily experience (i.e., the communication of sonic concepts through vocal and gestural imitations) at the centre of the sound creation process [12].
The role of the body in HCI has recently gained more attention overall; interested readers can refer to the book by Höök [23] and to Chap. 7 in this volume.

10.4 Audio-Haptic Interactions

Although the investigation of audio-haptic interactions has not received as much attention as that of audiovisual interactions, it is certainly an interesting field of research, especially considering the tight connections between the sense of touch and audition. As a matter of fact, both audition and touch are sensitive to the very same kind of physical property, namely mechanical pressure in the form of oscillations. The tight correlation between the information content (oscillatory patterns) conveyed by the two senses can potentially support interactions of an integrative nature at a variety of levels along the sensory pathways. Auditory cues are normally elicited when one touches everyday objects, and these sounds often convey useful information regarding the nature of the objects [18]. The feeling of skin dryness or moistness that arises when we rub our hands against each other is subjectively attributed to the friction forces at the epidermis. Yet it has been demonstrated that acoustic information also participates in this bodily sensation, because altering the sound arising from the hand-rubbing action changes our sensation of dryness or moistness at the skin. This phenomenon is known as the parchment-skin illusion [25].
The parchment-skin illusion is an example of how interactive auditory feedback can affect subjects’ haptic sensations. In the experiment demonstrating the illusion, subjects were asked to sit with a microphone close to their hands and to rub their hands against each other. The sound of the hands rubbing was captured by the microphone, manipulated in real time, and played back through headphones. The sound was modified by attenuating the overall amplitude and by amplifying the high frequencies. Subjects were asked to rate the haptic sensation in their palms as a function of the different auditory cues provided, on a scale ranging from very moist to very dry. The results show that the auditory feedback significantly affected the perceived dryness of the skin. This study was extended in [20] using a more rigorous psychophysical testing procedure. The results reported a similar shift on the smooth-dry scale correlated with changes in auditory feedback, but not in the roughness judgments per se. Both studies provide convincing empirical evidence of the modulatory effect of auditory cues on people’s haptic perception of a variety of different surfaces. A similar experiment was performed combining auditory cues with haptic cues at the tongue: subjects were asked to chew on potato chips, and the sound produced was again captured and manipulated in real time. The results show that the perceived crispness of the potato chips was affected by the auditory feedback provided [56].

A surprising audio-haptic bodily illusion demonstrating that human observers rapidly update their assumptions about the material qualities of their body is the marble-hand illusion [52]. By repeatedly and gently hitting a participant’s hand while progressively replacing the natural sound of the hammer against the skin with the sound of a hammer hitting a piece of marble, it was possible to induce an illusory misperception of the material properties of the hand. After five minutes, the hand started feeling stiffer, heavier, harder, less sensitive and unnatural, and showed an enhanced galvanic skin response to threatening stimuli. This bodily illusion demonstrates that the experience of the material of our body can be quickly updated through multisensory integration. Another interesting example in which sound affects body perception is shown in [58], where the illusion is applied to footstep sounds: by digitally varying the sounds produced while walking, it is possible to vary one’s perception of body weight.
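As a rough illustration of the kind of real-time manipulation used in the parchment-skin study (the exact filters and gains are not specified in this chapter, so every parameter value below is an assumption), the following sketch attenuates the overall level of a captured rubbing sound and boosts its high-frequency content before playback:

```python
import numpy as np
from scipy.signal import butter, lfilter

def parchment_skin_process(x: np.ndarray, sr: int, overall_gain_db: float = -6.0,
                           hf_cutoff: float = 2000.0, hf_boost_db: float = 12.0) -> np.ndarray:
    """Attenuate the overall level and boost frequencies above hf_cutoff."""
    b, a = butter(2, hf_cutoff / (sr / 2), btype="high")     # 2nd-order high-pass
    highs = lfilter(b, a, x)
    boosted = x + (10 ** (hf_boost_db / 20) - 1.0) * highs    # add extra high-band energy
    return 10 ** (overall_gain_db / 20) * boosted             # overall attenuation

# Example: process one second of "recorded" hand-rubbing audio (white noise
# stands in for the microphone signal) before playing it back over headphones.
sr = 44100
mic = np.random.randn(sr) * 0.05
out = parchment_skin_process(mic, sr)
```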
Lately, artificial cues have been appearing in audio-haptic interfaces, allowing us to carefully control the variations in the provided feedback and the resulting perceived effects on the exposed subjects [13, 42, 61]. Artificial auditory cues have also been used in the context of sensory substitution, providing artificial sensibility at the hands by using hearing as a replacement for the loss of sensation [35]. In this particular study, microphones placed at the fingertips captured and amplified the friction sound obtained when rubbing hard surfaces.
In [28], an investigation of the interaction between auditory and haptic cues in near space is presented. The authors describe an interesting illusion in which sounds delivered through headphones and presented near the head induce a haptic experience. The left ear of a dummy head was stroked with a paintbrush and the sound was recorded. The sound was then presented to participants, who felt a tickling sensation when the sound was presented near the head, but not when it was presented far from the head.
Another kind of dynamic sonic objecthood is that obtained through data physicalization, that is, the 3D rendering of a dataset in the form of a solid physical object. Although there is a long history of physicalization, this area of research has become increasingly interesting thanks to the facilitation offered by 3D printing technology. Physicalizations allow users to hold and manipulate a dataset in their hands, providing an embodied experience that affords rich, naturalistic and intuitive interactions such as multi-finger touch, tapping, pressing, squeezing, scraping and rotating [36].
Physical manipulation produces acoustic effects that are influenced by material properties, shape, forces, modes of interaction, and events over time. The idea that sound could be a way to augment data physicalization has been explored through acoustic sonifications in which a 3D-printed dataset is superimposed on the form of a sounding object, such as a bell or a singing bowl [1]. Since acoustic vibrations are strongly influenced by 3D form, the sound that is produced is influenced by the dataset used to shape the sounding object. In a similar vein, the design of musical instruments has also inspired the design of new interfaces for human-computer interaction. As stated by Jaron Lanier, musical instruments are the best user interfaces (see [1]), and we can learn to design new interfaces by looking at musical instruments. An example is the work of [32], where structural elements along the speaker-microphone pathway characteristically alter the acoustic output. Moreover, Chap. 12 proposes several case studies in the context of musical haptics.
In designing multimodal environments, several elements need to be taken into consideration. However, technology imposes some limitations, especially when the ultimate goal is to simulate systems that react in real time. This issue is nicely addressed by Pai, who describes a tradeoff between accuracy and responsiveness as a crucial difference between models for science and models for interaction (see [44]). Computations about the physical world are always approximations. In general, it is possible to improve accuracy by constructing more detailed models and performing more precise measurements, but this increased accuracy comes at the cost of latency, i.e., the time that elapses before an answer is obtained. For multisensory models, it is also essential to ensure the synchronization of time between the different sensory modalities. Pai [44] groups all of these temporal considerations, such as latency and synchronization, into a single category called responsiveness. The question then becomes how to balance accuracy and responsiveness, and the choice depends on the final goal of the multimodal system. Scientists are often more concerned with accuracy, so that responsiveness is only a soft constraint based on available resources; for interaction designers, on the other hand, responsiveness is an essential parameter that must be satisfied.
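A concrete, if simplified, way to see the responsiveness side of this tradeoff (the numbers below are illustrative, not from [44]) is the latency contributed by block-based audio processing: it equals the buffer size divided by the sample rate, so increasing model detail until a block no longer fits within that budget directly degrades responsiveness.

```python
def audio_buffer_latency_ms(buffer_size: int, sample_rate: int = 48000) -> float:
    """Latency (in milliseconds) introduced by one audio processing block."""
    return 1000.0 * buffer_size / sample_rate

# Example: common buffer sizes and the per-block compute budget they imply.
for n in (64, 256, 1024):
    print(f"{n:5d} samples -> {audio_buffer_latency_ms(n):5.2f} ms budget per block")
```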

10.5 Conclusions

This chapter has provided an overview of several experiments whose goal was to achieve a better understanding of how the human auditory system is connected to the visual and haptic channels. A better understanding of multimodal perception can have several applications. As an example, systems based on sensory substitution help people lacking a certain sensory modality by replacing it with another. Moreover, cross-modal enhancement allows reduced stimuli in one sensory modality to be augmented by stronger stimulation in another modality.
Contemporary advances in hardware and software technology allow us to experiment in several ways with technologies for multimodal interaction design, building, for example, haptic illusions with equipment available in a typical hardware store [21], or easily experimenting with sketching and rapid prototyping [6, 11]. These advances create several possibilities for discovering novel cross-modal illusions and interactions between the senses, especially when collaboration between cognitive psychologists and interaction designers is facilitated. A research challenge is not only to understand how humans process information coming from the different senses, but also how information in a multimodal system should be distributed across modalities in order to obtain the best user experience.
As an example, in a multimodal system in which the user controls a haptic display while seeing a visual display and listening to an interactive auditory display, it is important to determine which synchronicities matter most. At one extreme, a completely disjoint distribution of information over several modalities can offer the highest bandwidth, but the user may have difficulty connecting the modalities, and one modality might mask another or distract the user by drawing attention to events that are not important. At the other extreme, a completely redundant distribution of information is known to increase cognitive load and is not guaranteed to increase user performance.
Beyond research on the processing of multimodal stimuli, studies are needed on the processing of multimodal stimuli that are connected via interaction. We would expect the human brain and sensory system to be optimized to cope with a certain mixture of redundant information, and information displays should work better the more closely they follow this natural distribution. Overall, the better we understand the ways humans interact with the everyday world, the more inspiration we can draw for the design of effective, natural multimodal interfaces.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Literature
1. Barrass, S.: Diagnosing blood pressure with Acoustic Sonification singing bowls. International Journal of Human-Computer Studies 85, 68–71 (2016)
2. Biocca, F., Kim, J., Choi, Y.: Visual touch in virtual environments: An exploratory study of presence, multimodal interfaces, and cross-modal sensory illusions. Presence: Teleoperators & Virtual Environments 10, 247–265 (2001)
3. Bregman, A. S.: Auditory scene analysis: The perceptual organization of sound (MIT Press, 1994)
4. Bruce, V., Green, P. R., Georgeson, M. A.: Visual perception: Physiology, psychology, & ecology (Psychology Press, 2003)
5. Burr, D., Banks, M. S., Morrone, M. C.: Auditory dominance over vision in the perception of interval duration. Experimental Brain Research 198, 49 (2009)
6. Buxton, B.: Sketching user experiences: getting the design right and the right design (Morgan Kaufmann, 2010)
7. Cherry, E. C.: Some experiments on the recognition of speech, with one and with two ears. The Journal of the Acoustical Society of America 25, 975–979 (1953)
8. Colavita, F. B.: Human sensory dominance. Perception & Psychophysics 16, 409–412 (1974)
10. Damasio, A. R.: Descartes’ error (Random House, 2006)
11. Delle Monache, S., Polotti, P., Rocchesso, D.: A toolkit for explorations in sonic interaction design. In: Proc. Int. Conf. Audio Mostly (AM2010) (Piteå, 2010), 1–7
12. Delle Monache, S. et al.: Embodied sound design. International Journal of Human-Computer Studies 118, 47–59 (2018)
13. DiFilippo, D., Pai, D. K.: The AHI: An audio and haptic interface for contact interactions. In: Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology (2000), 149–158
14. Dourish, P.: Where the action is: the foundations of embodied interaction (MIT Press, 2004)
15. Ecker, A. J., Heller, L. M.: Auditory Visual Interactions in the Perception of a Ball’s Path. Perception 34, 59–75 (2005)
16. Ernst, M. O., Bülthoff, H. H.: Merging the senses into a robust percept. Trends in Cognitive Sciences 8, 162–169 (2004)
17. Flowers, J. H., Buhman, D. C., Turnage, K. D.: Data sonification from the desktop: Should sound be part of standard data analysis software? ACM Transactions on Applied Perception (TAP) 2, 467–472 (2005)
18. Gaver, W.: What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology 5, 1–29 (1993)
19. Geldard, F. A., Sherrick, C. E.: The cutaneous “rabbit”: A perceptual illusion. Science 178, 178–179 (1972)
20. Guest, S., Catmur, C., Lloyd, D., Spence, C.: Audiotactile interactions in roughness perception. Experimental Brain Research 146, 161–171 (2002)
21. Hayward, V.: A brief taxonomy of tactile illusions and demonstrations that can be done in a hardware store. Brain Research Bulletin 75, 742–752 (2008)
22. Hecht, D., Reiner, M.: Sensory dominance in combinations of audio, visual and haptic stimuli. Experimental Brain Research 193, 307–314 (2009)
23. Höök, K.: Designing with the body: Somaesthetic interaction design (MIT Press, 2018)
24. Jack, C. E., Thurlow, W. R.: Effects of degree of visual association and angle of displacement on the “ventriloquism” effect. Perceptual and Motor Skills 37, 967–979 (1973)
25. Jousmäki, V., Hari, R.: Parchment-skin illusion: sound-biased touch. Current Biology 8, R190–R191 (1998)
26. Kamitani, Y., Shimojo, S.: Sound-induced visual “rabbit”. Journal of Vision 1, 478–478 (2001)
27. Kersten, D., Mamassian, P., Knill, D. C.: Moving cast shadows induce apparent motion in depth. Perception 26, 171–192 (1997)
28. Kitagawa, N., Zampini, M., Spence, C.: Audiotactile interactions in near and far space. Experimental Brain Research 166, 528–537 (2005)
29. Kohlrausch, A., van de Par, S.: Auditory-visual interaction: from fundamental research in cognitive psychology to (possible) applications. In: Human Vision and Electronic Imaging IV 3644 (1999), 34–44
30. Krumbholz, K., Patterson, R., Seither-Preisler, A., Lammertmann, C., Lütkenhöner, B.: Neuromagnetic evidence for a pitch processing center in Heschl’s gyrus. Cerebral Cortex 13, 765–772 (2003)
31. Kubovy, M., Van Valkenburg, D.: Auditory and visual objects. Cognition 80, 97–126 (2001)
32. Laput, G., Brockmeyer, E., Hudson, S. E., Harrison, C.: Acoustruments: Passive, acoustically-driven, interactive controls for handheld devices. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (2015), 2161–2170
33. Li, T., Wang, D., Peng, C., Yu, C., Zhang, Y.: Speed-accuracy tradeoff of fingertip force control with visual/audio/haptic feedback. International Journal of Human-Computer Studies 110, 33–44 (2018)
34. Lipscomb, S. D., Kendall, R. A.: Perceptual judgement of the relationship between musical and visual components in film. Psychomusicology: A Journal of Research in Music Cognition 13, 60 (1994)
35. Lundborg, G., Rosén, B., Lindberg, S.: Hearing as substitution for sensation: a new principle for artificial sensibility. The Journal of Hand Surgery 24, 219–224 (1999)
36. Lupton, D.: Feeling your data: Touch and making sense of personal digital data. New Media & Society 19, 1599–1614 (2017)
37. Mcguire, J. M., Scott, S. S., Shaw, S. F.: Universal design and its applications in educational environments. Remedial and Special Education 27, 166–175 (2006)
38. McGurk, H., MacDonald, J.: Hearing lips and seeing voices. Nature 264, 746 (1976)
39. Merleau-Ponty, M.: Phenomenology of perception (Routledge, 1962; original French edition 1945)
40. Mynatt, E. D., Edwards, W. K.: Mapping GUIs to auditory interfaces. In: Proceedings of the 5th Annual ACM Symposium on User Interface Software and Technology (1992), 61–70
41. Niedenthal, P. M., Barsalou, L. W., Winkielman, P., Krauth-Gruber, S., Ric, F.: Embodiment in attitudes, social perception, and emotion. Personality and Social Psychology Review 9, 184–211 (2005)
42. Nordahl, R. et al.: Preliminary experiment combining virtual reality haptic shoes and audio synthesis. In: International Conference on Human Haptic Sensing and Touch Enabled Computer Applications (2010), 123–129
43. Ortega, L., Guzman-Martinez, E., Grabowecky, M., Suzuki, S.: Auditory dominance in time perception. Journal of Vision 9, 1086–1086 (2009)
44. Pai, D. K.: Multisensory interaction: Real and virtual. In: Robotics Research. The Eleventh International Symposium (2005), 489–498
45. Posner, M. I., Nissen, M. J., Klein, R. M.: Visual dominance: an information-processing account of its origins and significance. Psychological Review 83, 157 (1976)
46. Recarte, M. A., Nunes, L. M.: Mental workload while driving: effects on visual search, discrimination, and decision making. Journal of Experimental Psychology: Applied 9, 119 (2003)
47. Repp, B. H., Penel, A.: Auditory dominance in temporal processing: new evidence from synchronization with simultaneous visual and auditory sequences. Journal of Experimental Psychology: Human Perception and Performance 28, 1085 (2002)
48. Schmid, C., Büchel, C., Rose, M.: The neural basis of visual dominance in the context of audio-visual object processing. NeuroImage 55, 304–311 (2011)
49. Schutz, M., Lipscomb, S.: Hearing gestures, seeing music: Vision influences perceived tone duration. Perception 36, 888–897 (2007)
50. Sekuler, A. B., Sekuler, R.: Collisions between moving visual targets: what controls alternative ways of seeing an ambiguous display? Perception 28, 415–432 (1999)
51. Sekuler, R.: Sound alters visual motion perception. Nature 385, 308 (1997)
52. Senna, I., Maravita, A., Bolognini, N., Parise, C. V.: The marble-hand illusion. PLoS ONE 9 (2014)
53. Shams, L., Kamitani, Y., Shimojo, S.: Illusions: What you see is what you hear. Nature 408, 788 (2000)
54. Shams, L., Kamitani, Y., Shimojo, S.: Visual illusion induced by sound. Cognitive Brain Research 14, 147–152 (2002)
55. Spence, C., Driver, J.: Audiovisual links in exogenous covert spatial orienting. Perception & Psychophysics 59, 1–22 (1997)
56. Spence, C., Zampini, M.: Auditory contributions to multisensory product perception. Acta Acustica united with Acustica 92, 1009–1025 (2006)
57. Storms, R. L., Zyda, M. J.: Interactions in perceived quality of auditory-visual displays. Presence: Teleoperators & Virtual Environments 9, 557–580 (2000)
58. Tajadura-Jiménez, A. et al.: As light as your footsteps: altering walking sounds to change perceived body weight, emotional state and gait. In: Proc. ACM Conf. on Human Factors in Computing Systems (Seoul, Apr. 2015), 2943–2952
59. Tholander, J., Karlgren, K., Ramberg, R., Sökjer, P.: Where all the interaction is: sketching in interaction design as an embodied practice. In: Proceedings of the 7th ACM Conference on Designing Interactive Systems (2008), 445–454
60. Thomas, J. P., Shiffrar, M.: I can see you better if I can hear you coming: Action-consistent sounds facilitate the visual detection of human gait. Journal of Vision 10, 14–14 (2010)
61. Van den Doel, K., Pai, D. K.: The sounds of physical shapes. Presence 7, 382–395 (1998)
62. Van der Burg, E., Olivers, C. N., Bronkhorst, A. W., Theeuwes, J.: Pip and pop: nonspatial auditory signals improve spatial visual search. Journal of Experimental Psychology: Human Perception and Performance 34, 1053 (2008)
63. Vi, C. T., Ablart, D., Gatti, E., Velasco, C., Obrist, M.: Not just seeing, but also feeling art: Mid-air haptic experiences integrated in a multisensory art exhibition. International Journal of Human-Computer Studies 108, 1–14 (2017)
64. Welch, R. B., Warren, D. H.: Immediate perceptual response to intersensory discrepancy. Psychological Bulletin 88, 638 (1980)