1 Introduction
The perception of vibrations at the skin and the perception of sound are often coupled in real life, e.g., while playing an instrument or listening to music with low-frequency content. In these cases, the physical stimuli which excite both modalities are usually highly correlated. When new multimodal systems are designed, however, sound and vibrations can be controlled separately. Consider the auditory and vibrotactile feedback of a button on a touch screen, the vibrotactile feedback of electronic music instruments, or bimodal devices for the guidance of blind persons. For example, the authors developed and optimized systems for multimodal reproduction of music [64, 65, 67]. To this end, a vibration actuator was coupled to a surface in contact with the listener, e.g., an electrodynamic shaker mounted in a backpack, integrated in clothing or attached below a seat or floor. Audio reproduction was implemented with conventional loudspeakers or headphones. To generate appropriate music-related vibrations from the audio signal, various signal processing approaches were compared. It was found that it is beneficial to consider the perceptual capabilities and limitations of both modalities in this design process. Therefore, knowledge of the fundamental characteristics of the auditory and vibrotactile sensory modalities was necessary. Many similarities can be found regarding psychophysical characteristics, although the anatomy and physiology of both modalities are quite different. A good overview of the basic structure and functionality of the human hearing organ as well as the histology and physiology of the mechanoreceptive system, including the neural processing in the somatosensory and auditory areas of the brain, can be found in [59, 86] and will not be described here.
The current survey aims to compare the senses of hearing and touch using data from psychophysical experiments. Special attention is given to the frequency range where sound and vibration perception overlap: between a few Hertz and several hundred Hertz. The authors hope that this overview helps in designing auditory-tactile feedback that is perceptually matched across both modalities. This paper is based on the dissertation of the first author [63]. Reproduction is kindly permitted by the Shaker Verlag, Germany.
The perception of sound has been studied for several decades. The basic physical attributes of sound (e.g., intensity, frequency or location of a sound source) have been correlated to perceptual attributes like loudness, pitch or distance. Different effects like adaptation to loud signals or masking characterize the auditory system. In contrast to sound, vibrations can be perceived at different parts of the body. Most vibrotactile studies focus on vibrations transmitted via hand and finger. However, the principal mechanoreceptors in the skin are similar at different body sites. In the overlapping frequency range of auditory and vibrotactile perception, vibrations are likely to stimulate mainly the Meissner and Pacinian mechanoreceptors, which can be found all over the body [86], albeit with varying populations and surrounding tissue mechanics. Despite these differences, data from different body sites are used here for a general comparison.
A common measure for sound is the sound pressure level \(L_{\mathrm {SPL}}\). It is defined as the logarithmic ratio of the effective value of the sound pressure \(p\) to a reference value \(p_{0} = 20\, \upmu \hbox { Pa}\):
$$\begin{aligned} L_{\mathrm {SPL}} = 20 \log _{10} \frac{p}{p_{0}}\, \mathrm {dB}. \end{aligned}$$
A similar measure for vibrations is the acceleration level \(L_{\mathrm {acc}}\). It is defined as the logarithmic ratio of the acceleration \(a\) to a reference value \(a_{0} = 1\, \upmu \hbox {m}/\hbox {s}^{2}\):
$$\begin{aligned} L_{\mathrm {acc}} = 20 \log _{10} \frac{a}{a_{0}}\, \mathrm {dB}. \end{aligned}$$
In contrast to the sound pressure level, an acceleration level of 0 dB is not related to the perception threshold. Therefore, sensation level (the level above threshold) will be used to compare the auditory and vibrotactile modality directly. Please note that within this paper the term ‘vibrotactile’ will sometimes be abbreviated as ‘tactile’. However, the article will not discuss other types of tactile sensations (e.g., temperature).
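The two level definitions and the notion of sensation level can be illustrated with a minimal Python sketch. The reference values are taken from the text. The function names and the threshold value in the usage lines are hypothetical placeholders, not measured data.

```python
import math

P0_PA = 20e-6    # reference sound pressure: 20 µPa (from the text)
A0_M_S2 = 1e-6   # reference acceleration: 1 µm/s² (from the text)


def sound_pressure_level(p_pa):
    """Sound pressure level in dB for an effective sound pressure in Pa."""
    return 20.0 * math.log10(p_pa / P0_PA)


def acceleration_level(a_m_s2):
    """Acceleration level in dB for an acceleration in m/s²."""
    return 20.0 * math.log10(a_m_s2 / A0_M_S2)


def sensation_level(level_db, threshold_db):
    """Level above the perception threshold, in dB."""
    return level_db - threshold_db


# Usage: 1 m/s² corresponds to an acceleration level of about 120 dB.
acc_level = acceleration_level(1.0)
# Hypothetical threshold of 100 dB, chosen only to illustrate the subtraction.
print(sensation_level(acc_level, threshold_db=100.0))
```

Because the sensation level is referenced to the threshold of the respective modality, it allows auditory and vibrotactile stimuli to be placed on a common scale even though their physical reference values are unrelated.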
4 Summary
In this paper, basic psychophysical abilities and limitations of the auditory and vibrotactile modality are discussed in a comparative manner. The validity of such comparisons could be questioned because of the different methodologies used in the reviewed papers. Different researchers pursued different questions at different times with different test participants (number, gender, age, ...) and different equipment. However, general trends in the data can often be identified. If available, data from several studies are plotted on top of each other to check consistency. Sometimes not all available data are presented for reasons of clarity. Being aware of the variations between the compared studies, the authors believe that this comparison provides the background for auditory-tactile design, e.g., of perceptually optimized human–machine interfaces or multimodal music applications. This section summarizes the main similarities and differences between both modalities and discusses useful application scenarios.
Both modalities show frequency-dependent perception thresholds, but with different characteristics. When designing auditory-tactile feedback with the goal of equal intensity in both modalities, this disparity can be compensated by careful frequency equalization using the differences between the threshold curves. Compared to the sense of hearing, vibrotactile perception is restricted to low frequencies. At 20 Hz the usable amplitude range of both modalities is similar. However, with increasing frequency the auditory dynamic range increases rapidly, while the vibrotactile dynamic range seems to remain constant up to approximately 200 Hz. Compared to audition, the increase in perceived magnitude is steeper with increasing level in the vibrotactile domain, particularly at low sensation levels. If the target of a multimodal design is to match the perceived intensity of a stimulus in both modalities, e.g., for auditory-tactile button feedback of a touch screen, the dynamic range of one domain should be adapted, e.g., using a compressor for vibration processing.
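The combination of threshold-based equalization and compression can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing used by the authors: the static compressor curve, the knee and ratio parameters, and the threshold values in the usage lines are hypothetical.

```python
def compress_sensation_level(sl_db, knee_db=10.0, ratio=2.0):
    """Static compressor curve on a sensation level in dB.

    Levels up to knee_db pass unchanged; the excess above the knee is
    divided by ratio. Both parameters are illustrative assumptions.
    """
    if sl_db <= knee_db:
        return sl_db
    return knee_db + (sl_db - knee_db) / ratio


def tactile_target_level(audio_level_db, auditory_threshold_db,
                         tactile_threshold_db, knee_db=10.0, ratio=2.0):
    """Derive a vibration level from an audio level at one frequency.

    1. Convert the audio level to a sensation level (level above threshold).
    2. Compress it to fit the smaller vibrotactile dynamic range.
    3. Add the vibrotactile threshold to obtain the target acceleration level.
    """
    audio_sl = audio_level_db - auditory_threshold_db
    tactile_sl = compress_sensation_level(audio_sl, knee_db, ratio)
    return tactile_threshold_db + tactile_sl


# Usage with hypothetical thresholds (placeholders, not measured values):
print(tactile_target_level(60.0, auditory_threshold_db=20.0,
                           tactile_threshold_db=100.0))  # 125.0
```

Working on sensation levels rather than physical levels makes the equalization step explicit: the difference between the two threshold curves at a given frequency enters as a simple offset.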
Both modalities show severe impairment of sensitivity with increasing age. This effect has a similar tendency: it is stronger towards the upper frequency limit of each modality. However, around 250 Hz the age-induced threshold shift seems to be stronger for the sense of touch than for hearing. This is especially crucial in the context of auditory-tactile feedback design, since the vibrotactile dynamic range is considerably smaller than the auditory dynamic range. A vibrotactile threshold shift of 20 dB at 200 Hz almost halves the available amplitude range. In other words, vibrations which are strong for younger subjects might not be perceived at all by the elderly. Again, dynamic compression in the tactile domain helps the designer to reduce this effect, with the drawback of a decreased dynamic range. Because less impairment was reported in the vibrotactile domain below 40 Hz, it might be worth considering this frequency range for a feedback design which is less dependent on age.
The auditory system is able to integrate energy over time for stimulus durations up to approximately 1 s. A similar temporal effect can be found in the vibrotactile system for sufficiently high frequencies and relatively large stimulation areas. In addition, energy integration over space has been observed. From this it follows that the size of a vibrating contact area, e.g., the size of a vibrating smart watch, must be taken into account by the designer if the perceived intensities are to be matched in both modalities.
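The consequence of temporal energy integration for short stimuli can be illustrated with an idealized model. The sketch below assumes a perfect energy integrator (constant threshold energy, i.e., intensity times duration is constant) with an integration time of 1 s. Real thresholds deviate from this idealization, and the function name and default value are assumptions for illustration only.

```python
import math


def threshold_shift_db(duration_s, integration_time_s=1.0):
    """Threshold elevation in dB for a short stimulus, relative to a long one.

    Assumes an ideal energy integrator: below the integration time the
    threshold intensity is inversely proportional to the duration.
    """
    if duration_s <= 0.0:
        raise ValueError("duration_s must be positive")
    if duration_s >= integration_time_s:
        return 0.0  # no further benefit from longer stimuli
    return 10.0 * math.log10(integration_time_s / duration_s)


# Usage: under this model, halving the duration raises the threshold by
# about 3 dB, and a tenfold shorter stimulus raises it by 10 dB.
print(round(threshold_shift_db(0.5), 2))  # 3.01
print(round(threshold_shift_db(0.1), 2))  # 10.0
```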
Both modalities show the ability of one stimulus to mask (or enhance) another. In comparison, broader masking patterns are excited around the masker frequency in the vibrotactile modality, with strong masking towards higher frequencies. In the time domain, too, the vibrotactile threshold is raised over a longer period around the duration of a masker. Strong masking in the vibrotactile modality suggests that, e.g., when designing a system for auditory-tactile music reproduction, it might suffice to reproduce the fundamental of a complex sound in the vibratory domain without changing the overall percept.
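One simple way to approximate a vibration signal that carries mainly the fundamental is to low-pass filter the audio signal. The sketch below uses a first-order (one-pole) low-pass filter as a stand-in. The filter type, the cutoff frequency and the sampling rate are assumptions for illustration and are not taken from the reviewed work.

```python
import math


def one_pole_lowpass(samples, sample_rate_hz, cutoff_hz):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def sine(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


# Usage: a 50 Hz fundamental passes almost unchanged, while a 2 kHz
# component is strongly attenuated (hypothetical cutoff of 100 Hz).
FS = 8000
low = one_pole_lowpass(sine(50.0, FS, FS), FS, cutoff_hz=100.0)
high = one_pole_lowpass(sine(2000.0, FS, FS), FS, cutoff_hz=100.0)
print(rms(low) > rms(high))  # True
```

A first-order filter has a shallow roll-off, so a practical system would likely need a steeper filter or an explicit fundamental-frequency estimate; the sketch only shows the principle.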
Temporary threshold shifts due to prolonged stimulation occur in both modalities. In audition, high levels or long exposure times are necessary. In the vibrotactile domain, even small sensation levels result in a temporary threshold shift, which, however, grows and recovers fast. This effect might be relevant for the designer in practical applications if strong background vibrations are present, e.g., at the steering wheel when driving a car.
The just noticeable differences in level for sound and vibration seem to be remarkably similar at low frequencies. However, the difference limens for tactile frequency discrimination are much higher than those for audition. This is very important in the context of multimodal design, since frequency information is one of the fundamental components of audio signals, resulting in pitch perception. This perceptual feature is only available to a very limited extent in the tactile domain.
Gap detection thresholds for sinusoidal stimuli are comparable in the tactile and the auditory system. However, this does not seem to be the case for noises and clicks. The influence of the sensation level on auditory and tactile temporal resolving power is remarkably similar. Additionally, the gap detection thresholds are in the millisecond range, indicating good temporal resolution for both modalities. Sound and vibrations are therefore equally suitable for reproducing temporal information via a user interface. However, depending on the application, the dependence of temporal acuity on reproduction intensity must be taken into account.
It is difficult to compare the localization ability in both modalities. Auditory events can be perceived everywhere around the listener; however, the resolution is quite limited. The spatial resolution of somatosensation is generally more detailed, but tactile events are restricted to the proximity of the skin. However, it has been demonstrated that the projection of tactile events towards a sound source is possible. Sensory substitution systems for the hearing impaired use the good location discrimination of the tactile system to encode information, e.g., the frequency of a sound, in order to overcome shortcomings in tactile frequency perception.
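The idea of encoding frequency as location can be sketched with a simple mapping from a sound frequency to one of several vibration actuators on the skin. The logarithmic band spacing, the number of actuators and the frequency limits are hypothetical choices for illustration. They do not describe any specific sensory substitution device.

```python
import math


def frequency_to_actuator(freq_hz, n_actuators=8,
                          f_min_hz=100.0, f_max_hz=8000.0):
    """Map a frequency to an actuator index in [0, n_actuators - 1].

    Frequencies are clamped to [f_min_hz, f_max_hz] and divided into
    logarithmically spaced bands, one band per actuator.
    """
    f = min(max(freq_hz, f_min_hz), f_max_hz)
    position = math.log(f / f_min_hz) / math.log(f_max_hz / f_min_hz)
    return min(int(position * n_actuators), n_actuators - 1)


# Usage: low frequencies drive the first actuator, high ones the last.
for f in (50.0, 100.0, 1000.0, 8000.0):
    print(f, frequency_to_actuator(f))
```

With such a mapping, two tones that differ in frequency excite different skin locations, so discrimination relies on tactile spatial acuity rather than on the limited tactile frequency resolution.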
This article focused on the independent absolute and differential sensitivities of both modalities. It is important to note, however, that many multimodal illusions exist that exploit features of our audio and tactile perceptual abilities, e.g., the auditory-tactile loudness illusion [63]. A future article will explore these crossmodal interactions further.