Cognitive Brain Research

Volume 21, Issue 3, November 2004, Pages 351-359

Research report
Changes in emotional tone and instrumental timbre are reflected by the mismatch negativity

https://doi.org/10.1016/j.cogbrainres.2004.06.009

Abstract

The present study examined whether the brain is capable of preattentively discriminating tones that differ in emotional expression or instrumental timbre. In two event-related potential (ERP) experiments, single tones (600 ms) were presented which had been rated as happy or sad in a pretest. In experiment 1, 12 non-musicians passively listened to tone series comprising a frequent (standard) single musical tone played by a violin at a certain pitch and with a certain emotional connotation (happy or sad). Among these standard tones, deviant tones differing in emotional valence, in instrumental timbre, or in pitch were presented. All deviants generated mismatch negativity (MMN) responses. The MMN scalp topography was indistinguishable across the three deviant types, but the latency was shorter for pitch deviants than for the other two conditions. In a second experiment, subjects actively detected the deviant tones by button press. All detected deviants generated P3b waves at parietal leads. These results indicate that the brain is not only able to use simple physical differences such as pitch for rapid preattentive categorization but can also perform similar operations on the basis of more complex differences between tones of the same pitch, such as instrumental timbre and the subtle timbral differences associated with different emotional expression. This rapid categorization may serve as a basis for the further fine-grained analysis of musical (and other) sounds with regard to their emotional content.

Introduction

In addition to their factual content, language and music often convey emotional information as well. In the speech domain, lesion studies indicate that the comprehension of the semantic content of an utterance and the understanding of affective prosody can be selectively impaired in the sense of a double dissociation [2]. In addition, it has been shown that affective prosody is processed independently from "syntactic prosody", which conveys information about the type of utterance (e.g., question, declarative sentence, or exclamation) [14], although the exact neuroanatomical structures supporting the processing of affective and syntactic prosody are far from clear [8]. Animals, too, express emotions via distinct sounds [13], [21], [30], and the emotional state of a calling animal can be recognized from the specific acoustic structure of certain calls. The same acoustic features are used by different species to communicate emotions [34]. Studies in humans aiming to link distinct vocal cues in spoken sentences to perceived emotions have revealed that the ratings were mostly influenced by the mean level and the range of the fundamental frequency (F0) [36], [41], [49]. A low mean F0 was generally related to sadness and a high mean F0 to happiness. An increase of the F0 range was generally associated with high arousal.
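To make these acoustic cues concrete, the following sketch estimates the mean F0 and the F0 range of a recorded utterance with a crude autocorrelation pitch tracker (Python/NumPy). It is purely illustrative and not part of the cited studies; the waveform `y`, the sampling rate `sr`, and all thresholds are assumptions, and actual prosody research would rely on a more robust pitch estimator.

    import numpy as np

    def estimate_f0_stats(y, sr, frame_len=2048, hop=512, fmin=75.0, fmax=500.0):
        """Rough per-frame F0 via autocorrelation, then mean and range (Hz)."""
        f0s = []
        for start in range(0, len(y) - frame_len, hop):
            frame = y[start:start + frame_len]
            frame = frame - frame.mean()
            ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
            lo, hi = int(sr / fmax), int(sr / fmin)
            if hi >= len(ac):
                continue
            lag = lo + np.argmax(ac[lo:hi])
            if ac[lag] > 0.3 * ac[0]:          # crude voicing check
                f0s.append(sr / lag)
        f0s = np.asarray(f0s)
        # A higher mean F0 would point towards happiness, a lower one towards
        # sadness; a wider F0 range towards higher arousal (see text).
        return {"mean_f0": float(f0s.mean()), "f0_range": float(f0s.max() - f0s.min())}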

In the music domain, a seminal series of experiments by Hevner [15], [16], [17] investigated which structural features contribute to the emotional expression conveyed by a piece of music. By systematically manipulating individual factors within the same musical pieces, she concluded that tempo and mode had the largest effects on listeners' judgements, followed by pitch level, harmony and rhythm [17]. In more recent work, Juslin [22] summarized the musical features supporting the impression of sadness (slow mean tempo, legato articulation, small articulation variability, low sound level, dull timbre, large timing variations, soft duration contrasts, slow tone attacks, flat micro-intonation, slow vibrato and final ritardando) and happiness (fast mean tempo, small tempo variability, staccato articulation, large articulation variability, fairly high sound level, little sound level variability, bright timbre, fast tone attacks, small timing variations, sharp duration contrasts and rising micro-intonation).

Many of these features describe changes in the structure of a musical sequence, and it has been suggested that the emotional information transported by such suprasegmental features emerges as the result of a lifelong sociocultural conventionalization [43]. However, recent studies show that listeners can accurately identify emotions in musical pieces from different cultures [1]. In contrast, it has been suggested that the appraisal of segmental features [42], i.e., individual sounds or tones, is based on innate symbolic representations which have emerged from evolutionary mechanisms for the evaluation of vocal expression [22], [42]. Based on spectrogram analyses of opera singers, Rapoport [38] has described seven factors that contribute to the emotional expression of single tones:

  • (1) onset of phonation (voicing);

  • (2) vibrato;

  • (3) excitation of higher harmonic partials;

  • (4) transition, a gradual pitch increase from the onset to the sustained stage;

  • (5) sforzando, an abrupt pitch increase at the very onset of the tone;

  • (6) pitch change within the tone; and

  • (7) unit pulse (a feature produced by the vocal cords).

Many of these features can be mimicked by string and wind instruments, while keyboard instruments are less versatile with respect to the modulation of individual tones.

The variations induced in single tones of the same pitch fall within the realm of timbre. Timbre refers to the quality that distinguishes sounds in the absence of differences in pitch, loudness and duration. The classical view of timbre, dating back to von Helmholtz [48], holds that different timbres result from different distributions of the amplitudes of the harmonic components of a complex tone in a steady state. More recent studies show that timbre also involves more dynamic features of the sound [9], [12], particularly with regard to onset characteristics. Timbre has mostly been studied with regard to the recognition of different musical instruments [9], [10], [11], [12], [27], and multidimensional scaling techniques have revealed that timbre is determined by variations along three dimensions termed attack time, spectral centroid, and spectral flux [27].
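As a rough illustration of these three dimensions, the hedged sketch below computes simple stand-ins for attack time, spectral centroid, and spectral flux from a mono tone (Python/NumPy). The function name and the exact definitions are assumptions made for illustration; they are not the descriptors used in [27] or in the present study.

    import numpy as np

    def timbre_descriptors(y, sr, frame_len=1024, hop=256):
        """Crude attack time (s), mean spectral centroid (Hz), mean spectral flux."""
        frames = np.array([y[i:i + frame_len] * np.hanning(frame_len)
                           for i in range(0, len(y) - frame_len, hop)])
        mags = np.abs(np.fft.rfft(frames, axis=1))        # magnitude spectrogram
        freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)

        # Spectral centroid: amplitude-weighted mean frequency per frame.
        centroid = (mags @ freqs) / (mags.sum(axis=1) + 1e-12)

        # Spectral flux: frame-to-frame change of the normalised spectrum.
        norm = mags / (np.linalg.norm(mags, axis=1, keepdims=True) + 1e-12)
        flux = np.sqrt((np.diff(norm, axis=0) ** 2).sum(axis=1))

        # Attack time: rise of the RMS envelope from 10% to 90% of its maximum.
        rms = np.sqrt((frames ** 2).mean(axis=1))
        t10 = np.argmax(rms >= 0.1 * rms.max())
        t90 = np.argmax(rms >= 0.9 * rms.max())
        attack_time = (t90 - t10) * hop / sr

        return attack_time, float(centroid.mean()), float(flux.mean())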

Clearly, the timbral variations within a single instrument that are used to transmit emotional expressions are different from, and likely smaller than, those present between instruments. The present study therefore asks whether the brain mechanisms detecting the timbral variation between notes of different emotional expression played by the same instrument are similar to or different from those detecting the variation between instruments playing the same note with the same emotional expression.

Given the importance of emotions for survival, we assumed that the brain may accomplish a fast and probably automatic check [40] on every incoming stimulus with regard to the properties correlated with emotional expression. In the present investigation, we used musical stimuli as a tool to demonstrate the existence of such a fast and automatic checking procedure by employing a mismatch negativity paradigm.

The mismatch negativity (MMN) is an ideal tool for addressing the early, automatic stages of sound evaluation [32], [33], [35]. The MMN is a component of the auditory event-related potential (ERP) which is elicited during passive listening by an infrequent change in a repetitive series of sounds. It occurs in response to any stimulus that deviates physically (in frequency, duration or intensity) from the standard tone. It has also been demonstrated that the MMN is sensitive to changes in the spectral component of tonal timbre [44]. Toiviainen et al. [46] have shown that the amplitude of the MMN obtained for different timbre deviants corresponded to the distance metric obtained in an artificial neural network trained with a large set of instrumental sounds.

The onset latency of the MMN varies according to the nature of the stimulus deviance but for simple, physically deviant stimuli lies at approximately 150 ms. Previous studies have led to the assumption that the MMN reflects the mismatch resulting from a comparison between the physical features of the deviant and the standard stimulus [32]. This implies the existence of a neural sensory–memory trace representing the physical structure of the standard stimulus against which incoming auditory information can be compared. More recent studies (see [33], [35] for a review) have shown, however, that the MMN can also be obtained to deviations within complex series of sounds, suggesting that the memory trace is not only dependent on the physical characteristics of the stimuli but can also contain more abstract properties such as the order of stimuli.
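A minimal sketch of how such a mismatch response is typically quantified is given below: the averaged ERP to standards is subtracted from the averaged ERP to deviants, and the most negative point in an assumed 100-250 ms search window is taken as the MMN peak. The data layout (`epochs`, `labels`) is hypothetical and only serves to show the logic of the comparison; it is not the analysis pipeline of the present study.

    import numpy as np

    def mismatch_wave(epochs, labels, fs, window=(0.10, 0.25)):
        """Deviant-minus-standard difference wave and its peak latency (s).

        epochs: (n_trials, n_samples) baseline-corrected EEG from one electrode.
        labels: array of strings, 'standard' or 'deviant', one per trial.
        """
        labels = np.asarray(labels)
        std = epochs[labels == "standard"].mean(axis=0)
        dev = epochs[labels == "deviant"].mean(axis=0)
        diff = dev - std                               # MMN shows up as a negativity
        t = np.arange(epochs.shape[1]) / fs
        mask = (t >= window[0]) & (t <= window[1])     # assumed MMN latency range
        peak_idx = np.where(mask)[0][np.argmin(diff[mask])]
        return diff, t[peak_idx]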

The sensory analysis of the incoming stimulus as well as its encoding appears to take place automatically because the MMN typically occurs when the subjects do not attend to the eliciting stimuli and are involved in a different task like reading a book [32] or when they are sleeping [26].

The P300 is also evoked by infrequent deviant stimuli but, in contrast to the MMN, it is triggered most effectively when the deviant events are attended and task-relevant [6], [31], [47]. It is assumed that the P300 is not a unitary component but can be broken down into several subcomponents, one of which is termed the P3b. The P3b occurs in response to task-relevant deviant stimuli within a stream of standard stimuli, a sequence known as the oddball paradigm. The P3b displays a parietal distribution, and its onset latency varies between 300 and 600 ms. The latency and amplitude of the P3b depend on the difficulty of the categorisation task as well as on the task-relevance of the stimulus [20], [24]. Thus, the P3b appears to reflect stimulus evaluation and categorisation processes. It has further been suggested that the underlying processes serve the updating of working memory [7], although not everyone agrees with this interpretation [47].
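For comparison, a P3b-like positivity can be measured with an equally simple sketch: the most positive point of the deviant-evoked average at a parietal site within the 300-600 ms window mentioned above. The variable names and the fixed window are again illustrative assumptions, not the scoring procedure of this study.

    import numpy as np

    def p3b_peak(erp_pz, fs, window=(0.30, 0.60)):
        """Peak amplitude and latency of a P3b-like positivity at Pz.

        erp_pz: averaged ERP (in microvolts) to detected, task-relevant deviants.
        """
        t = np.arange(len(erp_pz)) / fs
        mask = (t >= window[0]) & (t <= window[1])
        idx = np.where(mask)[0][np.argmax(erp_pz[mask])]   # most positive sample
        return erp_pz[idx], t[idx]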

In the current study, two experiments were conducted to assess whether the emotional expression of a single tone allows for attentive as well as preattentive categorization. For that purpose, a standard violin tone of a certain emotional valence (e.g., happy) was presented repeatedly, infrequently interspersed with a tone that deviated from the standard in its emotional expression (e.g., sad). In addition to this emotional deviant, a tone which differed from the standard tone in pitch level (pitch deviant) and a tone which was played by a flute instead of a violin and therefore differed from the standard stimulus in instrumental timbre (instr. deviant) were introduced as control stimuli. In experiment 1 (Exp. 1), subjects watched a video and were asked to ignore the sounds (passive condition). In experiment 2 (Exp. 2), a modified oddball paradigm was conducted, with subjects required to react to any of the three deviant stimulus types by pressing a button (active condition).
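The structure of such a modified oddball sequence can be sketched as follows; the trial count, deviant probability, and spacing constraint are placeholders chosen for illustration and do not reproduce the actual stimulus lists used in the experiments.

    import random

    def oddball_sequence(n_trials=600, p_deviant=0.15,
                         deviants=("emotional", "instrumental", "pitch"),
                         min_standards_between=2, seed=0):
        """Pseudo-random trial list with frequent standards and three deviant types."""
        rng = random.Random(seed)
        seq, since_last = [], min_standards_between
        for _ in range(n_trials):
            if since_last >= min_standards_between and rng.random() < p_deviant:
                seq.append(rng.choice(deviants))       # one of the three deviant types
                since_last = 0
            else:
                seq.append("standard")
                since_last += 1
        return seq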

Section snippets

Subjects

Twelve non-musicians participated in the experiment (11 women, 20 to 36 years of age, mean=26). All participants were right-handed, neurologically healthy and had normal hearing.

Stimuli

Two sets of four different tones were used. Each set consisted of one standard tone and three different deviant tones. All tones were played by a violinist and a flutist, digitally recorded, and edited to equal length (600 ms) and sound level (65 dB) using Cool Edit. These edited tones were rated by 10 naive listeners
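A hedged sketch of this kind of preprocessing is shown below: each recorded tone is truncated to 600 ms, given a short fade-out to avoid clicks, and scaled to a common RMS level. The absolute 65 dB presentation level depends on the playback chain and cannot be set in software alone, so only a relative target is used; this is an illustration, not the original Cool Edit procedure.

    import numpy as np

    def match_length_and_level(y, sr, target_dur=0.6, target_rms=0.1, fade_ms=10):
        """Trim a tone to a fixed duration and normalise its RMS level."""
        n = int(target_dur * sr)
        y = y[:n].astype(float)
        fade = int(fade_ms / 1000 * sr)
        y[-fade:] *= np.linspace(1.0, 0.0, fade)       # fade out the truncated end
        rms = np.sqrt((y ** 2).mean())
        return y * (target_rms / (rms + 1e-12))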

Passive condition

Fig. 2, left, shows the grand average waveforms for all three deviant types at three scalp positions (Fz, Cz, Pz). Note that the results from the two blocks, using the happy and the sad violin tone as standard stimuli respectively, are given in separate columns. The waveforms show an initial small negative deflection (N1) at around 100 ms. This is followed by a long-duration negative component with a frontal maximum and a peak around 400 to 500 ms (Fig. 3).

Discussion

In this study, we used the high temporal resolution of electrophysiological measures to estimate the relative time courses of the brain's responses to tones that differed from a standard tone in their emotional expression, in the timbre of the instrument used, or in their pitch. The results demonstrate that affective deviants evoke a mismatch response, akin to the mismatch negativity seen for pitch and instrumental deviants, even when subjects do not attend to the auditory stimuli. While the

Acknowledgements

We thank Dana Heinze and Monique Lamers for their help during recording and analysis of the data. Supported by grants of the DFG to EA and TFM.

References (49)

  • M. Davis, The role of the amygdala in fear and anxiety, Annu. Rev. Neurosci. (1992)

  • E. Donchin, Surprise?…Surprise!, Psychophysiology (1981)

  • E. Donchin et al., Is the P300 component a manifestation of context updating?, Behav. Brain Sci. (1988)

  • J.M. Grey, Multidimensional perceptual scaling of musical timbres, J. Acoust. Soc. Am. (1977)

  • J.M. Grey, Timbre discrimination in musical patterns, J. Acoust. Soc. Am. (1978)

  • J.M. Grey et al., Perceptual effects of spectral modifications on musical timbres, J. Acoust. Soc. Am. (1978)

  • J.M. Grey et al., Perceptual evaluation of synthetic music instrument tones, J. Acoust. Soc. Am. (1977)

  • M.D. Hauser, The Evolution of Communication (1997)

  • K. Heilman et al., Comprehension of affective and non-affective prosody, Neurology (1984)

  • K. Hevner, The affective character of the major and minor modes in music, Am. J. Psychol. (1935)

  • K. Hevner, Experimental studies of the elements of expression in music, Am. J. Psychol. (1936)

  • K. Hevner, The affective value of pitch and tempo in music, Am. J. Psychol. (1937)

  • H. Huynh et al., Conditions under which mean square ratios in repeated measure designs have exact F-distributions, J. Am. Stat. Assoc. (1980)

  • R. Johnson, A triarchic model of P300 amplitude, Psychophysiology (1986)