
Sound iconicity of abstract concepts: Place of articulation is implicitly associated with abstract concepts of size and social dominance

  • Jan Auracher

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Visualization, Writing – original draft, Writing – review & editing

    jan.auracher@aesthetics.mpg.de

    Affiliation Department for Language and Literature, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany

Abstract

The concept of sound iconicity implies that phonemes are intrinsically associated with non-acoustic phenomena, such as emotional expression, object size or shape, or other perceptual features. In this respect, sound iconicity is related to other forms of cross-modal associations in which stimuli from different sensory modalities are associated with each other due to the implicitly perceived correspondence of their primal features. One prominent example is the association between vowels, categorized according to their place of articulation, and size, with back vowels being associated with bigness and front vowels with smallness. However, to date the relative influence of perceptual and conceptual cognitive processing on this association is not clear. To bridge this gap, three experiments were conducted in which associations between nonsense words and pictures of animals or emotional body postures were tested. In these experiments participants had to infer the relation between visual stimuli and the notion of size from the content of the pictures, while directly perceivable features did not support–or even contradicted–the predicted association. Results show that implicit associations between articulatory-acoustic characteristics of phonemes and pictures are mainly influenced by semantic features, i.e., the content of a picture, whereas the influence of perceivable features, i.e., size or shape, is overridden. This suggests that abstract semantic concepts can function as an interface between different sensory modalities, facilitating cross-modal associations.

Introduction

How do we convey meaning in verbal interaction? In contrast to mainstream linguistics, the field of phonosemantics claims that phonemes themselves carry inherent semantic information. Within phonosemantics, sound iconicity is defined as an acoustic representation of non-acoustic phenomena in a non-arbitrary manner [1]. Sound iconicity has been referred to by various terms, most prominently by Hinton and colleagues, who use the term synaesthetic sound symbolism [2]. However, this term is misleading for two reasons: First, synaesthesia usually refers to a phenomenon that is restricted to specific individuals and, second, the attribute symbolic implies a relation between sound and meaning that is based on convention. By contrast, an extensive body of literature provides solid evidence suggesting that sound iconicity is truly iconic and universal insofar as it is based on perceived associations of articulatory-acoustic features of phonemes with non-acoustic characteristics, such as size, taste, or emotional valence, and as these associations have been found across languages and cultures (for reviews and discussions of the terminology, see [2, 3–8]).

Recent research has emphasized the functionality of sound iconicity, which facilitates, in particular, word learning and communication (for an overview see [9]). Studies have found, for example, that Japanese words in which sound and meaning correspond with each other are easier to memorize for native speakers of English [10, 11] and of Dutch [12] than words with an arbitrary relation between sound and meaning. Interestingly, the same seems to hold true for first language acquisition: several studies report a significantly higher likelihood for sound-iconic words to be learned at an early stage of language acquisition than for non-iconic words [13–16]. Yet another line of research has focused on the communicative role of emotional connotations elicited by sound iconicity. Shrum and colleagues [17], for example, examined how sound iconicity can be used in brand names to communicate (alleged) attributes of products (see also [18, 19]). Other studies have found evidence suggesting that sound iconicity is used in poetic language to convey the emotional tone of poems and song lyrics [1, 20–23].

Though there is a long tradition of behavioral, developmental, corpus-linguistic, and (more recently) neurocognitive studies on sound iconicity, the vast majority of these studies have been dedicated to the question of whether sound iconicity exists in the first place, and the focus has only recently shifted towards a better understanding of the cognitive processes underlying sound iconicity [8]. For example, EEG studies analyzing event-related potentials during the processing of words with or without sound iconicity have found evidence suggesting that sound-iconic words can cause a co-activation of otherwise modality-specific areas in the brain during early sensory processing [24–26]. On the other hand, studies applying brain imaging techniques (fMRI) have found increased activity in areas that had previously been related to multi-modal integration [27, 28]. However, while these studies are critical to establishing a better understanding of neurocognitive processes related to sound iconicity, they still leave pivotal questions unanswered. For one, there has been some–partly controversial–discussion about the role of abstract conceptual thinking in cross-modal associations [29–34]. The association between luminance and sound pitch, for example, could be due to a perceived correspondence at the level of physical characteristics (linking the intensity of light directly with the frequency of sound) or at the level of semantic connotations (e.g., mediated by a shared hedonic value) [35]. In fact, scholars such as Hornbostel have suggested the existence of an amodal concept of brightness that links sensation across different sensory modalities [36].

It is important to note, however, that perceptual and conceptual processing do not mutually exclude each other as sources of sound iconicity. On the contrary, it is likely that both forms are involved at different stages of the processing of sound-iconic words. Nevertheless, the studies outlined above leave open what roles sensory analogy and semantic analogy between the acoustic form of a word and its meaning play in sound-iconic associations. The aim of the current paper is to compare the influence of directly perceivable features (such as size) with the influence of abstract, semantic features (such as content) on the association of visual stimuli with vowels differing in their place of articulation.

One of the most-studied examples of sound iconicity is the relationship between articulatory characteristics of vowels and their association with size (for reviews see [3, 6, 37–39]). In an early study, Sapir [40] found that the association of nonsense syllables, consisting of consonant-vowel-consonant combinations, with either small or big objects is determined by the central vowel of the syllables. Newman [41], who conducted a follow-up study, concluded that specific acoustic and articulatory characteristics determine the association of vowels with size, claiming that vowels with high pitch and smaller vocal cavity are preferably associated with stimuli related to smallness, while vowels with low pitch and larger vocal cavity are preferably associated with stimuli related to bigness.

Later studies asked participants to assess vowels–alone or in combination with consonants–on a list of bi-polar items, such as big-small, fast-slow, bright-dark, or good-bad [42–46]. An extensive discussion of their results would clearly overstretch this introduction; however, three major findings shall be highlighted here. First, Greenberg and Jenkins [42], applying a paper-and-pencil test, found that the place of articulation of vowels correlates with their assessment on items referring to size (see also [43–45]). That is, vowels that are articulated toward the back of the vocal tract are more readily associated with attributes expressing bigness, and as articulation moves toward the front of the vocal tract, association with ‘small’ and related attributes increases. Second, this association has been found not only for size but also for related concepts referring to physical or social dominance, such as strong-weak [42] or powerful-powerless [45]. And third, the relation between articulatory characteristics of phonemes and their association with size seems to be universal, as it has been found independent of participants’ native language [44–46]. In fact, studies examining a wide range of world languages found that front vowels occur with an unusually high frequency in words and morphemes that express smallness and associated categories [47, 48]. Moreover, Peña et al. [49] reported that even four-month-old infants show similar preferences to match the place of articulation of vowels with object size.

In an attempt to explain the association between place of articulation and size, it has been related to the so-called size-pitch effect, according to which low-pitched sounds are associated with bigness and high-pitched sounds with smallness [33, 39, 50]. This is interesting insofar as research suggests that lower pitch is associated not only with directly perceivable size but also with semantic stimuli (i.e., words) that denote size or related concepts, such as weight or thickness [51]. Moreover, Eitan and Timmers [52] reported that, when asked, participants tend to match low pitch to words related to bigness, such as the attributes “thick, heavy, strong, and male” or objects like “crocodile”, while the opposite holds for high pitch. Finally, evidence from studies of animal behavior indicates that in many animal species, the voice pitch of males predicts their social status and mating success [53–55]. Similarly, studies of human verbal interaction revealed that voice pitch is a good indicator of a male’s attractiveness to females, as well as of his–actual or alleged–physical or social dominance [56–62].

To sum up, there is sound evidence suggesting that pitch is related not only to size but, more generally, to an abstract concept of physical or social dominance. According to Ohala [63], this association of pitch has found its way into human language, causing a universal association that connects back vowels with bigness, strength, or high dominance and front vowels with smallness, weakness, or low dominance. Following this claim, one would expect the articulatory place of vowels to convey meaning on an abstract conceptual level. Consequently, it was hypothesized that sound-iconic associations can also be found with non-acoustic stimuli that refer to size or related concepts on a semantic level, e.g., through the content of a picture. In the following, three experiments designed to test this hypothesis will be outlined. The aim of these experiments was to show that sound-iconic associations between the place of articulation and size are not (necessarily) triggered by directly perceivable properties, such as the size of a visual stimulus or the denotation of a word, but can also be triggered by abstract conceptual properties, such as inferences about the size of a depicted object or about the relation between behavior and social status.

Research objective and design

The objective of the current study was to investigate whether sound iconicity of magnitude enables the prediction of implicit associations of pseudo-words with visual stimuli depicting physical or social dominance. By pseudo-words, I mean consonant-vowel combinations that are phonologically possible but have no lexical denotation in a specific language. Following the frequency code hypothesis by Ohala [50, 63], I hypothesized that pictures depicting big, strong, aggressive or dominant objects or behavior would be associated with pseudo-words containing back vowels, while pictures depicting small, weak, fearful, or submissive objects or behavior would be associated with pseudo-words containing front vowels.

In the first experiment, visual stimuli consisted of pictures depicting either small or big animals, while the pictures themselves were all the same size. Thus, unlike previous experiments, in which visual stimuli directly embodied the notion of either bigness or smallness (e.g., through the size of a picture or the meaning of a word), in this experiment the relation between the stimuli and the concept of size relied on participants’ interpretation of stimulus content. The aim of this experiment, therefore, was to test the relation between pitch and size using stimuli that imply a difference in size without making it explicit. In the second experiment, the same pictures were used; this time, however, the pictures depicting small animals were larger than the pictures depicting big animals. That is, perceptible features of the pictures were incongruent with their content. Thus, the second experiment targeted the interaction between perceptible and semantic features in sound-iconic associations. Assuming that sound iconicity is based on semantic connotations elicited by acoustic characteristics of phonemes, it was hypothesized that the size of the picture would show only a negligible effect on the association between phonemes and the content of the pictures. In the last experiment, pictures depicting either dominant or submissive body postures were used as visual stimuli. In this experiment, it was thus tested whether the association between pitch and size can also be extended to social behavior. It was assumed that back vowels are associated with behavior perceived as dominant while, vice versa, front vowels are associated with behavior perceived as submissive. The three experiments are summarized in Fig 1.

Fig 1. Overview of experiments with examples of visual stimuli.

Visual stimuli in the left column were expected to be associated with front vowels; pictures in the right column were expected to be associated with back vowels.

https://doi.org/10.1371/journal.pone.0187196.g001

Experiment I

The first experiment was conducted to see whether the relationship between the place of articulation of vowels and size can also be found with semantic features that imply, rather than depict or denote, the respective concepts. To this end, pseudo-words containing back or front vowels were tested for implicit associations with pictures of big and small animals. Like experiments 2 and 3, experiment 1 was conducted in Japan; thus, all instructions were written in Japanese.

Method

Participants.

The participants were 30 university students (27 female) aged between 18 and 21 (mean 19). All participants were native Japanese speakers who reported no hearing impairments and had normal or corrected-to-normal vision. Participation was voluntary and participants received a 500-yen gift certificate. All participants gave written informed consent to participate in the study. The experiments were approved by the Ethics Committee of Doshisha University.

Materials.

For visual stimuli, sixteen pictures of either big or small animals were used (Fig 2). The animal pictures were drawn by a professional illustrator (NiKo-Illustration). The illustrator was instructed to draw either “big, strong, and heavy” or “small, weak, and light” animals; otherwise, the illustrator was uninformed with regard to the goal of the research. All animals were depicted in a three-quarter side view facing to the left.

Fig 2. Visual stimuli showing either big or small animals.

https://doi.org/10.1371/journal.pone.0187196.g002

Two groups of pseudo-words were generated by randomly creating sequences of three syllables, each consisting of one consonant and one vowel (CVCVCV). The two groups differed with respect to their vowels; while the first group contained back vowels (/o/ and /u/), the second group contained front vowels (/i/ and /e/). To avoid the influence of the consonants, the plosive consonants /p/, /t/, and /k/ were used for both groups of pseudo-words.
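As a concrete illustration, the following R sketch shows one way such stimuli could be generated. The function name, the sampling scheme, and the seed are hypothetical and not taken from the study, which lists its sixteen pseudo-words explicitly below.

```r
# Minimal sketch (assumption: simple random sampling; not the study's actual procedure).
# Build CVCVCV pseudo-words from the plosives /p/, /t/, /k/ and either
# back vowels (/o/, /u/) or front vowels (/i/, /e/).
set.seed(42)

make_pseudoword <- function(vowels, consonants = c("p", "t", "k")) {
  # Three CV syllables, each with an independently sampled consonant and vowel
  syllables <- replicate(3, paste0(sample(consonants, 1), sample(vowels, 1)))
  paste(syllables, collapse = "")
}

back_vowel_words  <- replicate(8, make_pseudoword(c("o", "u")))  # e.g., "kotopu"
front_vowel_words <- replicate(8, make_pseudoword(c("i", "e")))  # e.g., "kipite"
```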

Eight pseudo-words per group were recorded by a male speaker (pseudo-words with back vowels: kotopu, kutopo, kopotu, pokotu, putoko, tokopu, tupuko, topuko; pseudo-words with front vowels: kipite, kitepi, pekite, tipeki, pikite, piteke, tepeki, tikipi). The recordings were done in an anechoic room, using a Roland CD-2e digital recorder. The speaker was asked to clearly put the stress on the first syllable when pronouncing each word. Otherwise he was uninformed with respect to the aim of the experiment. The length of the stimuli was manipulated to fit exactly 0.58 sec, using Audacity version 2.0. Files were saved in WAV 16 Bit PCM format.

In order to exclude undesired effects caused by phonetic congruence between the (Japanese) names of the depicted animals and the phonetic characteristics of the pseudo-words, precautions were taken by (1) keeping the number of front-vowels and back-vowels balanced within each group (i.e., small animals and big animals), (2) minimizing the number of consonant-vowel combinations that occur in the acoustic stimuli (i.e., plosive + front-vowel or plosive + back-vowel), (3) preferably selecting animals with Japanese names that contain both front-vowels and back-vowels, and (4) balancing, within each group, the number of animals whose names contain only front-vowels or only back-vowels. The Japanese names were collected in a pre-study in which 12 participants were asked to provide the name(s) that spontaneously came to mind when looking at each picture. The Japanese names are listed below in Roman letters (for Katakana, see supporting materials S1 Table). Answers that were given by a majority of participants are listed outside the brackets. Participants also provided alternative answers for some animals, which are listed inside the brackets. The order of animals is consistent with their occurrence in Fig 2 from left to right and from top to bottom: BISON: ushi (bison, baffarō, gnū); POLAR BEAR: shirokuma (kuma); ELEPHANT: zō; GORILLA: gorira; LION: raion; RHINO: sai; HIPPO: kaba; CHEETAH: chītā (hyou, tora); HAMSTER: hamustā (nezumi, morumotto, momonga); RABBIT: usagi; CAT: neko (inu); MOUSE: nezumi; BIRD: tori (hato, inko); DEER: shika (banbi, inpara); SQUIRREL: risu; MEERKAT: mīakyatto (pureridogu, okojo). Based on the answers given by a relative majority of participants, the distribution of front-vowels and back-vowels is as follows:

  • Ratio of front-vowels to back-vowels per group: small animals: 8/8, big animals: 6/6.
  • Ratio of consonant-vowel combinations that also occur in the acoustic stimuli (consonant + front-vowel vs. consonant + back-vowel): small animals: 0/3, big animals: 0/1.
  • Number of animals with both front-vowels and back-vowels in their Japanese names: small animals: 6, big animals: 4.
  • Ratio of animals with only front-vowels vs. animals with only back-vowels: Small animals: 1/1, big animals: 2/1.

To summarize, the ratio of front-vowels to back-vowels was balanced for both groups of animals. Where distributions were unequal–regarding consonant-vowel combinations and the number of animals with only front-vowels or only back-vowels–the imbalance worked against the hypothesis (i.e., more consonant + back-vowel combinations among small animals and more names with only front-vowels in the group of big animals). Thus, potential confounding variables based on phonetic congruency between visual and acoustic stimuli were excluded as far as possible.

Apparatus.

The experiments were conducted in a quiet atmosphere free from disturbing noise or other distractions. Visual stimuli were presented on a 15.6” HD LED LCD display. Acoustic stimuli were presented through Sony MDR-CD900ST headphones. During presentation of the acoustic stimuli, a white fixation cross on a black background appeared on the screen. For stimulus delivery and experimental control, the software Presentation® (version 15) by Neuro-Behavioral-Systems (http://www.neurobs.com/) was used.

Procedure.

The hypothesis was tested using the Implicit Association Test (IAT) [64]. According to Parise and Spence [65], who used the IAT in a study of cross-modal associations, the design of this test has the advantages of (a) being independent of participants’ ability or willingness to accurately report their introspection, (b) avoiding the risk of merely assessing failure of selective attention on the part of the participant, and (c) allowing for the measurement of mutual associations, rather than assessing influence in a single direction from one modality to another.

In the current research, participants were presented with two categories of pseudo-words and two categories of pictures. Stimuli of both modalities–acoustic and visual–were presented in random order. Participants were asked to categorize each stimulus by pressing a button with either the left or the right index finger. The Implicit Association Test is based on the idea that participants encounter fewer problems–and hence complete the task faster and with fewer mistakes–when associated stimuli (e.g., visual stimuli related to bigness and acoustic stimuli with low frequency) are allocated to the same response behavior (i.e., pressing buttons on the same side) than when they are allocated to different response behaviors (i.e., pressing buttons on opposite sides). Each participant performed two experimental blocks, one in which presumably associated groups of stimuli were allocated to the same side (conforming condition) and one in which presumably associated groups of stimuli were allocated to opposite sides (non-conforming condition). The hypothesis would be corroborated if response latency in the conforming condition was significantly shorter than in the non-conforming condition.

Following the standard procedure of the IAT [66], each experiment consisted of five blocks in total: three for training and two for the experiment (Fig 3). In the training blocks–numbers 1, 2, and 4–participants practiced allocation of the stimuli for each modality separately (e.g., first for pseudo-words and second for pictures). In the experimental blocks–numbers 3 and 5–stimuli of both modalities were presented together in randomized order. In block 4, the allocation of the acoustic stimuli changed sides. The order of the two experimental blocks (e.g., first the conforming and second the non-conforming condition) and the allocation of the stimuli (e.g., big animals to the right and small animals to the left in the conforming block) was randomized.

Fig 3. Experimental design.

IAT in five blocks, with blocks 1, 2, and 4 for training and blocks 3 and 5 for the experiment. In the experiment, big animals were labeled as “wild animals”, small animals as “gentle animals”, front vowels as “bright sounds”, and back vowels as “dark sounds”. Stimuli on the left-hand side were categorized by pressing the [E] key; stimuli on the right-hand side were categorized by pressing the [I] key. The figure shows an example of a possible sequence. The order of the two experimental blocks as well as the allocation of the stimuli to either the right or the left side was randomized.

https://doi.org/10.1371/journal.pone.0187196.g003

Participants first read an introduction which explained the procedure and introduced the stimuli (for the original introduction together with an English translation see S1 Text). To make sure that participants were not primed to focus on differences in size, animals were introduced as being either wild (big animals) or gentle (small animals). In Japanese, the expressions read 獰猛な動物 for wild (big) and 温和な動物 for gentle (small) animals. However, to avoid confusion I will stick to the distinction between ‘big’ and ‘small’ animals in what follows. Pseudo-words with back- and front vowels were introduced as 暗い音 (dark sounds) and 明るい音 (bright sounds), respectively. After the introduction, the allocation of the stimuli to either the left or the right side was displayed, followed by the training sessions.

Each category contained eight different stimuli. For each block, each stimulus was presented twice, resulting in 32 trials for the training blocks and 64 trials for the experimental blocks (8 stimuli x 2 times x 2 categories (x 2 modalities)). Each trial lasted as long as it took for the participant to answer. Mistakes (incorrect categorizations) were signaled by a small red x at the bottom of the screen, forcing the participant to correct his or her mistake. Following each trial there was an inter-stimulus interval (ISI) of 500 ms, during which the screen was black (Fig 4).

Fig 4. Trial design.

The figure shows an example of a sequence taken from block 3 (see Fig 3). Incorrect answers were signaled by a repetition of the same stimulus together with a red cross at the bottom of the monitor. Correct answers were followed by a 500 ms ISI before the next stimulus (visual or acoustic) was presented.

https://doi.org/10.1371/journal.pone.0187196.g004

Data analysis.

The data were analyzed following the recommendations by Greenwald and colleagues [67]. In their article, the authors compared a variety of scoring algorithms for the IAT in terms of five criteria: (1) correlations with parallel self-report measures, (2) resistance to an artifact associated with reaction time, (3) internal consistency, (4) sensitivity to known influences on IAT measures, and (5) resistance to known procedural influences. To this end, Greenwald et al. conducted six studies analyzing data from approximately 1.2 million tests. The tested algorithms differed regarding the inclusion or exclusion of data obtained during training trials for the experimental blocks, the setting of upper and/or lower thresholds for the reaction time, and the manner in which error trials were handled (i.e., trials in which participants allocated a stimulus not in accordance with the given instructions). Based on their analysis, the authors suggested an improved scoring algorithm, which was applied in all three experiments reported in the present text. Accordingly, reaction times of error trials were replaced by the mean of all correct answers per block plus a 600 ms penalty. However, as the results in all three experiments suggested that reaction times for visual stimuli and for acoustic stimuli differed substantially, the average reaction time was computed separately per stimulus modality.
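To make the scoring step concrete, the following R sketch illustrates the error-penalty replacement described above; the data-frame and column names (participant, block, modality, correct, rt) are assumptions about how the trial data might be organized, not the study's actual code.

```r
# Sketch of the error-penalty step (Greenwald et al.'s improved algorithm, adapted
# so that means are computed per block and per stimulus modality, as described above).
library(dplyr)

score_trials <- function(trials, penalty = 600) {
  trials %>%
    group_by(participant, block, modality) %>%
    mutate(
      mean_correct_rt = mean(rt[correct]),                       # mean RT of correct trials
      rt_scored = ifelse(correct, rt, mean_correct_rt + penalty) # replace error-trial RTs
    ) %>%
    ungroup()
}
```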

To test the hypothesis, response latencies were compared between the two experimental conditions. In accordance with the hypothesis, the average response latency for combinations of presumably associated visual and acoustic stimuli (d1, conforming condition, e.g., pseudo-words with back vowels and pictures of big animals allocated to the same side) was predicted to be shorter than that for combinations of visual and acoustic stimuli that were not believed to be associated (d2, non-conforming condition, e.g., pseudo-words with back vowels allocated to the right side and pictures of big animals to the left), i.e., ∆d = d2 – d1 > 0.
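A per-participant version of this comparison could be computed as in the sketch below; column names such as condition and rt_scored are illustrative assumptions, and the inferential test actually reported below is the mixed-effects model.

```r
# Sketch: aggregate scored latencies per participant and condition, compute
# Δd = d2 - d1, and test the directional prediction Δd > 0.
library(dplyr)
library(tidyr)

delta_d <- scored_trials %>%
  group_by(participant, condition) %>%                  # condition: "conforming" / "nonconforming"
  summarise(mean_rt = mean(rt_scored), .groups = "drop") %>%
  pivot_wider(names_from = condition, values_from = mean_rt) %>%
  mutate(delta = nonconforming - conforming)

t.test(delta_d$delta, mu = 0, alternative = "greater")  # one-sided test of Δd > 0
```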

The effect of the experimental conditions on response latencies was analyzed with mixed-effects linear regressions using the lme4 package [68] in the R environment (version 3.2.3). To test whether the effect of the experimental conditions holds independent of the modality (visual vs. acoustic) or the category (big vs. small for animal pictures and front-vowel vs. back-vowel for pseudo-words) of the stimuli, both factors were included as control variables in the model. All effects as well as the interactions between fixed factors were taken as random at the participant level. To obtain p-values, the car package was used, applying type III Wald F tests with Kenward-Roger degrees-of-freedom approximation [69]. All fixed factors were contrast coded. To approximate a normal distribution of the data, reaction times were log-transformed. The homogeneity of residuals was validated by inspecting the residuals plotted against fitted values, which yielded no clear indication of heteroscedasticity. Additionally, using values aggregated per participant, paired-sample t-tests were conducted in which differences between conditions were tested separately for each stimulus, in order to evaluate the generalizability of the results. These post-hoc tests thus allowed for monitoring the relative influence of the intrinsic qualities of the stimuli on the effect of the experimental condition.
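The model specification described in this paragraph might look roughly like the following sketch in R; variable and column names are assumptions, and the precise random-effects structure and coding used in the study may differ.

```r
# Sketch of the mixed-effects analysis: log-transformed latencies, contrast-coded
# fixed factors, by-participant random effects, and type III Wald F tests with
# Kenward-Roger degrees of freedom via the car package.
library(lme4)
library(car)      # Anova(); Kenward-Roger F tests additionally require pbkrtest

d <- scored_trials
d$log_rt    <- log(d$rt_scored)
d$condition <- factor(d$condition); contrasts(d$condition) <- contr.sum(2)
d$modality  <- factor(d$modality);  contrasts(d$modality)  <- contr.sum(2)
d$category  <- factor(d$category);  contrasts(d$category)  <- contr.sum(2)

m <- lmer(log_rt ~ condition * modality * category +
            (1 + condition * modality * category | participant), data = d)

Anova(m, type = 3, test.statistic = "F")   # type III Wald F, Kenward-Roger df

# Post-hoc check per stimulus: paired t-test on per-participant mean latencies
# (shown for one hypothetical stimulus; in practice looped over all stimuli)
one_stim <- subset(d, stimulus == "kotopu")
agg <- aggregate(rt_scored ~ participant + condition, one_stim, mean)
with(agg, t.test(rt_scored[condition == "conforming"],
                 rt_scored[condition == "nonconforming"], paired = TRUE))
```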

Results and discussion

The results clearly support the hypothesis. As predicted, mean response latency was significantly shorter in the conforming condition (M(d1) = 740.71 ms, SD = 151.77 ms) than in the non-conforming condition (M(d2) = 858.35 ms, SD = 160.90 ms, F(1/30.6) = 17.45, p < .001). This effect of the experimental conditions on reaction time held for both sensory modalities and for both categories (Table 1).

Table 1. Comparison of reaction times in milliseconds per experimental condition separated for stimulus modality and stimulus category.

https://doi.org/10.1371/journal.pone.0187196.t001

In addition to the effect of the experimental conditions, only stimulus modality exerted a significant effect on reaction time (F(1/34.3) = 34.9, p < .001), and the interaction between stimulus modality and experimental conditions showed a tendency towards significance (F(1/33.8) = 3.97, p < .1). By contrast, neither stimulus category nor the interaction between stimulus category and experimental conditions had a significant effect on reaction time (category: F(1/37.5) = 0.70, p > .1; condition*category: F(1/31.4) = 0.61, p > .1).

To examine the influence of stimulus modality, the effect of the experimental conditions was analyzed separately for visual and acoustic stimuli. Results of this analysis confirmed that the effect of the experimental conditions was more pronounced for visual than for acoustic stimuli (Fig 5). This difference between visual and acoustic stimuli, however, was to be expected given the nature of the stimuli, with visual stimuli depicting familiar animals and the auditory stimuli consisting of meaningless pseudo-words. Thus, it can be assumed that the distinction between “wild” and “gentle” animals was made automatically and without conscious consideration, whereas the distinction between specific articulatory features of vowels was less intuitive. Still, the difference between the conforming and the non-conforming blocks pointed in the predicted direction for both modalities, with a shorter reaction time in the conforming block than in the non-conforming block (visual: M(d1) = 675.78 ms, SD = 158.33 ms, M(d2) = 820.74 ms, SD = 191.71 ms, df = 29, t-value = 3.815, p < .01; acoustic: M(d1) = 805.64 ms, SD = 165.12 ms, M(d2) = 895.97 ms, SD = 163.75 ms, df = 29, t-value = 2.372, p < .05).

Fig 5. Response latency per experimental block, separated by stimulus modality, for experiment one.

The figure contrasts the distribution of reaction-time latencies in milliseconds between experimental blocks (conforming condition to the left in bright gray and non-conforming condition to the right in dark gray), separated by stimulus modality with visual stimuli to the left and acoustic stimuli to the right.

https://doi.org/10.1371/journal.pone.0187196.g005

A post-hoc comparison of reaction times between the experimental conditions, conducted separately for each stimulus, revealed that all visual stimuli and all but one acoustic stimulus (i.e., kutopo) were categorized faster in the conforming condition than in the non-conforming condition (Fig 6). Note that the results also indicate that phonetic similarity between the names of the visual stimuli and the acoustic stimuli had no relevant effect on participants’ performance. For example, when comparing the results for animals that have only one group of vowels in their Japanese name, it becomes clear that phonetic congruence between visual and acoustic stimuli led neither to a comparatively high (e.g., ELEPHANT, Japanese: zō) nor to a comparatively low (e.g., CHEETAH, Japanese: chītā; HAMSTER, Japanese: hamustā) effect size. Moreover, a comparison of the results for those animals whose Japanese names contain consonant-vowel combinations that also occur in the acoustic stimuli again does not suggest that phonetic congruency had a strong influence on participants’ performance. If similarity between the Japanese names of the animals and the acoustic stimuli had a significant effect, one would expect small animals with consonant + back-vowel combinations (i.e., CAT and BIRD) to show a particularly small effect size and big animals with consonant + back-vowel combinations (i.e., POLAR BEAR) to show a particularly pronounced effect size. However, this prediction is confirmed only for the BIRD, but not for the CAT or the POLAR BEAR, suggesting that congruencies between visual and acoustic stimuli had only a minor effect on the performance of the participants.

Fig 6. Averaged difference in response latency per stimulus.

Bars represent the distance between conforming and non-conforming conditions in milliseconds. Positive values indicate shorter response latencies in the conforming condition. Error bars represent the standard error of the mean. Numbers above the bars display t-values. Stars indicate the level of significance, with *** < .001, ** < .01, * < .05, and + < .1. Exact values can be found in the supporting materials (S1 Table).

https://doi.org/10.1371/journal.pone.0187196.g006

Given that the association between pseudo-words and pictures was triggered by the relation between the place of articulation of vowels and the notion of size, one must conclude that the association took place after participants had interpreted the content of the pictures. That is, the association between the articulatory features of the verbal stimuli and the visual stimuli occurred at a level of cognitive processing that involved the interpretation of semantic information. Note in particular that the animals were not introduced with any reference to size or related concepts, but as being either wild or gentle. Thus, the result suggests that the association was not primarily triggered by directly perceptible features, but by the semantic conceptualization of these features.

Experiment II

The aim of the second experiment was to monitor the interaction between perceptible and semantic features of visual stimuli with respect to their association with phonemes. To this end, experiment one was repeated; this time, however, the pictures of big animals were noticeably smaller than the pictures of small animals. Thus, if participants ignore the actual size of the pictures and–as in experiment one–preferably associate back vowels with big animals and front vowels with small animals, it can be concluded that the influence of the content overrides the influence of directly perceptible characteristics on the association between picture and pseudo-word. In other words, the experiment was designed to test the influence of perceptible features (i.e., the size of the pictures) on the association between the vowels and the content of the pictures (i.e., the size of the depicted animals).

Method

Participants.

Twenty-six university students (18 female) aged between 18 and 22 (mean 20) volunteered to participate in the experiment. All participants were native Japanese speakers who reported no hearing impairments and had normal or corrected-to-normal vision. Participation was voluntary and participants received a 500-yen gift certificate. All participants gave written informed consent to participate in the study. The experiments were approved by the Ethics Committee of Doshisha University.

Materials, apparatus, and procedure.

Visual and acoustic stimuli were taken from experiment one. The size of the pictures showing big animals was reduced by one quarter, from the original 15.5 cm x 11.5 cm to 11.63 cm x 8.63 cm (height x width). Apparatus, procedure, and data analysis were the same as in experiment one.

Results and discussion

As in experiment one, the data clearly confirm the hypothesis. On average, participants answered faster in the conforming condition (M(d1) = 746.05 ms, SD = 122.03 ms) than in the non-conforming condition (M(d2) = 868.35 ms, SD = 189.02 ms, F(1/28.1) = 13.27, p < .01). In addition to the effect of the experimental conditions, there was also a significant effect of stimulus modality, which again is most likely attributable to the different degrees of familiarity with the visual and acoustic stimuli (F(1/27.9) = 18.09, p < .001). However, there was no significant interaction between experimental conditions and stimulus modality (F(1/31.5) = 0.54, p > .1), which indicates that the differences between visual and acoustic stimuli did not influence the main effect of the experimental conditions (Fig 7).

Fig 7. Response latency per experimental block, separated by stimulus modality, for experiment two.

The figure contrasts the distribution of response latencies in milliseconds between experimental blocks (conforming condition to the left in bright gray and non-conforming condition to the right in dark gray), separated by stimulus modality with visual stimuli to the left and acoustic stimuli to the right.

https://doi.org/10.1371/journal.pone.0187196.g007

Moreover, neither stimulus category nor the interaction between stimulus category and experimental conditions had a significant influence on participants’ reaction time (category: F(1/43.3) = 0.10, p > .1; condition*category: F(1/27.1) = 1.56, p > .1). In general, a comparison of response latencies between the conforming and non-conforming conditions, performed separately for both modalities and both categories, showed that the effect of the experimental conditions held across all groups of stimuli (Table 2).

Table 2. Comparison of reaction times in milliseconds per experimental condition separated for stimulus modality and stimulus category.

https://doi.org/10.1371/journal.pone.0187196.t002

As in experiment one, the post-hoc analysis again indicates that the main effect generalizes across all stimuli (Fig 8). However, when comparing the effect per stimulus between the two experiments, it also becomes clear that the difference between the conforming and non-conforming conditions varied markedly. That is, several stimuli that had shown a comparatively large difference between the two experimental conditions in the first experiment showed a comparatively small difference in the second experiment (e.g., GORILLA, HIPPO, and HAMSTER for visual stimuli; kopotu and tipeki for acoustic stimuli), and vice versa (e.g., ELEPHANT and BIRD for visual stimuli; tupuko and pikite for acoustic stimuli). This finding strongly suggests that the intrinsic characteristics of the stimuli had only a negligible influence on the results of the experiment. On the other hand, in contrast to experiment one, there are some indications of a stronger influence of phonetic congruence between the visual and acoustic stimuli on the performance of the participants. That is, among the big animals, the one animal with only back-vowels in its Japanese name (i.e., ELEPHANT) showed a relatively large difference between the conforming and non-conforming conditions, whereas the difference was almost zero for the CHEETAH, the only big animal with only front-vowels in its Japanese name. However, these two exceptions notwithstanding, phonetic congruence cannot explain the results for all other visual stimuli. For example, the strong effect of the experimental condition on participants’ performance for small animals whose names contain combinations of plosives with back-vowels equal to those used in the acoustic stimuli (i.e., BIRD and CAT) suggests that cross-modal associations override phonetic similarities.

Fig 8. Averaged difference in response latency per stimulus.

Bars represent the distance between conforming and non-conforming conditions in milliseconds. Positive values indicate shorter response latencies in the conforming condition. Error bars represent the standard error of the mean. Numbers above the bars display t-values. Stars indicate the level of significance, with *** < .001, ** < .01, * < .05, and + < .1. Exact values can be found in the supporting materials (S1 Table).

https://doi.org/10.1371/journal.pone.0187196.g008

Experiment III

In the last experiment, visual stimuli depicted emotional body postures. That is, rather than requiring inferences about physical properties such as size, the pictures in this experiment were distinguishable in terms of an abstract concept related to social dominance.

Research suggests that emotional expressions are used by the perceiver as information about the personality or the social role of the expresser (for an overview see [70]). In particular, Knutson [71] reports that fearful expressions are perceived as a sign of low dominance (or subordination), whereas displays of anger promote judgments of high dominance (see also [72, 73]). Though all three studies used facial expressions as stimuli, recent studies imply that the recognition of emotional facial expressions is similar to that of emotional body postures [74, 75]. Thus, in this experiment, eight pictures were used, half of them displaying fearful body postures (low dominance) and half of them displaying angry body postures (high dominance). Based on the assumption that there is a tendency to relate size with strength and strength with social dominance, it was expected that body postures expressing anger would more likely be associated with back vowels, whereas body postures expressing fear would preferably be associated with front vowels.

Method

Participants.

Twenty-seven university students (16 female) aged between 18 and 25 (mean 20) volunteered to participate in the experiment. All participants were native Japanese speakers who reported no hearing impairments and had normal or corrected-to-normal vision. Participation was voluntary and participants received a 500-yen gift certificate. All participants gave written informed consent to participate in the study. The experiments were approved by the Ethics Committee of Doshisha University.

Materials.

For visual stimuli, eight pictures depicting either dominant or submissive body postures were used (Fig 9). The pictures were drawn by a professional illustrator (NiKo-Illustration). The illustrator was instructed to draw body postures that were either “dominant, aggressive, angry” or “submissive, fearful.” Otherwise the illustrator was uninformed with regard to the goal of the research.

Fig 9. Visual stimuli showing dominant and submissive body postures.

https://doi.org/10.1371/journal.pone.0187196.g009

A pretest was conducted to make sure that stimuli were perceived in the intended way. In a paper and pencil test, 51 participants (36 female) aged between 18 and 22 (mean: 19.8) were asked to allocate each individual body posture to one of four basic emotions (i.e., anger, fear, happiness, sadness) [76]. The pre-test was conducted in Japan and all participants were native speakers of Japanese. Both angry and fearful body postures were identified with the respective emotion by a significant majority of participants (anger: 89.3%, χ2 = 37.45, p < .001; fear: 83.3%, χ2 = 30.74, p < .001). In general, every single stimulus was classified by at least 80% of participants in accordance with the expectation, indicating that the pictures indeed depicted the intended emotion.
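Purely as an illustration of how such a categorization pre-test could be evaluated, the R sketch below runs a goodness-of-fit test against a uniform distribution over the four response options; the counts are hypothetical, and the exact contrast behind the reported χ² values is not specified in the text.

```r
# Hypothetical example: 51 raters assign an angry posture to one of four emotions.
# A chi-square goodness-of-fit test compares the observed counts with the uniform
# distribution expected under guessing (the counts below are made up for illustration).
responses <- c(anger = 45, fear = 3, happiness = 2, sadness = 1)
chisq.test(responses, p = rep(1/4, 4))
```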

Additionally, a second pre-test was conducted to make sure that participants perceived angry and fearful body postures as expressing high or low dominance, respectively. To this end, 40 participants (25 female, 14 male, one participant gave no answer) aged between 18 and 25 (mean: 19.3) were asked to assess each individual body posture on five bi-polar items. Each item was rated on a 7-point scale (ranging from -3 to +3). The pre-test was conducted in Japan and all participants were native speakers of Japanese. Items were selected to allow the categorization of the emotions expressed by the body postures in a multi-dimensional space comprising valence, arousal, approach, interest, and dominance (Japanese in brackets): contented vs. discontented (満足—不満足), aroused vs. relaxed (興奮—リラックス), affectionate vs. detesting (愛情—嫌悪), inquisitive vs. disinterested (好奇心—無関心), and dominant vs. submissive (支配—服従). The selection of the dimensions followed previous research, according to which an extension of the classical two-dimensional model consisting of valence and arousal [77–80] to a multi-dimensional model including, for example, approach, attention, and dominance is best suited to capture the way emotional expressions are understood [70, 72, 75, 81–85].

Results show that angry body postures were perceived as expressing dominance, whereas fearful body postures were perceived as submissive (mean values for dominant vs. submissive: angry body postures: mean = +1.95, SD = .76; fearful body postures: mean = -1.00, SD = 1.49; t = 7.9, df = 38, p < .001). Other items showed very little difference in effect size and were not significant at the p < .05 level (differences in mean values between angry and fearful body postures; positive values indicate that angry pictures were rated higher on the respective left-hand attribute): contented vs. discontented: Δmean = .65; inquisitive vs. disinterested: Δmean = .05; aroused vs. relaxed: Δmean = .20. The only exception was affectionate vs. detesting. However, though the difference between angry and fearful body postures was highly significant for this item, both angry and fearful body postures were perceived as detesting rather than affectionate (angry body postures: mean = -1.05, SD = .95; fearful body postures: mean = -2.35, SD = .99; t = 4.25, df = 38, p < .001). Thus, the results of the pre-test corroborate previous findings, indicating that angry body postures express high dominance, whereas fearful body postures express low dominance.
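The comparison of the dominance ratings might, for instance, be carried out as in the following sketch; the data layout and column names are assumptions, and the same approach would apply to the size and shape items of the third pre-test described below.

```r
# Sketch: paired t-test comparing each rater's mean dominance rating for angry
# vs. fearful body postures. Assumed layout of 'ratings': one row per rater and
# posture type, with columns rater, posture ("angry"/"fearful"), and dominance.
with(ratings,
     t.test(dominance[posture == "angry"],
            dominance[posture == "fearful"],
            paired = TRUE))
```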

Finally, 36 participants (9 female, 26 male, one participant gave no answer) aged between 18 and 22 (mean: 19.6) were asked to assess the pictures with respect to perceptible features, i.e., their size (big—small—大きい—小さい) and their shape (angular—roundish—角張った—丸っぽい). The pre-test was conducted in Japan and all participants were native speakers of Japanese. As both the size and the shape of geometrical figures have been found to influence associations with vowels, these last two items were used to make sure that any measured effect is due to semantic (i.e., the depicted emotion) and not to perceptible (i.e., visual) characteristics of the stimuli. Results revealed no significant difference for size (big vs. small: Δmean = 0.21), but fearful body postures were perceived as significantly more roundish (angry: mean = +2.1, SD = 0.91; fearful: mean = 0.0, SD = 1.86; t = 4.43, df = 34; p < .001).

For acoustic stimuli, eight pseudo-words from experiment one were randomly chosen. These were: kotopu, kutopo, pokotu, tokopu for back vowels and kipite, kitepi, pekite, and pikite for front vowels.

Apparatus, procedure, and data analysis.

Apparatus, procedure, and data analysis were the same as in experiment one. However, as experiment three employed only eight visual and eight acoustic stimuli (instead of 2 × 16 as in experiments one and two), every stimulus was presented four times per block (instead of twice), again resulting in 32 trials for the training blocks and 64 trials for the experimental blocks. The instructions for the participants were the same as those used in the first two experiments, except for the word for “animal”, which was replaced by “body posture” (人の姿). The visual stimuli for angry and fearful body postures were introduced as “怒った姿” and “恐怖におびえた姿”, respectively.

Results and discussion

In line with the previous experiments, the results again support the hypothesis. On average, participants answered faster in the conforming condition (M(d1) = 792.59 ms, SD = 134.13 ms) than in the non-conforming condition (M(d2) = 867.45 ms, SD = 153.85 ms, F(1/28.1) = 15.15, p < .001). In addition to the effect of the experimental conditions, there was again a significant effect of stimulus modality (F(1/30.3) = 11.18, p < .01), but no significant interaction between experimental conditions and stimulus modality (F(1/33.4) = 0.21, p > .1). That is, as in the previous two experiments, the difference between stimulus modalities did not affect participants’ general tendency to answer faster in the conforming than in the non-conforming condition (Fig 10).

Fig 10. Response latency per experimental block, separated by stimulus modality, for experiment three.

The figure contrasts the distribution of reaction time latencies in milliseconds between experimental blocks (conforming condition to the left in bright gray and non-conforming condition to the right in dark gray), separated by stimulus modality with visual stimuli to the left and acoustic stimuli to the right.

https://doi.org/10.1371/journal.pone.0187196.g010

For stimulus category, neither the factor itself nor its interaction with the experimental conditions proved significant (category: F(1/32.8) = 0.06, p > .1; category*condition: F(1/32.4) = 1.30, p > .1). In general, a comparison of response latencies between the experimental blocks, separated for stimulus modality and stimulus category, confirmed the same trend for all sub-groups of stimuli (Table 3). Moreover, comparing the response latencies between the conforming and non-conforming conditions separately per stimulus reveals that the results confirm the hypothesis with only one exception (Fig 11), suggesting that differences in participants’ performance were predominantly driven by the experimental condition, whereas intrinsic qualities of the stimuli had only a moderate effect.

Fig 11. Averaged difference in response latency per stimulus.

Bars represent the distance between conforming and non-conforming conditions in milliseconds. Positive values indicate shorter response latencies in the conforming condition. Error bars represent the standard error of the mean. Numbers above the bars display t-values. Stars indicate the level of significance, with *** < .001, ** < .01, * < .05, and + < .1. Exact values can be found in the supporting materials (S1 Table).

https://doi.org/10.1371/journal.pone.0187196.g011

Table 3. Comparison of reaction times in milliseconds per experimental condition separated for stimulus modality and stimulus category.

https://doi.org/10.1371/journal.pone.0187196.t003

That is, visual stimuli depicting anger were associated with pseudo-words containing back vowels, while visual stimuli showing fear were associated with pseudo-words containing front vowels. As shown in previous studies [71, 73] and as suggested by the results of the pre-test outlined above, angry body postures convey the notion of high social dominance, whereas fearful body postures are related to low social dominance. Consequently, the results corroborate the hypothesis, suggesting that the place of articulation of vowels is related not only to physical size or strength, but also to social hierarchy.

However, it has to be noted that complex semantic stimuli always vary along several features, which are difficult to control. In this experiment, a pre-test was conducted in an attempt to control for influences exerted by the shape of the pictures. This pre-test revealed that angry body postures were perceived as more angular and fearful body postures as more roundish. As the back vowels used here are articulated with rounded lips, and as previous research has revealed that such articulatory movements foster associations with roundish shapes [86, 87], the assessed shape of the pictures would have predicted an outcome opposite to the results of this experiment. Thus, as in experiment two, the results suggest that semantic features of the pictures dominated the association with the pseudo-words, whereas influences from perceptible features were overridden. Considered together, the results of all three experiments clearly suggest that the observed associations are attributable to the interpretation of the pictures as depicting physical or social dominance.

General discussion

The aim of this paper was to assess to what extent articulatory-acoustic features of phonemes show implicit associations with semantic stimuli. Based on the results of previous studies showing that front vowels are preferably associated with stimuli that refer to smallness and back vowels with stimuli that refer to bigness, it was hypothesized that the place of articulation (front vs. back) also predicts the association of vowels with pictures implying physical or social dominance. The results of the experiments support the hypothesis: pseudo-words containing back vowels were associated with pictures depicting big animals or dominant behavior, and pseudo-words containing front vowels were associated with pictures depicting small animals or submissive behavior. This association between the articulatory-acoustic characteristics of phonemes and the content of the pictures seems independent of directly perceptible features of the pictures themselves, as the size and shape of the pictures used would have predicted the opposite results. In other words, the influence of the semantic features of the pictures outweighed that of the perceptible features in triggering the association.

To the best of my knowledge, this was the first study in which articulatory-acoustic features of phonemes were tested for implicit associations with semantic stimuli that referred to the concept of dominance through their content. In contrast, previous studies used stimuli that directly referred to the concept of size [40, 41] or to related concepts, such as weight or strength [4246].

The results reported here, therefore, go beyond the well-established association between the place of articulation and the notion of size, suggesting that phonemes–at least potentially–elicit highly abstract semantic connotations. Given the design of the experiments described here, these connotations appear to be implicit. However, it has to be noted that the term implicit is not well defined and thus requires further clarification as to the level of cognitive processing it refers to [88]. Moreover, it has been questioned to what extent the associations assessed by the Implicit Association Test (which was applied in this study) are actually implicit [89]. Another problem concerns the nature of the stimuli. Complex semantic stimuli vary along various dimensions, all of which might have had an influence on the results. Furthermore, the labelling of the stimuli as “wild” vs. “gentle” or “angry” vs. “fearful” for the visual stimuli and as “bright” vs. “dark” for the acoustic stimuli might have influenced the results as well. To the best of my knowledge, however, there is no evidence that the labels used for the visual stimuli are associated with those used for the acoustic stimuli.

Moreover, in addition to associations based on semantic relations, another undesired effect can arise from phonetic similarities between the labels used to introduce the stimuli to the participants and the pseudo-words. For the first two experiments, the distinction between “wild” (Japanese “獰猛”, read “dōmō”) and “gentle” (Japanese “温和”, read “onwa”) guaranteed that both labels contain back-vowels but no front-vowels. A possible interference, however, could be attributed to the labels used for the visual stimuli in the third experiment, as the Japanese label for ‘anger’ (怒った姿, read “okotta sugata”) contains only back-vowels, whereas the label for ‘fear’ (恐怖におびえた姿, read “kyōfu ni obieta sugata”) also contains front-vowels. Though it seems unlikely that participants recalled the label of a presented stimulus’ category rather than focusing on attributes of the stimulus itself when performing a speeded classification task, whether this difference in labelling influenced participants’ performance cannot be determined.

More serious concerns, however, arise from potential confounding variables related to the names of the animals depicted in the first two experiments. For this reason, visual stimuli were carefully selected to avoid undesired effects due to phonetic congruence between the name of the depicted animal and the characteristics of the acoustic stimuli. Moreover, a comparison of the results between the first and the second experiment clearly suggests that these phonetic characteristics had no critical influence on the performance of the participants. First, all visual stimuli–independent of their appearance and the names of the depicted animals–had shorter response latencies in the conforming condition than in the non-conforming condition. Second, the relative differences in response latency between the conforming and non-conforming conditions per stimulus varied markedly between the experiments, in that several pictures with relatively large differences in the first experiment had relatively small differences in the second experiment, and vice versa. And finally, phonetic similarities between the Japanese names of the depicted animals and the distinctive characteristics of the acoustic stimuli did not lead to systematically better or worse performance. These findings suggest that intrinsic qualities of the stimuli had only a marginal influence on the results.

However, though these observations strongly imply that the differences in response latency can mainly be attributed to the experimental conditions, the influence of confounding variables ultimately cannot be excluded, especially because it remains unclear which attributes actually drive associations across sensory modalities. Regarding the names of the depicted animals, for example, it is not clear to what extent the mere occurrence of front and back vowels, their ratio, their combination with consonants, or their position within a word exerts an influence on cross-modal associations. Thus, in order to control for such potential covariates, further studies replicating the results with different research designs and stimuli are necessary.

In summary, reservations due to possible confounding variables notwithstanding, the results reported here strongly indicate that associations between phonemes and visual characteristics such as shape and size are not confined to a perceptual level but also involve a conceptual level of cognitive processing. Given the nature of the visual stimuli used here and the results of previous studies [44, 45, 50], it seems fair to conclude that vowels, owing to their articulatory-acoustic features, are related not merely to size but to an abstract, amodal concept of magnitude, conveying a sense of more, bigger, heavier, louder, stronger, and so on. On this view, both big objects and dark sounds are allocated to the same pole of the magnitude dimension, which gives rise to their cross-modal association. Such a hypothesis has previously been formulated by Martino and Marks [90], who proposed that cross-modal “congruence effects reflect the coding or recoding of perceptual information into a common, abstract semantic representation that captures the synesthetic relation between dimensions” (p. 747). In addition, I argue that the relation between dark sounds and the concept of bigness also has an emotional component: objects perceived as big and strong are likely to have an intimidating effect, eliciting feelings such as respect or even fear. On this basis, the acoustic characteristics of phonemes can also be associated with social behavior, with dominant and aggressive behavior being perceived as potentially harmful and therefore related to dark sounds.

Though this interpretation of the results is admittedly speculative, it suggests that further efforts toward a better understanding of sound iconicity might also yield insights into cognitive processes involved in language processing more generally. To give one example, a potential question arising from this research concerns the influence of the phonetic structure of texts on their aesthetic perception. Inspired by Jakobson's [91] emphasis on the role linguistic form plays in poetic language, theoretical [92] and empirical work [20, 93–96] has recently investigated the effect of structural elements of language on aesthetic evaluation, looking, for example, at readers' appreciation of meter, rhyme, syntactic features, or the phonetic structure of a text. In the light of the results presented here, this line of research could be extended, for example by exploring the effect of semantic congruence between the content and the phonetic structure of texts.
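
As a minimal illustration of how such a study might operationalize the phonetic structure of a text, the following Python sketch computes the proportion of back-vowel letters among all counted vowel letters; the vowel grouping and the example strings are simplifying assumptions made for illustration only.

    # Toy measure of phonetic structure: share of back-vowel letters among all
    # counted vowel letters; a refined version would work on phonemic
    # transcriptions rather than on orthography.
    BACK = set("ouOU")
    FRONT = set("ieIE")

    def back_vowel_ratio(text):
        back = sum(ch in BACK for ch in text)
        front = sum(ch in FRONT for ch in text)
        total = back + front
        return back / total if total else None

    print(back_vowel_ratio("doomed gloom"))   # mostly back vowels -> 0.8
    print(back_vowel_ratio("little feet"))    # only front vowels -> 0.0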

Supporting information

S1 Table. Tables with the averaged differences in response latency per stimulus for all three experiments.

https://doi.org/10.1371/journal.pone.0187196.s001

(DOCX)

S1 Text. Instructions.

Example of the instructions given to participants, with English translation. Note: The order of the experimental blocks (conforming first and non-conforming second, or vice versa) and the allocation of stimuli (e.g., big animals to the right or to the left in the conforming block) were randomized.

https://doi.org/10.1371/journal.pone.0187196.s002

(DOCX)

S1 Dataset. Implicit association test.

The folder contains text files with the original data of the IAT experiments; each text file represents the data of one participant. A ‘read me’ file included in the folder explains how to read the files.

https://doi.org/10.1371/journal.pone.0187196.s003

(ZIP)

S2 Dataset. Pre-tests.

Folder contains data of three pre-tests (Experiment 3). In three questionnaires participants were asked to allocate visual stimuli to one of four basic emotions (pre-test 1), to categorize the emotions expressed by the stimuli (pre-test 2), and to assess the shape and size of the stimuli (pre-test 3).

https://doi.org/10.1371/journal.pone.0187196.s004

(ZIP)

Acknowledgments

Special thanks go to Nina Kober of NiKo-Illustration for providing the illustrations used as visual stimuli as well as to Dr. Valentin Wagner and Dr. Wolff Schlotz for their support in the statistical analysis of the data. I would also like to express my gratitude to Dr. Tim Craig of Doshisha University and to Tim Dowling of Konan University for help in revising this paper.

References

  1. Auracher J, Albers S, Zhai YH, Gareeva G, Stavniychuk T. P is for happiness, N is for sadness: Universals in sound iconicity to detect emotions in poetry. Discourse Process. 2011;48(1):1–25.
  2. Hinton L, Nichols J, Ohala JJ. Introduction: Sound-symbolism processes. In: Hinton L, Nichols J, Ohala JJ, editors. Sound Symbolism. Cambridge, England; New York, NY: Cambridge University Press; 1994. p. 1–12.
  3. Allott R. Sound symbolism. In: Figge UL, editor. Language in the Würm glaciation. Bochum, Germany: Brockmeyer; 1995. p. 15–38.
  4. Miall DS. Sounds of contrast: an empirical approach to phonemic iconicity. Poetics. 2001;29(1):55–70.
  5. Reay IE. Sound symbolism. In: Asher RE, Simpson JMY, editors. The Encyclopedia of Language and Linguistics. 8. Oxford, England: Pergamon Press; 1994. p. 4064–70.
  6. Nuckolls JB. The case for sound symbolism. Annual Review of Anthropology. 1999;28(1):225–52.
  7. Schmidtke DS, Conrad M, Jacobs AM. Phonological iconicity. Front Psychol. 2014;5:80. pmid:24575062; PubMed Central PMCID: PMC3921575.
  8. Lockwood G, Dingemanse M. Iconicity in the lab: a review of behavioral, developmental, and neuroimaging research into sound-symbolism. Front Psychol. 2015;6:1246. pmid:26379581; PubMed Central PMCID: PMC4547014.
  9. Dingemanse M, Blasi DE, Lupyan G, Christiansen MH, Monaghan P. Arbitrariness, iconicity, and systematicity in language. Trends Cogn Sci. 2015;19(10):603–15. pmid:26412098.
  10. Kantartzis K, Imai M, Kita S. Japanese sound-symbolism facilitates word learning in English-speaking children. Cognitive Science. 2011;35(3):575–86.
  11. Nygaard LC, Cook AE, Namy LL. Sound to meaning correspondences facilitate word learning. Cognition. 2009;112(1):181–6. pmid:19447384.
  12. Lockwood G, Dingemanse M, Hagoort P. Sound-symbolism boosts novel word learning. J Exp Psychol Learn Mem Cogn. 2016;42(8):1274–81. pmid:26844577.
  13. Imai M, Kita S. The sound symbolism bootstrapping hypothesis for language acquisition and language evolution. Philos Trans R Soc Lond B Biol Sci. 2014;369(1651):20130298. pmid:25092666; PubMed Central PMCID: PMC4123677.
  14. Imai M, Kita S, Nagumo M, Okada H. Sound symbolism facilitates early verb learning. Cognition. 2008;109(1):54–65. pmid:18835600.
  15. Monaghan P, Shillcock RC, Christiansen MH, Kirby S. How arbitrary is language? Philos Trans R Soc Lond B Biol Sci. 2014;369(1651):20130299. pmid:25092667; PubMed Central PMCID: PMC4123678.
  16. Perry LK, Perlman M, Lupyan G. Iconicity in English and Spanish and its relation to lexical category and age of acquisition. PLoS One. 2015;10(9):e0137147. pmid:26340349; PubMed Central PMCID: PMC4560417.
  17. Shrum LJ, Lowrey TM, Luna D, Lerman DB, Liu M. Sound symbolism effects across languages: Implications for global brand names. International Journal of Research in Marketing. 2012;29(3):275–9.
  18. Klink RR. Creating brand names with meaning: The use of sound symbolism. Marketing Letters. 2000;11(1):5–20.
  19. Shrum LJ, Lowrey TM. Sounds convey meaning: The implications of phonetic symbolism for brand name construction. In: Lowrey TM, editor. Psycholinguistic Phenomena in Marketing Communications. Mahwah, NJ: Lawrence Erlbaum; 2007. p. 39–58.
  20. Aryani A, Kraxenberger M, Ullrich S, Jacobs AM, Conrad M. Measuring the basic affective tone of poems via phonological saliency and iconicity. Psychology of Aesthetics, Creativity, and the Arts. 2016;10(2):191–204.
  21. Fónagy I. Communication in poetry. Word. 1961;17(2):194–218. WOS:A1961CKF0200007.
  22. Whissell C. Phonosymbolism and the emotional nature of sounds: evidence of the preferential use of particular phonemes in texts of differing emotional tone. Percept Mot Skills. 1999;89(1):19–48. pmid:10544399.
  23. Whissell C. Phonoemotional profiling: a description of the emotional flavour of English texts on the basis of the phonemes employed in them. Percept Mot Skills. 2000;91(2):617–48. pmid:11065326.
  24. Asano M, Imai M, Kita S, Kitajo K, Okada H, Thierry G. Sound symbolism scaffolds language development in preverbal infants. Cortex. 2015;63:196–205. pmid:25282057.
  25. Kovic V, Plunkett K, Westermann G. The shape of words in the brain. Cognition. 2010;114(1):19–28. pmid:19828141.
  26. Sucevic J, Savic AM, Popovic MB, Styles SJ, Kovic V. Balloons and bavoons versus spikes and shikes: ERPs reveal shared neural processes for shape-sound-meaning congruence in words, and shape-sound congruence in pseudowords. Brain Lang. 2015;145–146:11–22. pmid:25935826.
  27. Kanero J, Imai M, Okuda J, Okada H, Matsuda T. How sound symbolism is processed in the brain: a study on Japanese mimetic words. PLoS One. 2014;9(5):e97905. pmid:24840874; PubMed Central PMCID: PMC4026540.
  28. Revill KP, Namy LL, DeFife LC, Nygaard LC. Cross-linguistic sound symbolism and crossmodal correspondence: Evidence from fMRI and DTI. Brain Lang. 2014;128(1):18–24. pmid:24316238.
  29. Marks LE. On perceptual metaphors. Metaphor and Symbolic Activity. 1996;11(1):39–66.
  30. Lindauer MS. The effects of the physiognomic stimuli taketa and maluma on the meanings of neutral stimuli. Bulletin of the Psychonomic Society. 2013;28(2):151–4.
  31. Lindauer MS. Physiognomy and verbal synesthesia compared: Affective and intersensory descriptions of nouns with drawing and art. Metaphor and Symbolic Activity. 1991;6(3):183–202.
  32. Ramachandran VS, Hubbard EM. Synaesthesia—A window into perception, thought and language. Journal of Consciousness Studies. 2001;8(12):3–34.
  33. Spence C. Crossmodal correspondences: a tutorial review. Atten Percept Psychophys. 2011;73(4):971–95. pmid:21264748.
  34. Westbury C. Implicit sound symbolism in lexical access: evidence from an interference task. Brain Lang. 2005;93(1):10–9. pmid:15766764.
  35. Marks LE. On associations of light and sound: the mediation of brightness, pitch, and loudness. Am J Psychol. 1974;87(1–2):173–88. pmid:4451203.
  36. Hornbostel EMv. Über Geruchshelligkeit. Pflügers Archiv für die Gesamte Physiologie des Menschen und der Tiere. 1931;227(1):517–38.
  37. French PL. Toward an explanation of phonetic symbolism. Word. 1977;28(3):305–22.
  38. Marks LE. The unity of the senses: interrelations among the modalities. New York, NY: Academic Press; 1978.
  39. Tsur R. Size–sound symbolism revisited. Journal of Pragmatics. 2006;38(6):905–24.
  40. Sapir E. A study in phonetic symbolism. Journal of Experimental Psychology. 1929;12(3):225–39.
  41. Newman SS. Further experiments in phonetic symbolism. The American Journal of Psychology. 1933;45(1):53.
  42. Greenberg JH, Jenkins JJ. Studies in the psychological correlates of the sound system of American English, III and IV. Word. 1966;22(1–3):207–42.
  43. Birch D, Erickson M. Phonetic symbolism with respect to three dimensions from the semantic differential. J Gen Psychol. 1958;58(2):291–7. pmid:13539363.
  44. Becker JA, Fisher SK. Comparison of associations to vowel speech sounds by English and Spanish speakers. Am J Psychol. 1988;101(1):51–7. WOS:A1988M647300004. pmid:2452578.
  45. Miron MS. A crosslinguistic investigation of phonetic symbolism. The Journal of Abnormal and Social Psychology. 1961;62(3):623–30.
  46. Oyama T, Haga J. Common factors between figural and phonetic symbolism. Psychologia. 1963;6(1–2):131–44.
  47. Blasi DE, Wichmann S, Hammarstrom H, Stadler PF, Christiansen MH. Sound-meaning association biases evidenced across thousands of languages. Proc Natl Acad Sci U S A. 2016;113(39):10818–23. pmid:27621455; PubMed Central PMCID: PMC5047153.
  48. Ultan R. Size-sound symbolism. In: Greenberg JH, editor. Universals of Human Language. 2. Stanford, CA: Stanford University Press; 1978. p. 527–68.
  49. Pena M, Mehler J, Nespor M. The role of audiovisual processing in early conceptual development. Psychol Sci. 2011;22(11):1419–21. pmid:21960249.
  50. Ohala JJ. An ethological perspective on common cross-language utilization of F0 of voice. Phonetica. 1984;41(1):1–16. pmid:6204347.
  51. Walker P, Smith S. Stroop interference based on the synaesthetic qualities of auditory pitch. Perception. 1984;13(1):75–81. pmid:6473055.
  52. Eitan Z, Timmers R. Beethoven's last piano sonata and those who follow crocodiles: cross-domain mappings of auditory pitch in a musical context. Cognition. 2010;114(3):405–22. pmid:20036356.
  53. Amorim MCP, Almada VC. The outcome of male-male encounters affects subsequent sound production during courtship in the cichlid fish Oreochromis mossambicus. Anim Behav. 2005;69:595–601. WOS:000227567900011.
  54. Colleye O, Parmentier E. Overview on the diversity of sounds produced by clownfishes (Pomacentridae): Importance of acoustic signals in their peculiar way of life. PLoS One. 2012;7(11). WOS:000311935800225.
  55. Vannoni E, McElligott AG. Low frequency groans indicate larger and more dominant fallow deer (Dama dama) males. PLoS One. 2008;3(9):e3113. pmid:18769619; PubMed Central PMCID: PMC2518835.
  56. Feinberg DR, Jones BC, DeBruine LM, O'Connor JJM, Tigue CC, Borak DJ. Integrating fundamental and formant frequencies in women's preferences for men's voices. Behavioral Ecology. 2011;22(6):1320–5.
  57. Gregory SW, Gallagher TJ. Spectral analysis of candidates' nonverbal vocal communication: Predicting U.S. presidential election outcomes. Social Psychology Quarterly. 2002;65(3):298.
  58. Gregory SW, Webster S. A nonverbal signal in voices of interview partners effectively predicts communication accommodation and social status perceptions. Journal of Personality and Social Psychology. 1996;70(6):1231–40. WOS:A1996UQ67300009. pmid:8667163.
  59. Hodges-Simeon CR, Gaulin SJ, Puts DA. Different vocal parameters predict perceptions of dominance and attractiveness. Hum Nat. 2010;21(4):406–27. pmid:21212816; PubMed Central PMCID: PMC2995855.
  60. Puts DA, Gaulin SJC, Verdolini K. Dominance and the evolution of sexual dimorphism in human voice pitch. Evolution and Human Behavior. 2006;27(4):283–96.
  61. Puts DA, Hodges CR, Cárdenas RA, Gaulin SJC. Men's voices as dominance signals: vocal fundamental and formant frequencies influence dominance attributions among men. Evolution and Human Behavior. 2007;28(5):340–4.
  62. Wolff SE, Puts DA. Vocal masculinity is a robust dominance signal in men. Behavioral Ecology and Sociobiology. 2010;64(10):1673–83.
  63. Ohala JJ. The frequency code underlies the sound-symbolic use of voice pitch. In: Hinton L, Nichols J, Ohala JJ, editors. Sound Symbolism. Cambridge, England; New York, NY: Cambridge University Press; 1994. p. 325–47.
  64. Greenwald AG, McGhee DE, Schwartz JLK. Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology. 1998;74(6):1464–80. pmid:9654756.
  65. Parise CV, Spence C. Audiovisual crossmodal correspondences and sound symbolism: a study using the implicit association test. Exp Brain Res. 2012;220(3–4):319–33. pmid:22706551.
  66. Nosek BA, Greenwald AG, Banaji MR. Understanding and using the Implicit Association Test: II. Method variables and construct validity. Pers Soc Psychol Bull. 2005;31(2):166–80. pmid:15619590.
  67. Greenwald AG, Nosek BA, Banaji MR. Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology. 2003;85(2):197–216. pmid:12916565.
  68. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software. 2015;67(1).
  69. Fox J, Weisberg S. An R companion to applied regression. 2nd ed. Thousand Oaks, CA: Sage; 2011. Available from: http://socserv.socsci.mcmaster.ca/jfox/Books/Companion.
  70. Tracy JL, Randles D, Steckler CM. The nonverbal communication of emotions. Current Opinion in Behavioral Sciences. 2015;3:25–30.
  71. Knutson B. Facial expressions of emotion influence interpersonal trait inferences. Journal of Nonverbal Behavior. 1996;20(3):165–82.
  72. Gao X, Maurer D, Nishimura M. Similarities and differences in the perceptual structure of facial expressions of children and adults. J Exp Child Psychol. 2010;105(1–2):98–115. pmid:19800633.
  73. Tiedens LZ. Anger and advancement versus sadness and subjugation: The effect of negative emotion expressions on social status conferral. Journal of Personality and Social Psychology. 2001;80(1):86–94. pmid:11195894.
  74. Wallbott HG. Bodily expression of emotion. European Journal of Social Psychology. 1998;28(6):879–96.
  75. de Gelder B, de Borst AW, Watson R. The perception of emotion in body expressions. Wires Cogn Sci. 2015;6(2):149–58. WOS:000353884700007.
  76. Jack RE, Garrod OG, Schyns PG. Dynamic facial expressions of emotion transmit an evolving hierarchy of signals over time. Curr Biol. 2014;24(2):187–92. pmid:24388852.
  77. Russell JA. A circumplex model of affect. Journal of Personality and Social Psychology. 1980;39(6):1161–78.
  78. Watson D, Tellegen A. Toward a consensual structure of mood. Psychological Bulletin. 1985;98(2):219–35. pmid:3901060.
  79. Plutchik R. The psychology and biology of emotion. New York, NY: Harper-Collins; 1994.
  80. Rubin DC, Talarico JM. A comparison of dimensional models of emotion: evidence from emotions, prototypical events, autobiographical memories, and words. Memory. 2009;17(8):802–8. pmid:19691001; PubMed Central PMCID: PMC2784275.
  81. Mondloch CJ, Nelson NL, Horner M. Asymmetries of influence: differential effects of body postures on perceptions of emotional facial expressions. PLoS One. 2013;8(9):e73605. pmid:24039996; PubMed Central PMCID: PMC3769306.
  82. Schlosberg H. Three dimensions of emotion. Psychological Review. 1954;61(2):81–8. pmid:13155714.
  83. Russell JA, Mehrabian A. Evidence for a three-factor theory of emotions. Journal of Research in Personality. 1977;11(3):273–94.
  84. Thayer RE. The biopsychology of mood and arousal. New York, NY: Oxford University Press; 1989.
  85. Elliot AJ, Eder AB, Harmon-Jones E. Approach-avoidance motivation and emotion: Convergence and divergence. Emotion Review. 2013;5(3):308–11.
  86. Maurer D, Pathman T, Mondloch CJ. The shape of boubas: sound-shape correspondences in toddlers and adults. Dev Sci. 2006;9(3):316–22. pmid:16669803.
  87. Ozturk O, Krehm M, Vouloumanos A. Sound symbolism in infancy: evidence for sound-shape cross-modal correspondences in 4-month-olds. J Exp Child Psychol. 2013;114(2):173–86. pmid:22960203.
  88. Gawronski B, Hofmann W, Wilbur CJ. Are "implicit" attitudes unconscious? Conscious Cogn. 2006;15(3):485–99. pmid:16403654.
  89. De Houwer J, Teige-Mocigemba S, Spruyt A, Moors A. Implicit measures: A normative analysis and review. Psychol Bull. 2009;135(3):347–68. pmid:19379018.
  90. Martino G, Marks LE. Cross-modal interaction between vision and touch: the role of synesthetic correspondence. Perception. 2000;29(6):745–54. pmid:11040956.
  91. Jakobson R. Closing statement: Linguistics and poetics. In: Sebeok TA, editor. Style in Language. New York, NY: Wiley; 1960. p. 350–77.
  92. Jacobs AM. Neurocognitive poetics: methods and models for investigating the neuronal and cognitive-affective bases of literature reception. Front Hum Neurosci. 2015;9:186. pmid:25932010; PubMed Central PMCID: PMC4399337.
  93. Wiseman M, van Peer W. Roman Jakobsons Konzepte der Selbstreferenz aus der Perspektive der heutigen Kognitionswissenschaft. In: Birus H, Donat S, Meyer-Sickendiek B, editors. Roman Jakobsons Gedichtanalysen: Eine Herausforderung an die Philologien. Göttingen: Wallstein; 2003. p. 277–306.
  94. Obermeier C, Menninghaus W, von Koppenfels M, Raettig T, Schmidt-Kassow M, Otterbein S, et al. Aesthetic and emotional effects of meter and rhyme in poetry. Front Psychol. 2013;4:10. pmid:23386837; PubMed Central PMCID: PMC3560350.
  95. Menninghaus W, Bohrn IC, Altmann U, Lubrich O, Jacobs AM. Sounds funny? Humor effects of phonological and prosodic figures of speech. Psychology of Aesthetics, Creativity, and the Arts. 2014;8(1):71–6.
  96. Kraxenberger M. Jakobson revisited: Poetic distinctiveness, modes of operation, and perception. Rivista Italiana di Filosofia del Linguaggio. 2014;8(1):10–21.