
About this book

This book constitutes the proceedings of the First International Workshop on Language, Music and Computing, LMAC 2017, held in St. Petersburg, Russia, in April 2017.

The 18 papers presented in this volume were carefully reviewed and selected from 52 submissions. They are organized in topical sections on the universal grammar of music, the surface of music and singing, language as music, music computing, and formalization of the informality.

Table of Contents


The Universal Grammar of Music


Does the Y-Model for Language Work for Music?

The four main modules of the classic model for the faculty of language postulated in generative linguistics—lexicon, syntax, phonology/prosody and semantics—have been hypothesized to each have a (more or less abstract) equivalent module in the faculty of music. This hypothesis suggests that it should be possible to explain the way these modules interact—represented by the inverted-Y form of the model—in a similar fashion. I propose a refinement of Katz and Pesetsky’s (2011) hypothesis by suggesting that there are a number of common properties shared by the lexical systems of music and language, and it is precisely this that explains some of their fundamental syntactic similarities. What makes the two systems different is not primitively the properties of their lexical modules, but rather the radically different nature of their respective interpretive modules—semantics in the case of language (or, technically, the conceptual-intentional system), and the Tonal-Harmonic Component (THC) in the case of music.
Oriol Quintana

Is Generative Theory Misleading for Music Theory?

During the 1960s, linguistics underwent what can be seen as a paradigm shift in the sense of Thomas Kuhn's The Structure of Scientific Revolutions (1962). As a result, the discipline stepped out of the Cartesian dualism between body and mind. During the 1980s, analytical musicology became associated with the methodological approach of transformational grammars, the best-known example being A Generative Theory of Tonal Music (Lerdahl and Jackendoff 1983). For musicologists, the motivation to adopt this position was naturally nurtured by the work of Heinrich Schenker (Der freie Satz, 1935), in which, as in transformational grammar, a hierarchy of layers leading from the actual piece of music to its Ursatz (kernel) is proposed. The hypothesis developed in this article is that analytical musicology, despite the efforts to link it with modern linguistics, has not yet entered the new scientific paradigm led by the cognitive sciences. The reason is that musicology has not yet redefined its object of study from a non-dualistic and transdisciplinary perspective. With the development of experimental aesthetics, the ontological gap between the object of musicology and that of the scientific approach to music has grown larger. As a result, although the study of aesthetic meaning in music has become possible today, it seems inconsistent with the traditional reductionist methods of analytical musicology on which the analogy with transformational grammar relies.
Rafael Barbosa

Acoustic and Perceptual-Auditory Determinants of Transmission of Speech and Music Information (in Regard to Semiotics)

The paper presents a conception of speech and music research based on semiotics. Speech and music semiotic systems are examined as special hierarchical subsystems of the common human semiotic system of interpersonal communication. Each speech semiotic subsystem in turn has its own subsystems, e.g. articulatory, phonatory, acoustic and perceptual-auditory. The acoustic subsystem of speech has its own subsystems as well: duration (t_n, ms), intensity (I_n, dB), fundamental frequency (F0_n, Hz) and spectral values (F_n, Hz). Speech and music are considered as two congeneric subsystems of the common semiotic system of interpersonal communication with regard to semantics, syntactics and pragmatics: semantics as the area of relations between speech and music expressions, on the one hand, and objects and processes in the world, on the other; syntactics as the area of interrelations among these expressions; and pragmatics as the area of the influence of the meaning of these expressions on their users. This semiotic conception of speech and music includes the binary oppositions “ratio-emotio”, actual “thema-rhema” segmentation of speech and musical text/discourse, “segmental-suprasegmental items”, “prosody-timbre items”, etc. The research attempts to show the advantages of the music “score” method (a score composed on the basis of an analysis of the prosodic characteristics of speech), which can help to determine how informative various parameters are for identifying a speaker's emotional state. A music score created from this analysis is viewed as a model of the prosodic vocal outline of the utterance. In this way, the prosodic basis of speech and the basic expressive means of the “music language” are connected. Speech, like music, uses the same space and time coordinates to represent the movement of sound items. The pitch metric grid, which determines the position of a sound item as it moves, is based on this principle.
Time organization of music and speech creates a common temporal basis. A music score created from the acoustic analysis of prosodic features meets these requirements, taking into account the restrictions imposed by the absence of segmental (sound-syllable) text. Thus, a special musical synthesis of a speech utterance, based on acoustic and subsequent perceptual-auditory analysis, makes “analysis-synthesis” research possible.
Rodmonga Potapova, Vsevolod Potapov
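The abstract does not spell out how acoustic values become the notes of a score. One common way to place an F0 track on a musical staff, offered here only as a sketch and not as the authors' procedure, is to convert each segment's mean F0 to the nearest equal-tempered semitone (assuming A4 = 440 Hz), keep the segment duration as the note length, and record the deviation in cents. The function names are hypothetical.

```python
import math

# Pitch classes in MIDI order (MIDI note 60 = C4, 69 = A4).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(f0_hz: float, a4_hz: float = 440.0) -> float:
    """Fractional MIDI note number of a frequency in 12-tone equal temperament."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive (unvoiced frames have no pitch)")
    return 69 + 12 * math.log2(f0_hz / a4_hz)


def midi_to_name(midi: int) -> str:
    """Note name with octave number, e.g. 57 -> 'A3'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def prosody_to_score(segments):
    """Turn (duration_ms, mean_f0_hz) segments into (note, duration_ms, cents) triples.

    'cents' is the distance of the measured F0 from the quantized tempered pitch,
    so rounding to a semitone does not discard the intonation detail.
    """
    score = []
    for duration_ms, f0_hz in segments:
        exact = hz_to_midi(f0_hz)
        nearest = round(exact)
        cents = round((exact - nearest) * 100)
        score.append((midi_to_name(nearest), duration_ms, cents))
    return score


print(prosody_to_score([(180, 220.0), (120, 247.0), (250, 196.0)]))
# [('A3', 180, 0), ('B3', 120, 0), ('G3', 250, 0)]
```

The segment values in the usage line are made-up inputs; a real analysis would take durations and F0 averages from the prosodic measurements described above.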

The Surface of Music and Singing


Synchronization of Musical Acoustics and Articulatory Movements in Russian Classical Romance

This paper investigates the synchronization of musical acoustics (pitch, frequencies, durations) and articulatory movements in Russian classical romance. The study employs electromagnetic articulography (EMA) to observe and compare objective data on articulatory characteristics in singing and reading. The romance genre was chosen because it does not normally employ vocal techniques specific to opera singing (vibrato, etc.) that significantly affect vowel intelligibility. The romance chosen for the experiment is often performed by Russian singers, as it is part of the canonical conservatoire repertoire. We obtained samples of singing and read speech and recorded objective data for both types of articulatory activity. The recordings can be considered parallel, as they were made in succession during one experiment. The sensors were calibrated and attached only once, at the beginning of the experiment, so their positions were the same for both recordings, which makes the recordings comparable in terms of articulatory data. The obtained material (both singing and reading) is annotated and analyzed in terms of the synchronization of articulatory movements and pitch in singing as opposed to speech.
Karina Evgrafova, Vera Evdokimova, Pavel Skrelin, Tatjana Chukaeva

A Comparison Between Musical and Speech Rhythms: A Case Study of Standard Thai and Northern Thai

The normalized Pairwise Variability Index (nPVI) is an empirical tool used to find phonetic evidence supporting the classification of languages by their rhythmic properties. The nPVI can also be used to detect the influence of a composer's native language on the rhythm of instrumental music (Patel and Daniele, 2003). However, it remains an open question whether music with lyrics yields the same result as instrumental music. This study therefore investigates rhythm in Standard Thai and Northern Thai pop songs. The results show that the musical nPVI value of Standard Thai pop songs is lower than that of Northern Thai pop songs, and that this rhythmic property of the music does not parallel the nPVI values of the corresponding speech. This incongruity might result from the influence of Western music and melodies on Standard Thai pop songs and of folk melodies on Northern Thai pop songs.
Chawadon Ketkaew, Yanin Sawanakunanon
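For reference, the nPVI compares each pair of successive durations d_k and d_(k+1) (typically vocalic intervals in speech and note durations in music) and averages their normalized differences: nPVI = 100 / (m - 1) * sum over k of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2), where m is the number of durations. A minimal Python sketch of that formula follows; the inputs in the usage line are made-up rhythms, not data from the study.

```python
def npvi(durations):
    """Normalized Pairwise Variability Index of a sequence of durations.

    nPVI = 100 / (m - 1) * sum over k of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)
    Units cancel out, so durations may be given in ms, seconds or beats.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    pairs = zip(durations, durations[1:])
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100 * total / (len(durations) - 1)


# Even rhythm -> 0.0; a strict long-short alternation (3:1) -> 100.0.
print(npvi([1, 1, 1, 1]), npvi([3, 1, 3, 1]))
```

Higher values indicate greater contrast between neighbouring durations, which is why the index can be compared across a language's speech and its songs.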

Singer’s Performance Style in Numbers and Dynamic Curves: A Case Study of “Spring Waters” by Sergei Rachmaninoff

The paper studies vocalists' individual manners of performance through acoustic analysis of audio recordings. The research method was tested on recordings of five performances of Sergei Rachmaninoff's famous song “Spring Waters” by different tenors. The audio recordings were manually segmented into vocal syllables to obtain the duration of each vocal note. Tabulating the data on different performances in a single database allowed us to analyze them statistically. As a result, we obtained time series that reveal the tempo dynamics of each singer, composed an averaged profile of all performances under study, and compared the tempo characteristics of syllables, vocal lines and whole performances. The average tempo turned out to differ significantly between performers. In “Spring Waters”, the tempo profiles of individual singers have much in common: they run almost in parallel at the beginning of the song but diverge closer to its end. The greatest tempo variation was observed at two important points: in the final vocal line and at the golden section, the latter being the break point in the song's composition. The proposed method makes it possible to reveal the average performance style for a particular song and the deviations from it.
Gregory Martynenko
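The abstract describes tempo time series built from per-note durations, an averaged profile, and points where singers diverge. The paper's exact formula is not given here, so this sketch assumes that the local tempo of a note is its notated length in beats divided by its performed duration, in beats per minute, and uses the population standard deviation across singers as the divergence measure. Function names, performer labels and numbers are hypothetical.

```python
from statistics import mean, pstdev


def local_tempi(notated_beats, performed_seconds):
    """Local tempo (beats per minute) of every vocal note in one performance."""
    if len(notated_beats) != len(performed_seconds):
        raise ValueError("need one performed duration per notated note")
    return [60.0 * beats / secs for beats, secs in zip(notated_beats, performed_seconds)]


def tempo_profile(notated_beats, performances):
    """Averaged tempo curve and per-note spread across performers.

    performances maps a performer label to that singer's note durations in
    seconds, listed in score order. A large spread marks notes where the
    singers' tempo curves diverge.
    """
    curves = [local_tempi(notated_beats, secs) for secs in performances.values()]
    per_note = list(zip(*curves))  # tempi of all singers for note 1, note 2, ...
    averaged = [mean(tempi) for tempi in per_note]
    spread = [pstdev(tempi) for tempi in per_note]
    return averaged, spread


beats = [1, 1, 2]  # hypothetical notated lengths of three sung syllables
singers = {"tenor_a": [0.5, 0.5, 1.0], "tenor_b": [0.5, 0.6, 1.5]}
print(tempo_profile(beats, singers))
```

In this toy input, the spread grows from the first note to the last, which is the kind of end-of-song divergence the abstract reports.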

Textual and Musical Invariants for Searching and Classification of Traditional Music

The goal of this research is to determine whether properties such as tonality, mode, meter and tune title remain similar across different versions of the same melody. Variability in some features makes classification and search more difficult. The author uses a corpus of traditional dance melodies on audio recordings from Macedonia (Greece) as the basis for analysis.
We show that, in general, none of the features (meter, mode, key and tune title) is invariant on its own across all versions of a selected tune. At the same time, using linguistic features where the musical ones fail, and vice versa, improves the chances of correct attribution and efficient search.
The examples of invariance violations can now be used to assess candidate search systems for a corpus of musical works.
Ilya Saitanov
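The finding that no single feature is invariant, while linguistic and musical features can compensate for each other, suggests scoring candidates on several features at once. The sketch below is a deliberately simple stand-in (a count of agreeing features), not the author's method; the TuneVersion record, its field values and the normalization rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TuneVersion:
    title: str  # tune title as given by performers (a linguistic feature)
    meter: str  # e.g. "7/8"
    mode: str   # modal label used by the cataloguer
    key: str    # tonic pitch, e.g. "D"


FEATURES = ("title", "meter", "mode", "key")


def _norm(value: str) -> str:
    """Case- and whitespace-insensitive comparison key."""
    return " ".join(value.casefold().split())


def match_score(query: TuneVersion, candidate: TuneVersion) -> int:
    """Number of agreeing features; no single feature is treated as mandatory."""
    return sum(
        _norm(getattr(query, f)) == _norm(getattr(candidate, f)) for f in FEATURES
    )


def rank(query: TuneVersion, corpus):
    """Corpus entries ordered from best to worst match."""
    return sorted(corpus, key=lambda c: match_score(query, c), reverse=True)
```

With this scoring, a transposed version of a tune (different key, same title, meter and mode) still outranks a different tune that happens to share the meter and key, because the title compensates where the musical feature fails.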

Language as Music


Sound Symbolism of Contemporary Communication: Croatian Imitative Expressions in Computer-Mediated Discourse

Sound symbolism has generally been regarded as a marginal linguistic occurrence because (traditional/structuralist) linguistic theory presupposes an arbitrary link between sound and meaning. Phonemes are perceived as the smallest building blocks of language structure, whose value comes from their (distinctive) relations with other phonemes. These abstract units neither possess nor carry meaning of their own, but through their distribution they create differences in the meaning of more complex language units (e.g. morphemes, words). In contrast, sound symbolism presupposes a direct, motivated link between sound (and consequently grapheme) and meaning, as is the case with onomatopoeic expressions or imitatives. Guided by communicative practice and by observation of the Croatian online vernacular, we decided to approach this problem from a pragmalinguistic perspective. Starting from everyday language use in computer-mediated communication (chat analysis), we noticed a large number of imitative expressions with various communicative functions (not exclusively poetic). The main objective of this paper is to point out the importance of a language process (sound symbolism) that has the potential to logically connect two different systems of human communication and expression (language and music), and to shed light on developmental, evolutionary and context-sensitive features of signing activities.
Jana Jurčević

Slips of the Ear as Clues for Speech Processing

The aim of the paper is to show how slips of the ear can contribute to understanding spoken word processing by native speakers and second language learners, and to describing the structure of the mental lexicon for native and second languages. In our experiment, 30 native Russian speakers and 30 Chinese students learning Russian as a second language listened to 100 Russian words and had to write them down. We analyzed the mistakes made by both groups of participants, checking different linguistic and psycholinguistic parameters (phonetic factors, part of speech, priming and frequency effects). We found that the native language of a listener influences the recognition of spoken words, in both the native and the non-native language, at the phonetic level. Processing at higher levels is less language-specific: we found evidence that word frequency and priming effects are relevant to the processing of Russian words by both native and non-native speakers.
Elena I. Riekhakaynen, Alena Balanovskaia

Phrase Breaks in Everyday Conversations from Sociolinguistic Perspective

This study is based on the ORD corpus of everyday spoken Russian, which contains a rich collection of audio recordings made in real-life settings. Speech transcripts in the ORD corpus require mandatory marking of word and phrase breaks, self-corrections, hesitations, fillers and other irregularities of spoken discourse. The paper deals with speech breaks in oral discourse (word breaks, phrase breaks, intraphrasal pauses, etc.). Quantitative analysis of a subcorpus of 187,600 tokens has shown that 7.56% of all phrases in everyday communication are not finished. Whereas word breaks can be attributed to word search/choice or self-correction, phrase breaks affect the text level and result in ragged, rough and poorly structured syntactic sequences. Sociolinguistic analysis has revealed that phrase breaks are more frequent in men's speech than in women's (8.16% vs. 7.12%). Seniors have significantly more speech breaks (10.76%) than children (6.78%), young people (6.08%) and middle-aged people (7.37%). As for the status groups of speakers, the highest share of breaks is found in the speech of unemployed and retired people (10.75%), whereas the lowest is observed in the speech of managers (4.50%), who apparently care more about the quality of their speech than others do.
Natalia Bogdanova-Beglarian
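The percentages above are shares of unfinished phrases among all phrases produced by a speaker group. The ORD annotation format is not described here, so this sketch assumes that each phrase has already been reduced to a (group label, unfinished?) pair; the function name and the toy input are hypothetical.

```python
from collections import defaultdict


def break_shares(phrases):
    """Share (%) of unfinished phrases overall and per speaker group.

    phrases: iterable of (group_label, is_unfinished) pairs, one per phrase.
    Returns (overall_percent, {group: percent}), rounded to two decimals.
    """
    total = defaultdict(int)
    broken = defaultdict(int)
    for group, unfinished in phrases:
        total[group] += 1
        broken[group] += int(bool(unfinished))
    if not total:
        return 0.0, {}
    per_group = {g: round(100 * broken[g] / total[g], 2) for g in total}
    overall = round(100 * sum(broken.values()) / sum(total.values()), 2)
    return overall, per_group


toy = [("men", True), ("men", False), ("women", False),
       ("women", False), ("women", True), ("women", False)]
print(break_shares(toy))
```

Reporting per-group shares rather than raw counts keeps groups with very different amounts of recorded speech comparable.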

Audible Paralinguistic Phenomena in Everyday Spoken Conversations: Evidence from the ORD Corpus Data

Paralinguistic phenomena are non-verbal elements of conversation. Paralinguistic studies are usually based on audio or video recordings of spoken communication. In this article, we show what kind of audible paralinguistic information can be obtained from the ORD speech corpus of everyday Russian discourse, which contains long-term audio recordings of conversations made in natural circumstances. This linguistic resource provides rich authentic data for studying the diversity of audible paralinguistic phenomena. The frequency of paralinguistic phenomena in everyday conversations has been calculated on the basis of the annotated subcorpus of 187,600 tokens. The most frequent paralinguistic phenomena turned out to be laughter, inhalation noise, cough, e-like and m-like vocalizations, tongue clicking, and a variety of unclassified non-verbal sounds (calls, exclamations, vocal imitations, etc.). The paper reports on the distribution of paralinguistic elements, non-verbal interjections and hesitations in the speech of different gender and age groups.
Tatiana Sherstinova
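One way to express the frequencies described above, offered as a sketch rather than the paper's procedure, is as occurrences per 1,000 tokens of the annotated subcorpus, ranked from most to least frequent. The label strings and the toy token count are hypothetical; for the subcorpus described here, n_tokens would be 187,600.

```python
from collections import Counter


def rates_per_thousand(labels, n_tokens):
    """Frequency of each paralinguistic label per 1,000 tokens, most frequent first.

    labels: one entry per annotated occurrence, e.g. "laughter" or "cough".
    n_tokens: size of the annotated (sub)corpus in tokens.
    """
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    counts = Counter(labels)
    return [(label, 1000 * n / n_tokens) for label, n in counts.most_common()]


print(rates_per_thousand(["laughter", "cough", "laughter", "inhalation"], 2000))
```

Normalizing by token count lets gender or age groups with different amounts of recorded speech be compared directly.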

Music Computing


Distributed Software Hardware Solution for Complex Network Sonification

The paper presents a hardware-software solution for the sonification of complex networks and systems. Sonification extends the possibilities for analyzing complex information by making use of human hearing. Auditory displays reduce the operator's workload and improve the detection of specific features and patterns in the data. The proposed sonification complex consists of two main parts. The data source, located at LO ZNIIS, generates data describing the current state of a network or complex system, accumulates it and redirects it. The parametric sonification layer, located at SUT, converts this information into forms suitable for creating a new audio environment and represents the data as relevant timbral classes.
Gleb G. Rogozinsky, Konstantin Lyzhinkin, Anna Egorova, Dmitry Podolsky
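The abstract names a parametric sonification layer that turns network-state data into sound parameters and timbral classes, but does not list the actual mappings. The sketch below illustrates parameter-mapping sonification under stated assumptions: two hypothetical metrics (link load and error rate), an exponential load-to-pitch mapping, and three hypothetical timbral classes chosen by threshold. It produces events only; rendering them as audio is left to a synthesizer.

```python
from dataclasses import dataclass


@dataclass
class NetworkSample:
    node: str
    load: float        # link utilisation, 0..1 (hypothetical metric)
    error_rate: float  # share of failed requests, 0..1 (hypothetical metric)


def load_to_pitch(load, low_hz=220.0, octaves=2.0):
    """Exponential pitch mapping: equal load steps give equal musical intervals."""
    t = min(max(load, 0.0), 1.0)  # clamp out-of-range readings
    return low_hz * 2 ** (octaves * t)


def error_to_timbre(error_rate):
    """Assign one of three hypothetical timbral classes by error-rate threshold."""
    if error_rate < 0.01:
        return "clean"
    if error_rate < 0.05:
        return "buzzy"
    return "noisy"


def sonification_events(samples):
    """Parameter mapping only; a synthesizer would render these events as sound."""
    return [
        {"node": s.node, "pitch_hz": load_to_pitch(s.load), "timbre": error_to_timbre(s.error_rate)}
        for s in samples
    ]


print(sonification_events([NetworkSample("core-1", 0.5, 0.0)]))
# [{'node': 'core-1', 'pitch_hz': 440.0, 'timbre': 'clean'}]
```

The exponential mapping is a design choice: pitch perception is roughly logarithmic, so a linear Hz scale would make changes at low load sound larger than the same changes at high load.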

Automated Soundtrack Generation for Fiction Books Backed by Lövheim’s Cube Emotional Model

One of the main tasks of any work of art is to transfer the emotion conceived by the author to its recipient. When several modalities are used, a synergistic effect occurs, making it more likely that the target emotional state is achieved. Reading mostly involves visual perception; nevertheless, it can be supplemented with an audio modality in the form of a soundtrack of specially selected music that corresponds to the emotional state of a text fragment.
As a base model for representing emotional states, we selected the physiologically motivated Lövheim cube model, which embraces 8 emotional states instead of the 2 (positive and negative) usually used in sentiment analysis.
This article describes the concept of selecting music for the “mood” of a text extract by mapping the text's emotional labels to tags in the LastFM API and fetching music data to play, and reports an experimental validation of this approach.
Alexander Kalinin, Anastasia Kolmogorova
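Lövheim's cube places eight basic emotions at the corners of a space spanned by serotonin, dopamine and noradrenaline levels (Lövheim, 2012). The sketch below snaps three estimated levels to a corner and builds, without sending, a Last.fm tag.getTopTracks request URL. The thresholding and the emotion-to-tag table are hypothetical illustrations, not the mapping used by the authors, and the step that estimates the three levels from text is out of scope.

```python
from urllib.parse import urlencode

# Corners of Lövheim's cube as (serotonin, dopamine, noradrenaline), 0 = low, 1 = high.
LOVHEIM_CUBE = {
    (0, 0, 0): "shame/humiliation",
    (1, 0, 0): "contempt/disgust",
    (0, 1, 0): "fear/terror",
    (1, 1, 0): "enjoyment/joy",
    (0, 0, 1): "distress/anguish",
    (1, 0, 1): "surprise",
    (0, 1, 1): "anger/rage",
    (1, 1, 1): "interest/excitement",
}

# Hypothetical emotion -> Last.fm tag choices, for illustration only.
EMOTION_TO_TAG = {
    "shame/humiliation": "melancholy",
    "contempt/disgust": "dark",
    "fear/terror": "scary",
    "enjoyment/joy": "happy",
    "distress/anguish": "sad",
    "surprise": "quirky",
    "anger/rage": "aggressive",
    "interest/excitement": "energetic",
}

LASTFM_ROOT = "https://ws.audioscrobbler.com/2.0/"


def cube_emotion(serotonin, dopamine, noradrenaline, threshold=0.5):
    """Snap three levels in [0, 1] to a cube corner and return its emotion label."""
    corner = tuple(int(v >= threshold) for v in (serotonin, dopamine, noradrenaline))
    return LOVHEIM_CUBE[corner]


def top_tracks_url(emotion, api_key):
    """Build (without sending) a Last.fm tag.getTopTracks request URL for an emotion."""
    params = {
        "method": "tag.gettoptracks",
        "tag": EMOTION_TO_TAG[emotion],
        "api_key": api_key,
        "format": "json",
    }
    return f"{LASTFM_ROOT}?{urlencode(params)}"


print(cube_emotion(0.9, 0.8, 0.2))  # enjoyment/joy
```

Replace the placeholder API key with a real one before sending the request with any HTTP client.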

Characteristics of Music Playback and Visio-Motor Interaction at Sight-Reading by Pianists Depending on the Specifics of a Musical Piece

We analyzed the basic characteristics of music playback during sight-reading of three two-line classical music excerpts of varying texture and complexity: a two-voice polyphonic piece, and the theme and a variation of a homophonic-harmonic piece. These characteristics serve as objective indicators of the musicians' sight-reading skill and of the complexity of the musical excerpt. Using an original technique of eye-movement recording that does not require fixing the head, we studied the eye-hand span, i.e. the time from reading a fragment of the score to playing it. Our findings reveal that the eye-hand span depends on the texture of the performed piece, correlates inversely with the number of errors and correlates directly with the stability of the performance. This parameter may serve as an objective measure of sight-reading ability. It is connected with the complexity of a musical piece and, presumably, characterizes the working memory capacity of musicians.
Leonid Tereshchenko, Lyubov’ Boyko, Dar’ya Ivanchenko, Galina Zadneprovskaya, Alexander Latanov
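A minimal sketch of the two quantities discussed above, assuming that for each played note one knows the onset of the first fixation on it and the onset of the corresponding key press, both in seconds on a shared clock. The eye-hand span is then the per-note difference, and the reported relationship with errors can be checked with a Pearson coefficient computed across pianists. The function names and inputs are hypothetical.

```python
from statistics import mean


def eye_hand_spans(fixation_onsets, keypress_onsets):
    """Eye-hand span per note: seconds from the first fixation on a note to its key press."""
    if len(fixation_onsets) != len(keypress_onsets):
        raise ValueError("need one fixation onset per played note")
    return [press - look for look, press in zip(fixation_onsets, keypress_onsets)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

In use, one would average eye_hand_spans per pianist and pass those means together with the pianists' error counts to pearson_r; a negative coefficient would correspond to the inverse correlation reported in the abstract.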

Formalization of the Informality


The Role of Truth-Values in Indirect Meanings

The problem of the truth-values of indirect meanings is discussed within the semantic theory of indirect meaning proposed by the present authors in dialogue with Hintikka and Sandu's theory. The authors preserve the key notion of the latter, the meaning line, but place it within a different semantics (a non-Fregean, situational one) and a different logic (a paraconsistent one). Like contradictions, indirect meanings tend toward explosion (there are always possible worlds in which they are true); to make them meaningful, the single relevant transworld connexion must be singled out among the infinitely many possible ones. The meaning line serves this purpose. An analysis of the simplest semantic constructions with indirect meaning (tropes, humour, hints, riddles, etc.) is proposed.
Basil Lourié, Olga Mitrenina

Linguistic Approaches to Robotics: From Text Analysis to the Synthesis of Behavior

We examine the problem of “understanding robots” and design the F-2 emotional robot to “understand” speech and support human-like behavior. The proposed system is an applied implementation of the theoretical concept of robotic information flow suggested by M. Minsky (“proto-specialists”) and A. Sloman (CogAff). It works with real-world input (natural texts, speech sound) and produces natural behavioral output (speech, gestures and facial expressions). Unlike other chatbots, the system relies on a semantic representation and operates with a set of d-scripts (equivalent to proto-specialists), extracted from advertising and mass media texts as a classification of basic emotional patterns. The process of “understanding” is modelled as the selection of a relevant d-script for the incoming utterance.
Artemy Kotov, Nikita Arinkin, Ludmila Zaidelman, Anna Zinina
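The abstract models “understanding” as selecting the relevant d-script for an incoming utterance, but does not say how relevance is scored. Purely as an illustration, the sketch below substitutes a simple trigger-overlap score over semantic markers; DScript, the marker names and the behaviour labels are hypothetical, and the actual system operates on a full semantic representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DScript:
    name: str
    triggers: frozenset  # semantic markers that activate the script (hypothetical)
    behaviour: str       # output label, e.g. a facial expression or gesture


def select_dscript(markers, scripts):
    """Return the d-script whose triggers are best covered by the utterance's markers.

    Score = share of a script's triggers present in the utterance; ties keep the
    earlier script. Returns None when no script shares any marker.
    """
    best, best_score = None, 0.0
    for script in scripts:
        if not script.triggers:
            continue
        score = len(markers & script.triggers) / len(script.triggers)
        if score > best_score:
            best, best_score = script, score
    return best


SCRIPTS = [
    DScript("danger", frozenset({"threat", "addressee"}), "fear_face"),
    DScript("praise", frozenset({"positive_eval", "addressee"}), "smile"),
]
print(select_dscript({"positive_eval", "addressee"}, SCRIPTS).name)  # praise
```

Normalizing by the size of each trigger set keeps scripts with many triggers from winning merely because they are large.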

Semantics and Syntax Tagging in Russian Child Corpus

The paper describes a new semantic and syntactic tagging scheme applied to annotate the Russian child corpus “KONDUIT”, a corpus of oral, unprepared, elicited narratives produced by Russian monolinguals aged 2;7–7;6. This annotation uncovers some links between verb semantics and syntax that influence the stages of verb acquisition. A case study of verbs of speech, compared with verbs of mental activity, shows that children acquire verb semantic and syntactic structures gradually on the basis of verb semantic classes, progressing from one structure to the next. At the same time, the results show that syntax acquisition depends not only on verb semantics but also on parameters such as reference or part of speech. As children acquire narrative skills and the rules of reference, they widen their sets of possible syntactic structures.
Polina Eismont

When Language Survived, Music Resurrected and Computer Died: To the Problem of Covert Ontologies in Language

Creating formal ontologies is one of the current trends in information science. However, in the context of taxonomies existing in natural languages, another type of class hierarchy seems to be more important: the so-called “covert ontologies” that categorize entities in terms of crypto classes or hidden classes.
The research aims to examine where three entities that seem very different from the formal point of view are located in the natural ontology of the Russian language. These entities are “language” (which belongs to the formal class of systems), “music” (which represents the formal class of perception or activity) and “computer” (which embodies the formal class of equipment).
According to our preliminary observations, all three entities under discussion are conceptualized in Russian as living systems. Our further analysis of 500 occurrences in which the three entities' names co-occur with verbs denoting different stages of the life cycle showed that “music” enters the class of mythic heroes or Demiurges, “language” belongs to the covert class of Humans, and “computer” joins the class of pets.
The revealed properties of natural categorization, which arise from the effects of covert ontology, also influence the possible semantic roles of the names of the entities explored.
Anastasia Kolmogorova

