
2021 | Book

Perception, Representations, Image, Sound, Music

14th International Symposium, CMMR 2019, Marseille, France, October 14–18, 2019, Revised Selected Papers


About this book

This book constitutes the refereed proceedings of the 14th International Symposium on Perception, Representations, Image, Sound, Music, CMMR 2019, held in Marseille, France, in October 2019.

The 46 full papers presented were selected from 105 submissions. The papers are grouped into nine sections. The first three sections are related to music information retrieval, computational musicology and composition tools, followed by a section on notation and instruments distributed on mobile devices. The fifth section concerns auditory perception and cognition, while the three following sections are related to sound design and sonic and musical interactions. The last section contains contributions that relate to Jean-Claude Risset's research.

Table of Contents

Frontmatter

Music Information Retrieval - Music, Emotion and Representation

Frontmatter
The Deep Learning Revolution in MIR: The Pros and Cons, the Needs and the Challenges

This paper deals with the deep learning revolution in Music Information Research (MIR), i.e. the switch from knowledge-driven hand-crafted systems to data-driven deep-learning systems. To discuss the pros and cons of this revolution, we first review the basic elements of deep learning and explain how they can be used for audio feature learning or for solving difficult MIR tasks. We then discuss the case of hand-crafted features and show that, while these were indeed shallow and explainable at the start, they tended to become deep, data-driven and unexplainable over time, even before the reign of deep learning. The development of these data-driven approaches was enabled by increasing access to large annotated datasets. We therefore argue that annotated datasets are today the central and most sustainable element of any MIR research, and we propose new ways to obtain them at scale. Finally, we highlight a set of challenges to be faced by the deep learning revolution in MIR, especially concerning the consideration of music specificities, the explainability of the models (X-AI) and their environmental cost (Green-AI).

Geoffroy Peeters
Methods and Datasets for DJ-Mix Reverse Engineering

DJ techniques are an important part of popular music culture. However, they have been insufficiently investigated by researchers, owing to the lack of annotated datasets of DJ mixes. This paper aims to fill this gap by introducing novel methods to automatically deconstruct and annotate recorded mixes for which the constituent tracks are known. A rough alignment first estimates where in the mix each track starts and which time-stretching factor was applied. Second, a sample-precise alignment determines the exact offset of each track in the mix. Third, we propose a new method to estimate the cue points and fade curves, which operates in the time-frequency domain to increase its robustness to interference from other tracks. The proposed methods are finally evaluated on our new publicly available DJ-mix dataset UnmixDB. This dataset contains automatically generated beat-synchronous mixes based on freely available music tracks, along with ground truth about the placement, transformations and effects of tracks in a mix.

Diemo Schwarz, Dominique Fourer
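As a purely illustrative sketch of the rough-alignment stage described above (our construction, not the authors' code; function and variable names are hypothetical), normalized cross-correlation can locate where a track excerpt enters a mix:

```python
import numpy as np
from scipy.signal import correlate

def estimate_offset(mix: np.ndarray, excerpt: np.ndarray) -> int:
    """Sample index in `mix` where `excerpt` aligns best."""
    corr = correlate(mix, excerpt, mode="valid")          # sliding dot product
    # Normalise by the local energy of the mix so loud passages don't dominate.
    local_energy = np.sqrt(np.convolve(mix**2, np.ones(excerpt.size), mode="valid"))
    return int(np.argmax(corr / (local_energy + 1e-12)))
```

A sample-precise alignment and time-stretch estimation, as described in the abstract, would then refine such an initial estimate.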
Towards Deep Learning Strategies for Transcribing Electroacoustic Music

Electroacoustic music is experienced primarily through auditory perception, as it is not usually based on a prescriptive score. For the analysis of such pieces, transcriptions are sometimes created to illustrate events and processes graphically in a readily comprehensible way. These are usually based on the spectrogram of the recording. Although the manual generation of transcriptions is time-consuming, they provide a useful starting point for anyone interested in a work. Deep-learning algorithms that learn to recognize characteristic spectral patterns through supervised learning are a promising technology for automating this task. This paper investigates and explores the labeling of sound objects in electroacoustic music recordings. We test several neural-network architectures that enable classification of sound objects, using musicological and signal-processing methods. We also outline future perspectives on how our results can be improved and applied to a new gradient-based visualization approach.

Matthias Nowakowski, Christof Weiß, Jakob Abeßer
Ensemble Size Classification in Colombian Andean String Music Recordings

Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. With the advancement of machine learning techniques, new possibilities for information retrieval have opened up in scenarios where ground-truth data is scarce. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, as well as pre-processing steps and data augmentation techniques, were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed-forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.

Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil
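To make the setup concrete, here is a minimal PyTorch sketch of a purely feed-forward classifier over linear magnitude spectrogram frames; the layer sizes and the three-way class split are our assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class EnsembleSizeNet(nn.Module):
    def __init__(self, n_bins: int = 1025, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, n_classes),        # hypothetical ensemble-size categories
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, n_bins) magnitude spectrogram slices
        return self.net(frames)

# File-wise prediction: average the frame logits, then take the argmax.
model = EnsembleSizeNet()
file_frames = torch.rand(200, 1025)           # one file's spectrogram frames
label = model(file_frames).mean(dim=0).argmax().item()
```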
Tapping Along to the Difficult Ones: Leveraging User-Input for Beat Tracking in Highly Expressive Musical Content

We explore the task of computational beat tracking for musical audio signals from the perspective of putting an end-user directly in the processing loop. Unlike existing “semi-automatic” approaches to beat tracking, where users may select from among several possible outputs to determine the one that best suits their aims, we examine how high-level user input could guide the manner in which the analysis is performed. More specifically, we focus on the perceptual difficulty of tapping the beat, which has previously been associated with the musical properties of expressive timing and slow tempo. Since musical examples with these properties have been shown to be poorly addressed even by state-of-the-art approaches to beat tracking, we re-parameterise an existing deep-learning-based approach to enable it to more reliably track highly expressive music. In a small-scale listening experiment we highlight two principal trends: i) users are able to consistently disambiguate musical examples which are easy to tap to from those which are not; and in turn ii) users preferred the beat tracking output of an expressive-parameterised system to the default parameterisation for highly expressive musical excerpts.

António Sá Pinto, Matthew E. P. Davies
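One way to picture such a re-parameterisation, sketched with the madmom library's DBN beat tracker (the parameter values below are illustrative guesses, not the authors' choices):

```python
# Sketch only: relax the tempo-transition penalty so the tracker can follow
# rubato and slow, expressive timing. Values are illustrative assumptions.
from madmom.features.beats import RNNBeatProcessor, DBNBeatTrackingProcessor

act = RNNBeatProcessor()('expressive_piece.wav')      # beat activation function

default_tracker = DBNBeatTrackingProcessor(fps=100)   # default transition_lambda=100
# A lower transition_lambda lets the tempo state change more freely between
# frames; widening the BPM range accommodates very slow excerpts.
expressive_tracker = DBNBeatTrackingProcessor(fps=100, transition_lambda=20,
                                              min_bpm=40, max_bpm=160)
beats = expressive_tracker(act)                        # beat times in seconds
```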
Drum Fills Detection and Generation

Drum fills are essential to a drummer's playing. They regularly restore energy and announce the transition to a new part of the song. This aspect of drumming has not been explored much in the field of MIR because of the lack of datasets with drum-fill labels. In this paper, we propose two methods to detect drum fills along a song, in order to obtain drum-fill context information. The first method is a logistic regression which uses velocity-related handcrafted data and features from the latent space of a variational autoencoder. We analyse the classifier's performance with respect to each feature group. The second method, rule-based, considers a bar as a fill when a sufficient difference of notes is detected with respect to the adjacent bars. We use these two methods to extract regular-pattern/drum-fill pairs from a large dataset and examine the extraction results with plots and statistical tests. In a second part, we propose an RNN model for generating drum fills, conditioned on the previous bar. We then propose objective metrics to evaluate the quality of the generated drum fills, and report the results of a user study we conducted. Please go to https://frederictamagnan.github.io/drumfills/ for details and audio examples.

Frederic Tamagnan, Yi-Hsuan Yang
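The rule-based method lends itself to a compact illustration. The sketch below is our reading of the idea, with a hypothetical Jaccard-distance criterion and threshold standing in for the paper's exact rule:

```python
from typing import List, Set, Tuple

Bar = Set[Tuple[int, float]]          # (MIDI pitch, onset position in the bar)

def is_fill(bars: List[Bar], i: int, threshold: float = 0.5) -> bool:
    """Flag bar i as a fill if it shares few notes with both neighbours."""
    if i == 0 or i == len(bars) - 1:
        return False                   # skip boundary bars in this sketch

    def jaccard_distance(a: Bar, b: Bar) -> float:
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0

    return (jaccard_distance(bars[i], bars[i - 1]) > threshold and
            jaccard_distance(bars[i], bars[i + 1]) > threshold)
```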
End-to-End Classification of Ballroom Dancing Music Using Machine Learning

The term ‘ballroom dancing’ refers to a social and competitive type of partnered dance. Competitive ballroom dancing consists of 10 different types of dances performed to specific styles of music unique to each type of dance. Few algorithms attempt to differentiate between pieces of music and classify them into these categories, making it hard for beginners to identify which dance corresponds to a piece of music they may be listening to. In our research, we use an end-to-end machine learning approach to easily and accurately classify music into the 10 different types of dance. We experimented with four types of machine learning models and achieved the highest accuracy, 83%, using a deep neural network with three hidden layers. With this algorithm, we can facilitate the learning experience of beginner ballroom dancers by helping them distinguish between different types of ballroom dancing music.

Noémie Voss, Phong Nguyen

Computational Musicology

Frontmatter
Modal Logic for Tonal Music

It is generally accepted that the origins of music and language are one and the same. Many syntactic theories of music have thus far been proposed; however, these efforts mainly concern generative syntax. Although such syntax enables us to construct hierarchical trees, the tree representation alone is not sufficient for representing mutual references in music. In this research, we propose annotating the tree with modal logic, by which the reference from each pitch event to harmonic regions is clarified. In addition, while conventional generative syntax constructs the tree in a top-down way, the modal interpretation gives an incremental construction following the progression of the music. We can therefore naturally interpret our theory as an expectation–realization model that is closer to human recognition of music.

Satoshi Tojo
John Cage's Number Pieces, a Geometric Interpretation of “Time Brackets” Notation

Conceptual musical works that lead to a multitude of realizations are of special interest. One cannot talk about a performance without considering the rules that led to the existence of that version. After dealing with similar works of open form by Iannis Xenakis, Pierre Boulez and Karlheinz Stockhausen, the interest in John Cage's music is evident. His works are “so free” that one can play any part of the material; even an empty set is welcome. The freedom is maximal, and still there are decisions to consider in order to make a piece playable. Our research was initially intended to develop a set of conceptual and software tools that generate a representation of a work as an aid to performance. Here we deal with the Number Pieces, which Cage composed in the last years of his life. Over time, we realized that the shape used to represent time brackets carries important information for interpretation and musical analysis. In the present text, we propose a general geometric study of these time-bracket representations, while trying to link them to their musical properties in order to improve performance.

Benny Sluchin, Mikhail Malt
Modelling 4-Dimensional Tonal Pitch Spaces with Hopf Fibration

The question of how to arrange harmonically related pitches in space is a historical research topic of computational musicology. The primitive note arrangement is linear and one-dimensional: ordered pitches ascend in one direction as frequency increases. Euler represented harmonic relationships between notes with a mathematical lattice named the Tonnetz, which extends the 1-D arrangement into 2-D space by reflecting consonances. Since then, mathematicians, musicians, and psychologists have studied this topic for hundreds of years. Recently, pitch-space modelling has expanded to mapping musical notes into higher-dimensional spaces. This paper aims to investigate existing tonal pitch space models, and to explore a new approach to building a pitch hyperspace by using the Hopf fibration.

Hanlin Hu, David Gerhard
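For readers unfamiliar with the construction, the standard Hopf map (a textbook formula, not a result of the paper) sends unit pairs of complex numbers to the 2-sphere:

```latex
h : S^3 \to S^2, \qquad
h(z_1, z_2) = \bigl(\, 2\, z_1 \bar{z}_2,\; |z_1|^2 - |z_2|^2 \,\bigr),
\qquad |z_1|^2 + |z_2|^2 = 1 .
```

Each fibre $h^{-1}(p)$ is a circle, so a circular dimension (such as pitch chroma) can sit over every point of a sphere of harmonic relations; this is one plausible reading of why the fibration suits a 4-D pitch space.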
Automatic Dastgah Recognition Using Markov Models

This work focuses on automatic Dastgah recognition of monophonic audio recordings of Iranian music using Markov Models. We present an automatic recognition system that models the sequence of intervals computed from quantized pitch data (estimated from audio) with Markov processes. Classification of an audio file is performed by finding the closest match between the Markov matrix of the file and the (template) matrices computed from the database for each Dastgah. Applying a leave-one-out evaluation strategy on a dataset comprised of 73 files, an accuracy of 0.986 has been observed for one of the four tested distance calculation methods.

Luciano Ciamarone, Baris Bozkurt, Xavier Serra
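A minimal sketch of this pipeline (our illustration; the quantization scheme and the L1 distance below are stand-ins for the four distances the paper tests):

```python
import numpy as np

def markov_matrix(intervals: list[int], n_states: int) -> np.ndarray:
    """Row-normalised first-order transition matrix over quantized interval states."""
    M = np.zeros((n_states, n_states))
    for a, b in zip(intervals[:-1], intervals[1:]):
        M[a, b] += 1
    rows = M.sum(axis=1, keepdims=True)
    return np.divide(M, rows, out=np.zeros_like(M), where=rows > 0)

def classify(query: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    """Nearest Dastgah template under an L1 distance (one of several possible choices)."""
    return min(templates, key=lambda name: np.abs(query - templates[name]).sum())
```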
Chord Function Identification with Modulation Detection Based on HMM

This study aims at identifying chord functions by statistical machine learning. The functions found in traditional harmony theory are not versatile enough for the variety of music styles, and we envisage that a statistical method would more faithfully reflect the music style being targeted. For machine learning, we adopt hidden Markov models (HMMs); we evaluate performance by perplexity and optimize the parameterization of the HMM for each given number of hidden states. Thereafter, we apply the acquired parameters to the detection of modulation, and evaluate the plausibility of the resulting partitioning by its likelihood value. As a result, the six-state model achieved the highest likelihood both for major keys and for minor keys. We could observe finer-grained chord functions in the six-state models, and also found that they assigned different functional roles to the two tonalities.

Yui Uehara, Eita Nakamura, Satoshi Tojo
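As background, perplexity here can be read as the standard information-theoretic quantity (general definition, not notation taken from the paper):

```latex
\mathrm{PP}(x_{1:N})
  \;=\; P(x_1,\dots,x_N)^{-1/N}
  \;=\; \exp\!\Bigl(-\tfrac{1}{N}\,\log P(x_1,\dots,x_N)\Bigr),
```

where $P(x_{1:N})$ is the likelihood the HMM assigns to a chord sequence of length $N$; lower perplexity means the model predicts each chord symbol better on average.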

Audio Signal Processing - Music Structure, Analysis, Synthesis and Composition Tools

Frontmatter
Deploying Prerecorded Audio Description for Musical Theater Using Live Performance Tracking

Audio description, an accessibility service used by blind or visually impaired individuals, provides spoken descriptions of visual content. This alternative format allows those with low or no vision the ability to access information that sighted people obtain visually. In this paper a method for deploying prerecorded audio description in a live musical theater environment is presented. This method uses a reference audio recording and an online time warping algorithm to align tracks of audio description with live performances. A software implementation that is integrated into an existing theatrical workflow is also described. This system is used in two evaluation experiments that show the method successfully aligns multiple recordings of works of musical theater in order to automatically trigger prerecorded, descriptive audio in real time.

Dirk Vander Wilt, Morwaread Mary Farbood
MUSICNTWRK: Data Tools for Music Theory, Analysis and Composition

We present the API for MUSICNTWRK, a python library for pitch-class-set and rhythmic-sequence classification and manipulation, the generation of networks in generalized music and sound spaces, deep-learning algorithms for timbre recognition, and the sonification of arbitrary data. The software is freely available under GPL 3.0 and can be downloaded at www.musicntwrk.com or installed as a PyPI project (pip install musicntwrk).

Marco Buongiorno Nardelli
Feasibility Study of Deep Frequency Modulation Synthesis

Deep Frequency Modulation (FM) synthesis is a method of generating approximate or new waveforms with a network inspired by conventional FM synthesis. Its distinguishing features are that the activation functions of the network are all oscillating functions with distinct parameters, and that every activation function (oscillator unit) shares an identical time t. The network learns a training waveform given over the temporal interval designated by t and generates an approximating waveform over that interval. As a first step in this feasibility study, we examine the basic performance and potential of deep FM synthesis in small-sized experiments. We have confirmed that the optimization techniques developed for conventional neural networks are applicable to deep FM synthesis in these experiments.

Keiji Hirata, Masatoshi Hamanaka, Satoshi Tojo
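Under our reading of the abstract (the authors' exact architecture may differ), each layer is a bank of sine oscillators sharing the global time variable t, phase-modulated by the previous layer. A toy forward pass:

```python
import numpy as np

def deep_fm(t, freqs, weights, amps):
    """Layer k outputs amps[k] * sin(2*pi*freqs[k]*t + weights[k] @ previous_layer)."""
    x = np.zeros((1, t.size))                     # no modulator before layer 0
    for f, W, a in zip(freqs, weights, amps):
        x = a * np.sin(2 * np.pi * np.outer(f, t) + W @ x)
    return x.sum(axis=0)                          # mix the final layer's units

t = np.linspace(0, 1, 44100, endpoint=False)      # the shared time variable
freqs   = [np.array([220.0, 330.0]), np.array([110.0])]  # oscillator frequencies (Hz)
weights = [np.zeros((2, 1)), np.random.randn(1, 2)]      # modulation indices
amps    = [np.ones((2, 1)), np.ones((1, 1))]
y = deep_fm(t, freqs, weights, amps)              # training would fit these parameters
```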
Musical Note Attacks Description in Reverberant Environments

This paper addresses the description of musical note attacks considering the influence of reverberation. It is well known that attacks play an essential role in music performance. By manipulating note-attack quality, musicians are able to control timbre, articulation, and rhythm, which are essential parameters for conveying their expressive intentions. Including information about the interaction with room acoustics enriches the study of musical performance in everyday practice conditions, where reverberant environments are always present. Spectral Modeling decomposition was applied to independently evaluate three components along the attack: (i) the harmonics of the note being played, (ii) the harmonics of the reverberation, and (iii) the residual energy. The proposed description covers two stages: a 2D confrontation of the energy from the extracted components, and a profile representing the structure of the first nine harmonics. We tested the approach in a case study using recordings of an excerpt from a clarinet piece from the traditional classical repertoire, played by six professional musicians. MANOVA tests indicated significant differences (p < 0.05) when considering the musician as a factor for the 2D confrontation. Linear Discriminant Analysis applied for supervised dimensionality reduction of the harmonic-profile data also indicated group separation for the same factor. We examined different legato as well as articulated note transitions presenting different performance-technique demands.

Thiago de Almeida Magalhães Campolina, Mauricio Alves Loureiro
Generating Walking Bass Lines with HMM

In this paper, we propose a method for generating walking bass lines for jazz with a hidden Markov model (HMM). Although automatic harmonization has been widely and actively studied, automatic generation of walking bass lines has not. With our model, whose hidden states represent combinations of pitch classes and metric positions, different distributions of bass notes can be learned at different metric positions. The results of objective and subjective evaluations suggest that the model learns these position-dependent tendencies and generates musically flowing bass lines that contain passing notes.

Ayumi Shiga, Tetsuro Kitahara
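A toy generation loop under this state design (hypothetical data structures, not the authors' implementation):

```python
import random

def generate_bass(trans, emit, n_bars=4, beats_per_bar=4, start=(0, 0)):
    """trans[(pitch_class, beat)] -> list of ((pitch_class, beat), probability);
    emit[(pitch_class, beat)] -> MIDI note chosen for that state."""
    state, line = start, []
    for _ in range(n_bars * beats_per_bar):
        line.append(emit[state])
        next_states, probs = zip(*trans[state])
        state = random.choices(next_states, weights=probs)[0]
    return line
```

Because the beat index is part of the hidden state, downbeats and weak beats learn different note distributions, which is how passing notes can emerge on weak beats.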
Programming in Style with bach

Several programming systems for computer music are based upon the data-flow family of programming paradigms. In the first part of this article, we introduce the general features and lexicon of data-flow programming and review some specific instances of it with reference to computer music applications. We then move the discussion to Max's very peculiar take on data-flow, and evaluate its motivations and shortcomings. Subsequently, we show how the bach library can support different programming styles within Max, improving the expression, readability and maintainability of complex algorithms. In particular, the latest version of bach has introduced bell, a small textual programming language embedded in Max and specifically designed to facilitate programming tasks related to the manipulation of symbolic musical material.

Andrea Agostini, Daniele Ghisi, Jean-Louis Giavitto
Generative Grammar Based on Arithmetic Operations for Realtime Composition

Mathematical sequences in $\mathbb{N}_0$ are regarded as time series. By repeatedly applying arithmetic operations to each of their elements, the sequences are metamorphosed and finally transformed into sounds by an interpretation algorithm. The efficiency of this method as a composition method is demonstrated by explicit examples. In principle, the method also offers laypersons the possibility of composing. In this context, it is discussed how well, and under what conditions, the compositional results can be predicted and thus deliberately planned by the user. On the way to assessing this, Edmund Husserl's concept of “fulfillment chains” provides a good starting point. Finally, the computer-based board game MODULO is presented. Based on the generative grammar introduced here, MODULO converts the respective game situation directly into sound events. In MODULO, the players act consistently with the gaming rules and do not attend to the evolving musical structure. In this respect, MODULO represents an alternative to a deliberate, conventional use of the symbols of the grammar in which the user anticipates the musical result.

Guido Kramann
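A toy version of the idea (our construction, not the paper's grammar): arithmetic operations metamorphose a sequence over N_0, and an interpretation step maps the result to pitches:

```python
seq = list(range(16))                         # a time series over N_0
ops = [lambda n: n * 3, lambda n: n + 7, lambda n: n % 24]

for op in ops:                                # metamorphose the sequence
    seq = [op(n) for n in seq]

# Interpretation step: fold into one octave of C major and emit MIDI notes.
scale = [0, 2, 4, 5, 7, 9, 11]
midi_notes = [60 + scale[n % len(scale)] for n in seq]
print(midi_notes)
```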

Notation and Instruments Distributed on Mobile Devices

Frontmatter
Mobile Music with the Faust Programming Language

The Faust programming language has been playing a role in the mobile music landscape for the past ten years. Multiple tools to facilitate the development of musical smartphone applications for live performance, such as faust2ios, faust2android, faust2api, and faust2smartkeyb, have been implemented and used in the context of a wide range of large-scale musical projects. Similarly, various digital musical instruments leveraging these tools and based on the concept of augmenting mobile devices have been created. This paper gives an overview of the work done on these topics and provides directions for future developments.

Romain Michon, Yann Orlarey, Stéphane Letz, Dominique Fober, Catinca Dumitrascu
COMPOSITES 1: An Exploration into Real-Time Animated Notation in the Web Browser

COMPOSITES 1 for Modular Synthesizer Soloist and Four Accompanists is a real-time, graphically notated work for modular synthesizer soloist and four accompaniment parts that utilizes the power of Node.js, WebSockets, Web Audio, and CSS to realize an OS-agnostic and web-deliverable electroacoustic composition that can be accessed on any device with a web browser. This paper details the technology stack used to write and perform the work, including examples of how it is used compositionally and in performance. Recent developments in web browser technology, including the Web Audio API and Document Object Model (DOM) manipulation techniques in vanilla JavaScript, have improved the possibilities for the synchronization of audio and visuals using only the browser itself. This paper also seeks to introduce the reader to the aforementioned technologies, and what benefits might exist in the realization of creative works using this stack, specifically regarding the construction of real-time compositions with interactive graphic notations.

Daniel McKemie
Distributed Scores and Audio on Mobile Devices in the Music for a Multidisciplinary Performance

In an attempt to uncover the strengths and limitations of web technologies for sound and music notation applications, driven by aesthetic goals and prompted by the lack of logistic means, the author has developed a system for animated scores and sound diffusion using browser-enabled mobile devices, controlled by a host computer running Max and a web server. Ease of deployment was seen as a desirable feature in comparison to native-application computer-based systems, such as Comprovisador, a system which has lent many features to the one proposed herein. Weaknesses were identified, motivating the design of mitigation and adaptation strategies at the technical and compositional levels, respectively. The creation of music for a multidisciplinary performance entitled GarB’urlesco served as a case study to assess the effectiveness of those strategies. The present text is an extended version of a paper presented at CMMR 2019, in Marseille.

Pedro Louzeiro
The BabelBox: An Embedded System for Score Distribution on Raspberry Pi with INScore, SmartVox and BabelScores

The slow but steady shift away from printed text into digital media has not yet modified the working habits of chamber music practitioners. While most instrumentalists still rely heavily on printed scores, audiences increasingly access notated music online, for instance as printed scores synced to an audio recording on YouTube. This paper proposes to guide the listener and/or the performer with a cursor scrolling on the page with INScore, in order to examine the consequences of representing time in this way as opposed to traditional bars-and-beats notation. In addition to its score-following interest for pedagogy and analysis, the networking possibilities of today's ubiquitous technologies reveal interesting potential for works in which the presence of a conductor is required for synchronization between performers and/or with fixed media (film or tape). A Raspberry Pi-embedded prototype for animated/distributed notation is presented here as a score player (akin to the Decibel ScorePlayer or SmartVox), which sends and synchronizes mp4 scores to any browser-capable device connected to the same Wi-Fi network. The corpus concerns pieces edited at BabelScores, an online library for contemporary classical music. The BabelScores pdf works, composed in standard engraving software, are animated using INScore and video editors, in order to find strategies for animation or dynamic display of the unfolding of time, originally represented statically on the page.

Jonathan Bell, Dominique Fober, Daniel Fígols-Cuevas, Pedro Garcia-Velasquez

Auditory Perception and Cognition - From the Ear to the Body

Frontmatter
Modeling Human Experts’ Identification of Orchestral Blends Using Symbolic Information

Orchestral blend happens when sounds coming from two or more instruments are perceived as a single sonic stream. Several studies have suggested that different musical properties contribute to creating such an effect. We developed models to identify orchestral blend effects from symbolic information taken from scores, based on calculations related to three musical parameters, namely onset synchrony, pitch harmonicity, and parallelism in pitch and dynamics. In order to evaluate the performance of the models, we applied them to different orchestral pieces and compared the outputs with human experts' ratings available in the Orchestration Analysis and Research Database (Orchard). Using different thresholds for the three parameters under consideration, the models were able to successfully retrieve 81% of the instruments involved in an orchestral blend on average. These results suggest that symbolic representations of music convey perceptual information. However, further developments including audio analyses that take timbral properties into account could alleviate some of the current limitations.

Aurélien Antoine, Philippe Depalle, Philippe Macnab-Séguin, Stephen McAdams
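Of the three parameters, onset synchrony is the easiest to illustrate. A minimal sketch (our construction; the paper's exact definition and thresholds may differ):

```python
import bisect

def onset_synchrony(onsets_a, onsets_b, window=0.05):
    """Fraction of onsets in part A falling within `window` of an onset in
    part B; both inputs are sorted lists of times (seconds or beats)."""
    if not onsets_a:
        return 0.0
    hits = 0
    for t in onsets_a:
        i = bisect.bisect_left(onsets_b, t)
        neighbours = onsets_b[max(0, i - 1):i + 1]   # closest below and above
        if any(abs(t - u) <= window for u in neighbours):
            hits += 1
    return hits / len(onsets_a)
```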
The Effect of Auditory Pulse Clarity on Sensorimotor Synchronization

This study investigates the relationship between auditory pulse clarity and sensorimotor synchronization performance, along with the influence of musical training. 29 participants walked in place to looped drum samples with varying degrees of pulse clarity, which were generated by adding artificial reverberation and measured through fluctuation spectrum peakiness. Experimental results showed that reducing auditory pulse clarity affected phase matching through significantly higher means and standard deviations in asynchrony across musical sophistication groups. Referent period matching ability was also degraded, and non-musicians were impacted more than musicians. Subjective ratings of required active concentration also increased with decreasing pulse clarity. These findings point to the importance of clear and distinct pulses to timing performance in synchronization tasks such as music and dance.

Prithvi Kantan, Rares Stefan Alecu, Sofia Dahl
A Proposal of Emotion Evocative Sound Compositions for Therapeutic Purposes

Recognition and understanding of emotions is a path to self-healing. We have worked with Mandalas of Emotions, derived from Traditional Chinese Medicine (TCM), as a complementary therapy. In this paper, we present the conceptual framework underlying the creation of sound collages for the five elements of TCM and an assessment of these compositions by experienced holistic therapists. Results comprise quantitative data, according to scales for relaxation, arousal and valence, and qualitative data from transcription and analysis of the recorded responses of volunteers. In our study, the most common perceptions were warmth, irritation, peace and fear. This proposal may stimulate further research on emotion-evoking sounds and on sound composition.

Gabriela Salim Spagnol, Li Hui Ling, Li Min Li, Jônatas Manzolli
Why People with a Cochlear Implant Listen to Music

The cochlear implant (CI) is the most successful neural prosthetic device on the market. It allows hundreds of thousands of people around the world to regain a sense of hearing. However, unlike a pair of glasses that can restore vision almost perfectly, the CI still has shortcomings for non-speech sounds such as music and environmental sounds. Many studies have shown that most CI users have great difficulty perceiving pitch differences or recognizing simple melodies without words or rhythmic cues. Consequently, CI users report finding music less pleasant compared to their pre-deafness period. Despite this, many users do not entirely reject music, and it is not uncommon to see young CI users listening to music all day, or even playing an instrument. Listening to music is an experience that arises from more than the sum of the sensations induced by the basic elements of music: pitch, timbre and rhythm. It is a pleasant experience because it prompts high-level cognitive responses such as emotional reactions, the urge to dance, or the feeling of musical tension. Therefore, CI users still engaged in musical activities might experience some of these high-level features. In this paper, I review recent studies on music perception in CI listeners and demonstrate that, although most CI users have difficulty perceiving pitch, additional music cues such as tempo and dynamic range might contribute positively to their enjoyment of music.

Jérémy Marozeau
The Deaf Musical Experience
Bodily and Visual Specificities: Corpaurality and Vusicality

This paper focuses on the bodily and visual specificities of the Deaf musical experience. It first investigates a fundamental principle of human experience, corpaurality, which calls for considering the sono-sensitive bodily qualities and the natural hearing modalities of the Deaf; it then considers the visual dimensions of music, based on Deaf practices, which reveal a denormalized musical expression, namely vusicality.

Sylvain Brétéché
How Would You Reproduce a Synthetic Sound of an Ellipse in Water? A Phenomenological Investigation of Pre-reflexive Contents of Consciousness

This article describes a listening experiment based on elicitation interviews that aims at describing the conscious experience of a subject submitted to perceptual stimulation. As opposed to traditional listening experiments, in which subjects are generally influenced by closed or suggestive questions and limited to predefined, forced choices, elicitation interviews make it possible to gain deeper insight into the listener's perception, in particular into the pre-reflexive content of conscious experience. Inspired by previous elicitation interviews during which subjects passively listened to sounds, this experiment is based on an active task: subjects were asked to reproduce a sound with a stylus on a graphic tablet that controlled a synthesis model. The reproduction was followed by an elicitation interview. The trace of the graphic gesture as well as the answers recorded during the interview were then analyzed. Results revealed that the subjects varied their focus between the evoked sound source and intrinsic sound properties, and also described the sensations induced by the experience.

Jean Vion-Dury, Marie Degrandi, Gaëlle Mougin, Thomas Bordonné, Sølvi Ystad, Richard Kronland-Martinet, Mitsuko Aramaki

The Process of Sound Design

Frontmatter
Exploring Design Cognition in Voice-Driven Sound Sketching and Synthesis

Conceptual design and communication of sonic ideas are critical, and still unresolved, aspects of current sound design practices, especially when teamwork is involved. Design cognition studies in the visual domain are a valuable resource for better comprehending the reasoning of designers when they approach a sound-based project. A design exercise involving a team of professional sound designers is analyzed and discussed in the framework of the Function-Behavior-Structure ontology of design. The use of embodied sound representations of concepts fosters team-building and more effective communication, in terms of shared mental models.

Stefano Delle Monache, Davide Rocchesso
Morphing Musical Instrument Sounds with the Sinusoidal Model in the Sound Morphing Toolbox

Sound morphing stands out among the sound transformation techniques in the literature due to its creative and research potential. The aim of sound morphing is to gradually blur the categorical distinction between the source and target sounds by blending sensory attributes. As such, the focus and ultimate challenge of most sound morphing techniques is to interpolate across dimensions of timbre perception to achieve the desired result. There are several sound morphing proposals in the literature with few open-source implementations freely available, making it difficult to reproduce the results, compare models, or simply use them in other applications such as music composition, sound design, and timbre research. This work describes how to morph musical instrument sounds with the sinusoidal model using the sound morphing toolbox (SMT), a freely available and open-source piece of software. The text describes the audio processing steps required to morph sounds with the SMT using a step-by-step example to illustrate the need for and the result of each step. The SMT contains implementations of a sound morphing algorithm in MATLAB® that were designed to be as easy as possible to understand and use, giving the user control over the result and full customization.

Marcelo Caetano
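A conceptual sketch in Python (the SMT itself is MATLAB, and its exact interpolation rules may differ): once source and target are represented as matched sinusoidal tracks, a morph can interpolate frequencies geometrically and amplitudes linearly, per track and per frame:

```python
import numpy as np

def morph_tracks(f_src, a_src, f_tgt, a_tgt, alpha):
    """alpha = 0 -> source, alpha = 1 -> target; arrays are (tracks, frames)."""
    f_morph = f_src ** (1 - alpha) * f_tgt ** alpha   # pitch interpolated in the log domain
    a_morph = (1 - alpha) * a_src + alpha * a_tgt     # linear amplitude crossfade
    return f_morph, a_morph
```

Interpolating frequency in the log domain is a common choice in the morphing literature because equal steps then correspond to equal perceived pitch intervals.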
Mapping Sound Properties and Oenological Characters by a Collaborative Sound Design Approach – Towards an Augmented Experience

The paper presents a specific sound design process implemented through a collaboration with an important stakeholder of the wine (Champagne) industry. The goal of the project was to link sound properties with oenological dimensions in order to compose a sonic environment able to create a multisensory experience during the wine tasting protocol. This creation resulted from a large-scale methodological approach based on the concept of semantic transformation (from wine words to sound words) and deployed by means of a co-design method, after the respective skills of each field (sound and oenology) had been shared. A precise description of the workflow is detailed in the paper, and the outcomes of the work are presented, in terms of both realisation and conceptual knowledge acquisition. Future perspectives for the continuation of the work are then sketched, especially regarding the notion of evaluation. The whole approach is finally placed in the broad conceptual framework of ‘sciences of sound design’ that is developed and argued in the light of this study.

Nicolas Misdariis, Patrick Susini, Olivier Houix, Roque Rivas, Clément Cerles, Eric Lebel, Alice Tetienne, Aliette Duquesne
Kinetic Design
From Sound Spatialisation to Kinetic Music

This paper explores the process of kinetic music design. The first part of this paper presents the concept of kinetic music. The second part presents the sound design and compositional process of this type of music. The third part presents some excerpts from the composition logbook of a piece called Kinetic Design to illustrate the process of kinetic design as work in progress. This paper focuses on the question of sound spatialisation from a theoretical, as well as an empirical, point of view, through the experience and experiments of an electroacoustic music composer trying to make the imaginary concept of kinetic music real. It is a form of research by design, or research by doing. The kinetic design project examined here is the first time an experimental approach of research by design has been applied to kinetic music.

Roland Cahen

Sonic Interaction for Immersive Media - Virtual and Augmented Reality

Frontmatter
Designing Soundscapes for Alzheimer’s Disease Care, with Preliminary Clinical Observations

The acoustic environment is a prime source of conscious and unconscious information, which allows listeners to situate themselves, to communicate, to feel, to remember. Recently, there has been growing interest in the acoustic environment of care facilities and its perceptual counterparts. In this contribution, the authors describe the process of designing a new interactive audio apparatus for Alzheimer's disease care, in the context of an active multidisciplinary research project led by a sound designer since 2018, in collaboration with a residential long-term care facility (EHPAD) in France, a geriatrician, a gerontologist, psychologists and caregivers. The apparatus, named «Madeleines Sonores» in reference to Proust's madeleine, has been providing virtual soundscapes 24/7 for two years to elderly people suffering from Alzheimer's disease. The configuration and sound processes of the apparatus are presented in relation to Alzheimer's disease care. Preliminary psychological and clinical observations are discussed in relation to dementia and to the activity of caring, in order to evaluate the benefits of such an apparatus in Alzheimer's disease therapy and dementia care.

Frédéric Voisin, Arnaud Bidotti, France Mourey
ARLooper: A Mobile AR Application for Collaborative Sound Recording and Performance

This paper introduces ARLooper, an AR-based iOS application for multi-user sound recording and performance that explores the possibility of using mobile AR technology to create novel musical interfaces and collaborative audiovisual experiences. ARLooper allows the user to record sounds through the microphones of mobile devices and, at the same time, visualizes and places the recorded sounds as 3D waveforms in an AR space. The user can play, modify, and loop the recorded sounds with several audio filters attached to them. Since ARLooper generates world-map information through iOS ARKit's tracking technique called visual-inertial odometry, which tracks the real world and establishes a correspondence between real and AR spaces, it enables multiple users to connect to the same AR space by sharing and synchronizing the world-map data. In this shared AR space, users can see each other's 3D waveforms and activities, such as selection and manipulation, giving the system potential for collaborative AR performance.

Sihwa Park
Singing in Virtual Reality with the Danish National Children’s Choir

In this paper we present a Virtual Reality (VR) system that allows a user to sing together with the Danish National Children's Choir. The system was co-designed with psychologists so that it can be adopted to prevent and cope with social anxiety. We present the different elements of the system, as well as a preliminary evaluation which shows its potential as a tool to help cope with social anxiety.

Stefania Serafin, Ali Adjorlu, Lars Andersen, Nicklas Andersen

Musical Interaction: Embodiment, Improvisation, Collaboration

Frontmatter
Embodied Cognition in Performers of Large Acoustic Instruments as a Method of Designing New Large Digital Musical Instruments

We present The Large Instrument Performers Study, an interview-based exploration of how performers of large-scale acoustic instruments navigate the instruments' size-related aesthetic features during performance. Through the conceptual frameworks of embodied music cognition and affordance theory, we discuss how the themes that emerged in the interview data reveal the ways size-related aesthetic features of large acoustic instruments influence a performer's choices; how large-scale acoustic instruments offer microscopically nuanced performance options; and how, despite the preconception that large-scale acoustic instruments are scaled-up versions of smaller instruments with the addition of a lower fundamental tone, they offer sonic and performative features different from their smaller counterparts and require precise gestural control that is certainly not scaled up. This is followed by a discussion of how the study's findings could inform design features of new large-scale digital musical instruments, resulting in instruments with more nuanced control and richer timbres, and in a better understanding of how interfaces and instruments influence performers' choices and, as a result, music repertoire and performance.

Lia Mice, Andrew P. McPherson
Augmentation of Sonic Meditation Practices: Resonance, Feedback and Interaction Through an Ecosystemic Approach

This paper describes the design and creation of an interactive sound environment project, titled dispersion.eLabOrate. The system is defined by a ceiling array of microphones, audio input analysis, and synthesis directly driven by this analysis. Created to augment a Deep Listening performative environment, this project explores the role that interactive installations can fulfill within a structured listening context. Echoing, modulating, and extending what it hears, the system generates an environment in which its output is a product of ambient sound, feedback, and participant input. Relating to and building upon the ecosystemic model, we discuss the benefit of designing for participant incorporation within such a responsive listening environment.

Rory Hoy, Doug Van Nort
Gesture-Timbre Space: Multidimensional Feature Mapping Using Machine Learning and Concatenative Synthesis

This chapter explores three systems for mapping embodied gesture, acquired with electromyography and motion sensing, to sound synthesis. A pilot study using granular synthesis is presented, followed by studies employing corpus-based concatenative synthesis, where small sound units are organized by derived timbral features. We use interactive machine learning in a mapping-by-demonstration paradigm to create regression models that map high-dimensional gestural data to timbral data without dimensionality reduction in three distinct workflows. First, by directly associating individual sound units and static poses (anchor points) in static regression. Second, in whole regression a sound tracing method leverages our intuitive associations between time-varying sound and embodied movement. Third, we extend interactive machine learning through the use of artificial agents and reinforcement learning in an assisted interactive machine learning workflow. We discuss the benefits of organizing the sound corpus using self-organizing maps to address corpus sparseness, and the potential of regression-based mapping at different points in a musical workflow: gesture design, sound design, and mapping design. These systems support expressive performance by creating gesture-timbre spaces that maximize sonic diversity while maintaining coherence, enabling reliable reproduction of target sounds as well as improvisatory exploration of a sonic corpus. They have been made available to the research community, and have been used by the authors in concert performance.

Michael Zbyszyński, Balandino Di Donato, Federico Ghelli Visi, Atau Tanaka
Developing a Method for Identifying Improvisation Strategies in Jazz Duos

The primary purpose of this paper is to develop a method for investigating the communication process between musicians performing jazz improvisation, and to apply this method in a first case study. In jazz, applied improvisation theory usually consists of scale and harmony studies within quantized rhythmic patterns. There is a need to expand the concept of theory to include areas related to communication and strategic choices. To study improvisational strategies, we recorded duos performed by the first author at the piano together with different horn players. Backing tracks were provided by prerecorded material from an ensemble with piano, bass and drums. The duo recordings were transcribed using music production software. The resulting score and the audio recording were then used during an in-depth interview with each horn player to identify underlying strategies. The strategies were coded according to previous research and could be classified into five different categories. The paper contributes to jazz improvisation theory by embracing artistic expressions and choices made in real-life musical situations.

Torbjörn Gulz, Andre Holzapfel, Anders Friberg
Instruments and Sounds as Objects of Improvisation in Collective Computer Music Practice

Collective forms of improvisation are at the heart of numerous creative processes today, in a vast range of cultures, practices and artistic disciplines, each bearing its own definitions, traditions and customs. In this contribution, we raise the question of collective sound improvisation involving digital technologies on two levels: first, by discussing the possible nature of improvisation in relation to digital artistic creation as a transversal notion that permeates multiple fields of scientific research and artistic practice, raising fundamentally different questions than those of traditional musical improvisation; and second, by presenting a practice-based study of an emergent collective computer music improvisation project involving the authors. Subjective experiences, interrogations and remarks from this shared practice are confronted on the one hand with the traditional literature on musical improvisation and, on the other hand, placed within the broader scope of improvisation involving digital technologies. In particular, we elaborate on using the computer instrument as a means to improvise both tools and sounds in one continuous flow.

Jérôme Villeneuve, James Leonard, Olivier Tache

Jean-Claude Risset and Beyond

Frontmatter
Jean-Claude Risset’s Paradoxical Practice of Interdisciplinarity: Making Inseparable Both Activities of Scientist and Musician, While Resolutely Distinguishing the Two Fields

In 2017, Jean-Claude Risset donated his archives to the PRISM laboratory. The research community will thereby soon have at its disposal a fund that is especially interdisciplinary, oriented towards both art and science. For the moment, the archives are divided into two main parts: one covering scientific research and one covering artistic creation. More specifically, Jean-Claude Risset's own story shaped major interdisciplinary orientations: first of all, his pioneering research at Bell Labs; then his return to “French reality” (his half-failure with Ircam and his difficulties concerning Marseille-Luminy); afterwards, his quest for solutions as a political lever, especially through the Art-Science-Technology report of 1998; and finally the turning point of his CNRS Gold Medal in 1998, which brought increasing numbers of conferences and, above all, concerts. In addition, the study of material aspects (the division of activities between the laboratory and his home, the place and content of documentation, etc.) is also necessary to understand “Risset's practice” of interdisciplinarity.

Vincent Tiffon
Machine Learning for Computer Music Multidisciplinary Research: A Practical Case Study

This paper presents a multidisciplinary case study of practice with machine learning for computer music. It builds on the scientific study of two machine learning models respectively developed for data-driven sound synthesis and interactive exploration. It details how the learning capabilities of the two models were leveraged to design and implement a musical interface focused on embodied musical interaction. It then describes how this interface was employed and applied to the composition and performance of ægo, an improvisational piece with interactive sound and image for one performer. We discuss the outputs of our research and creation process, and expose our personal reflections and insights on transdisciplinary research opportunities framed by machine learning for computer music.

Hugo Scurto, Axel Chemla–Romeu-Santos
Iterative Phase Functions on the Circle and Their Projections: Connecting Circle Maps, Waveshaping, and Phase Modulation

In memory of Jean-Claude Risset, we revisit two of his contributions to sound synthesis, namely waveshaping and feedback modulation synthesis, as starting points for connecting a plethora of oscillatory synthesis methods through iterative phase functions, motivated by the theory of circle maps, which describes any iterated function from the circle to itself. Circle maps have played an important role in the theory of dynamical systems, with respect to such phenomena as mode-locking, parametric study of stability, and transitions to chaotic regimes. This formulation allows us to bring a wide range of oscillatory methods under one functional description and clarifies their relationships, for example showing that sine circle maps and feedback FM are near-identical synthesis methods.

Georg Essl
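For reference, the standard sine circle map from the dynamical-systems literature (textbook notation, not necessarily the paper's) iterates the phase

```latex
\theta_{n+1} \;=\; \theta_n + \Omega - \frac{K}{2\pi}\,\sin(2\pi\,\theta_n) \pmod{1},
```

where $\Omega$ is the bare frequency ratio and $K$ the nonlinearity strength. Writing feedback FM in a discrete form such as $y_n = \sin(\omega n + \beta\, y_{n-1})$ makes the kinship plain: both iterate a sinusoidal self-map of the circle.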
Mathematics and Music: Loves and Fights

We present different aspects of the special relationship that music has with mathematics, in particular the concepts of rigour and realism in both fields. These directions are illustrated by comments on the personal relationship of the author with Jean-Claude, together with examples taken from his own works, especially the “Duos pour un pianiste”.

Thierry Paul
Zero-Emission Vehicles Sonification Strategy Based on Shepard-Risset Glissando

In this paper, we present a sonification strategy developed for electric vehicles, aiming to synthesize a new engine sound that enhances the driver's dynamic perception of the vehicle. We chose to mimic internal combustion engine (ICE) noise by informing the driver through pitch variations. However, ICE noise pitch variations are correlated with the engine's revolutions per minute (RPM), and their dynamics cover only a limited vehicle speed range. In order to provide the driver with extended pitch variations throughout the full vehicle speed range, we based our sonification strategy on the Shepard-Risset glissando. Such illusory, endlessly ascending/descending sounds make it possible to represent accelerations with significant pitch variations over an unlimited range of speeds. In this way, it is possible to preserve the metaphor of ICE noise with inaudible gearshifts. We tested this sonification strategy in a perceptual test in a driving simulator and showed that the mapping of this acoustical feedback affects drivers' perception of vehicle dynamics.

Sébastien Denjean, Richard Kronland-Martinet, Vincent Roussarie, Sølvi Ystad
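A minimal synthesis sketch of a Shepard-Risset glissando (our illustration, not the production system; all constants are arbitrary): partials spaced in octaves glide upward at a constant rate in log-frequency, while a fixed bell-shaped spectral envelope fades them in at the bottom of the pitch window and out at the top:

```python
import numpy as np

SR, DUR, N_PART, RATE = 44100, 10.0, 8, 0.1      # RATE: octaves per second
F_MIN = 27.5                                      # bottom of the pitch window (Hz)
t = np.arange(int(SR * DUR)) / SR
y = np.zeros_like(t)
for k in range(N_PART):
    # Octave position in [0, N_PART), wrapping around as the partial rises.
    pos = (k + RATE * t) % N_PART
    freq = F_MIN * 2.0 ** pos
    amp = np.sin(np.pi * pos / N_PART) ** 2       # fade in/out at the wrap point
    phase = 2 * np.pi * np.cumsum(freq) / SR      # integrate the gliding frequency
    y += amp * np.sin(phase)
y /= np.max(np.abs(y))                            # normalise to full scale
```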
Backmatter
Metadata
Title
Perception, Representations, Image, Sound, Music
edited by
Richard Kronland-Martinet
Sølvi Ystad
Dr. Mitsuko Aramaki
Copyright Year
2021
Electronic ISBN
978-3-030-70210-6
Print ISBN
978-3-030-70209-0
DOI
https://doi.org/10.1007/978-3-030-70210-6
