About this Book

Signal Processing Methods for Music Transcription is the first book dedicated to uniting research on signal processing algorithms and models for the various aspects of music transcription: pitch analysis, rhythm analysis, percussion transcription, source separation, instrument recognition, and music structure analysis. Following a clearly structured pattern, each chapter provides a comprehensive review of the existing methods for a given subtopic while covering the most important state-of-the-art methods in detail. The concrete algorithms and formulas are clearly defined and can be easily implemented and tested. A range of approaches is covered, including, for example, statistical methods, perceptually motivated methods, and unsupervised learning methods. The text is complemented by a common reference list and index.

Table of Contents

Frontmatter

Foundations

Frontmatter

1. Introduction to Music Transcription

Abstract
Music transcription refers to the analysis of an acoustic musical signal so as to write down the pitch, onset time, duration, and source of each sound that occurs in it. In the Western tradition, written music uses note symbols to indicate these parameters in a piece of music. Figures 1.1 and 1.2 show the notation of an example music signal. Omitting the details, the main conventions are that time flows from left to right and the pitch of the notes is indicated by their vertical position on the staff lines. In the case of drums and percussion, the vertical position indicates the instrument and the stroke type. The loudness (and, for pitched instruments, the instrument applied) is normally not specified for individual notes but is determined for larger parts.
Anssi Klapuri

2. An Introduction to Statistical Signal Processing and Spectrum Estimation

Abstract
This chapter presents an overview of current signal processing techniques, most of which are applied to music transcription in the following chapters. The elements provided here are intended to give the reader the background needed for those later chapters. Some of the signal processing tools presented are well known, and readers already familiar with these concepts may wish to skip ahead. As this is only an overview of the various methods, readers interested in more depth may refer to the bibliographical references provided throughout the chapter.
This chapter is organized as follows. Section 2.1 presents the Fourier transform and some related tools: time-frequency representations and cepstral coefficients. Section 2.2 introduces basic statistical tools such as random variables, probability density functions, and likelihood functions. It also introduces estimation theory. Section 2.3 is about Bayesian estimation methods, including Monte Carlo techniques for numerical computations. Finally, Section 2.4 introduces pattern recognition methods, including support vector machines and hidden Markov models.
Manuel Davy
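As a minimal sketch of the time-frequency representations surveyed in Chapter 2, the following computes a magnitude spectrogram via a Hann-windowed short-time Fourier transform using only NumPy; the function name, frame length, and hop size are illustrative choices, not taken from the book.

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=512):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided spectrum per frame: shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

# A 440 Hz sine sampled at 8 kHz should peak near bin 440 / (8000 / 1024), i.e. bin 56
fs = 8000
t = np.arange(fs) / fs
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))
print(spec.shape, spec[0].argmax())
```

The frequency resolution is fs / frame_len per bin, so the trade-off between time and frequency resolution discussed in the chapter is controlled entirely by the frame length.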

3. Sparse Adaptive Representations for Musical Signals

Abstract
Musical signals are, strictly speaking, acoustic signals in which some aesthetically relevant information is conveyed through propagating pressure waves. Although the human auditory system exhibits a remarkable ability to interpret and understand these sound waves, such signals cannot be processed as such by computers. Obviously, the signals have to be converted into digital form, which first implies sampling and quantization. In time-domain digital formats, such as Pulse Code Modulation (PCM) or newer formats such as the one-bit oversampled bitstreams used in the Super Audio CD, audio signals can be stored, edited, and played back. However, many current signal processing techniques aim at extracting some musically relevant high-level information in (optimally) an unsupervised manner, and most of these are not directly applicable in the above-mentioned time domain. Among such semantic analysis tasks, let us mention segmentation, where one wants to break down a complex sound into coherent sound objects; classification, where one wants to relate these sound objects to putative sound sources; and transcription, where one wants to retrieve the individual notes and their timings from the audio signals. For such algorithms, it is often desirable to transform the time-domain signals into other, better-suited representations. Indeed, according to the Merriam-Webster dictionary, to 'represent' primarily means 'to bring clearly before the mind'.
Laurent Daudet, Bruno Torrésani
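As a toy illustration of sparsity in a transform domain (a fixed Fourier basis here, not the adaptive dictionaries Chapter 3 develops), one can keep only the largest DFT coefficients of a signal and discard the rest; a nearly periodic signal is then reconstructed almost perfectly from a handful of atoms. The function name and signal parameters below are illustrative.

```python
import numpy as np

def sparse_approx(x, n_keep):
    """Keep the n_keep largest-magnitude DFT coefficients, zero the rest."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-n_keep:]   # indices of the largest coefficients
    X_sparse = np.zeros_like(X)
    X_sparse[keep] = X[keep]
    return np.fft.irfft(X_sparse, n=len(x))

# Two sinusoids with an integer number of cycles are captured by just two Fourier atoms
t = np.arange(2048) / 2048
x = np.sin(2 * np.pi * 64 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
err = np.linalg.norm(x - sparse_approx(x, 2)) / np.linalg.norm(x)
print(err)  # relative error close to zero
```

Real musical signals are not exactly sparse in any fixed basis, which is precisely what motivates the adaptive and overcomplete representations discussed in the chapter.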

Rhythm and Timbre Analysis

Frontmatter

4. Beat Tracking and Musical Metre Analysis

Abstract
Imagine you are sitting in a bar and your favourite song is played on the jukebox. It is quite possible that you might start tapping your foot in time to the music. This is the essence of beat tracking, and for most humans it is a largely automatic and subconscious task. Unfortunately, the same is not true for computers; replicating this process algorithmically has been an active area of research for well over twenty years, with reasonable success achieved only recently.
Stephen Hainsworth

5. Unpitched Percussion Transcription

Abstract
Until recently, work on automatic music transcription has concentrated mainly on the transcription of pitched instruments, i.e., melodies. During the past few years, however, there has been growing interest in the problem of transcribing percussive instruments. This chapter aims to give an overview of the methods used in this field, ranging from the pioneering works of the 1980s to more recent systems.
Derry FitzGerald, Jouni Paulus

6. Automatic Classification of Pitched Musical Instrument Sounds

Abstract
This chapter discusses the problem of automatically identifying the musical instrument played in a given sound excerpt. Most of the research until now has been carried out using isolated sounds, but there is also an increasing amount of work dealing with instrument labelling in more complex music signals, such as monotimbral phrases, duets, or even richer polyphonies. We first describe basic concepts related to acoustics, musical instruments, and perception, insofar as they are relevant for dealing with the present problem. Then, we present a practical approach to this problem, with a special emphasis on methodological issues. Acoustic features, or descriptors, are, as will be argued, a keystone of the problem; we therefore devote a long section to some of the most useful ones and discuss strategies for selecting the best features when large sets of them are available. Several techniques for automatic classification, complementing those explained in Chapter 2, are described. Once the reader has been introduced to all the necessary tools, a review of the most relevant instrument classification systems is presented, including approaches that deal with continuous musical recordings. In the closing section, we summarize the main conclusions and topics for future research.
Perfecto Herrera-Boyer, Anssi Klapuri, Manuel Davy

Multiple Fundamental Frequency Analysis

Frontmatter

7. Multiple Fundamental Frequency Estimation Based on Generative Models

Abstract
Western tonal music is highly structured, both along the time axis and along the frequency axis. The time structure is described in other chapters of this book (see Chapter 4), and it may be exploited to build efficient beat trackers, for example. The frequency structure is also quite strong in tonal music. It has been shown since Helmholtz (and probably before) that an individual note is composed of one fundamental and several overtone partials [451], [193]. Though acoustic waveforms may vary from one musical instrument to another, and even from one performance to another with the same instrument, they can be modelled accurately using a single mathematical model with different parameters.
Manuel Davy

8. Auditory Model-Based Methods for Multiple Fundamental Frequency Estimation

Abstract
This chapter describes fundamental frequency (F0) estimation methods that make use of computational models of human auditory perception, and especially of pitch perception. At the present time, the most reliable music transcription system available is the ears and the brain of a trained musician. Compared with any artificial audio processing tool, the analytical ability of human hearing is very good for complex mixture signals: in natural acoustic environments, we are able to perceive the characteristics of several simultaneously occurring sounds, including their pitches [49]. It is therefore quite natural to pursue automatic music transcription and multiple F0 estimation by investigating what happens in the human listener. Here the term multiple F0 estimation means estimating the F0s of several concurrent sounds.
Anssi Klapuri
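As background for the auditory models discussed in Chapter 8, the following is a minimal sketch of single-F0 estimation from the peak of the autocorrelation function, a classic periodicity cue related to the lag-domain models of pitch perception; it is not one of the book's multiple-F0 algorithms, and the function name and search range are illustrative. Notably, it recovers the pitch of a tone whose fundamental partial is missing, as human listeners do.

```python
import numpy as np

def f0_autocorr(x, fs, fmin=60.0, fmax=800.0):
    """Single-F0 estimate from the highest autocorrelation peak in a lag range."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0 .. len(x)-1
    lo, hi = int(fs / fmax), int(fs / fmin)            # plausible period range
    lag = lo + np.argmax(r[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(fs // 4) / fs
# Harmonic tone with a missing fundamental: partials at 2, 3, 4 x 200 Hz
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in (2, 3, 4))
print(f0_autocorr(x, fs))  # close to 200.0
```

For mixtures of concurrent sounds this simple picture breaks down, since the periodicities of the sources interfere; handling that case is exactly the subject of the chapter.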

9. Unsupervised Learning Methods for Source Separation in Monaural Music Signals

Abstract
Computational analysis of polyphonic musical audio is a challenging problem. When several instruments are played simultaneously, their acoustic signals mix, and estimating an individual instrument's signal is disturbed by the other co-occurring sounds. The analysis task would become much easier if there were a way to separate the signals of the different instruments from each other. Techniques that implement this are said to perform sound source separation. The separation would not be needed if a multi-track studio recording were available in which the signal of each instrument is on its own channel. Likewise, recordings made with microphone arrays would allow more efficient separation based on the spatial location of each source. However, multi-channel recordings are usually not available; rather, music is distributed in stereo format. This chapter discusses sound source separation in monaural music signals, a term which refers to a one-channel signal obtained by recording with a single microphone or by mixing down several channels.
Tuomas Virtanen
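One of the unsupervised learning techniques surveyed in Chapter 9 is non-negative matrix factorization (NMF) of a magnitude spectrogram into spectral basis vectors and time-varying gains. The sketch below uses the Euclidean multiplicative update rules of Lee and Seung on a synthetic rank-2 "spectrogram"; it is a minimal illustration, not the chapter's algorithm, and the function name, iteration count, and toy data are assumptions.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, seed=0):
    """Factorize a non-negative matrix V ~ W @ H with multiplicative updates
    minimizing the Euclidean distance (Lee & Seung)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, n_components)) + 1e-9
    H = rng.random((n_components, m)) + 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update gains
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update spectral bases
    return W, H

# Two 'sources' with disjoint spectra mixed into one magnitude spectrogram
W_true = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
H_true = np.random.default_rng(1).random((2, 50))
V = W_true @ H_true
W, H = nmf(V, 2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(rel_err)  # small reconstruction error
```

The multiplicative form of the updates keeps W and H non-negative throughout, which is what lets the factors be interpreted as spectra and gains of individual sources.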

Entire Systems, Acoustic and Musicological Modelling

Frontmatter

10. Auditory Scene Analysis in Music Signals

Abstract
This chapter discusses work done in the area of music scene analysis (MSA). Generally, scene analysis is viewed as the transformation of information from a sensory input (physical entity) into concepts (psychological or perceptual entities). Therefore, MSA is defined as a process that converts an audio signal into musical concepts such as notes, chords, beats, and rhythms. Related tasks include music transcription, pitch tracking, and beat tracking; however, this chapter focuses on the auditory scene analysis (ASA) related aspect of this process and does not explore the issues of pitch and beat tracking. An important idea related to this is the distinction between physical and perceptual sounds, as explained in Section 10.1.3 below.
Kunio Kashino

11. Music Scene Description

Abstract
This chapter introduces a research approach called 'music scene description' [232], [225], [228], where the goal is to build a computer system that can understand musical audio signals at the level of untrained human listeners, without trying to extract every musical note from the music. People listening to music can easily hum the melody, clap hands in time to the musical beat, notice a phrase being repeated, and find chorus sections. The brain mechanisms underlying these abilities, however, are not yet well understood. In addition, it has been difficult to implement these abilities on a computer system, even though a system with them would be useful in various applications such as music information retrieval, music production/editing, and music interfaces. It is therefore an important challenge to build a music scene description system that can understand complex real-world music signals like those recorded on commercially distributed compact discs (CDs).
Masataka Goto

12. Singing Transcription

Abstract
Singing refers to the act of producing musical sounds with the human voice, and singing transcription refers to the automatic conversion of a recorded singing signal into a parametric representation (e.g., a MIDI file) by applying signal processing methods. Singing transcription is an important topic in computational music-content analysis since singing is the most natural form of human-computer interaction in the musical sense: even a musically untrained subject is usually able to hum the melody of a piece. This chapter introduces the singing transcription problem and presents an overview of the main approaches to solving it, including current state-of-the-art singing transcription systems.
Matti Ryynänen

Backmatter
