2018 | OriginalPaper | Book chapter

15. Audio Source Separation in a Musical Context

Authors: Bryan Pardo, Zafar Rafii, Zhiyao Duan

Published in: Springer Handbook of Systematic Musicology

Publisher: Springer Berlin Heidelberg

Abstract

When musical instruments are recorded in isolation, modern editing and mixing tools allow correction of small errors without requiring a group to re-record an entire passage. Isolated recording also allows rebalancing of levels between musicians without re-recording and application of audio effects to individual instruments. Many of these techniques require (nearly) isolated instrumental recordings to work. Unfortunately, there are many recording situations (e.g., a stereo recording of a 10-piece ensemble) where there are many more instruments than there are microphones, making many editing or remixing tasks difficult or impossible.

Audio source separation is the process of extracting individual sound sources (e.g., a single flute) from a mixture of sounds (e.g., a recording of a concert band using a single microphone). Effective source separation would allow application of editing and remixing techniques to existing recordings with multiple instruments on a single track.

In this chapter we focus on a pair of source separation approaches designed to work with music audio. The first seeks the repeated elements in the musical scene and separates the repeating from the nonrepeating. The second looks for melodic elements, tracking pitch and streaming the audio into separate elements. Finally, we consider informing source separation with information from the musical score.
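To make the first idea concrete, here is a minimal sketch of separating repeating from nonrepeating content in a magnitude spectrogram. It is an illustration under stated assumptions, not the chapter's method: the function name `separate_repeating` is hypothetical, the repetition period is assumed to be known in advance (a real system would have to estimate it from the audio), and a per-position median stands in for whatever repetition model is actually used.

```python
from statistics import median


def separate_repeating(spectrogram, period):
    """Split a magnitude spectrogram into repeating and nonrepeating parts.

    spectrogram: list of frames, each a list of non-negative bin magnitudes.
    period: assumed repetition length in frames (hypothetical input; it is
            taken as given here rather than estimated).
    """
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])

    # Repeating model: for each position within the period, take the median
    # over all frames that fall on that position. One-off events are
    # outliers and have little effect on the median.
    model = [
        [median(spectrogram[t][f] for t in range(pos, n_frames, period))
         for f in range(n_bins)]
        for pos in range(min(period, n_frames))
    ]

    repeating, nonrepeating = [], []
    for t, frame in enumerate(spectrogram):
        # The repeating part cannot exceed the energy present in the mixture.
        rep = [min(m, x) for m, x in zip(model[t % period], frame)]
        repeating.append(rep)
        nonrepeating.append([x - r for x, r in zip(frame, rep)])
    return repeating, nonrepeating


# Toy mixture: a two-frame pattern repeated three times, plus a single
# one-off event (magnitude 5) in frame 2, bin 1.
mixture = [[1, 0], [0, 1], [1, 5], [0, 1], [1, 0], [0, 1]]
repeating, nonrepeating = separate_repeating(mixture, period=2)
```

In this toy case the one-off event ends up in `nonrepeating` while the two-frame pattern stays in `repeating`. Real audio would additionally need a time-frequency transform, a period estimate, and resynthesis, none of which are shown here.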

Title: Audio Source Separation in a Musical Context
Authors: Bryan Pardo, Zafar Rafii, Zhiyao Duan
Publisher: Springer Berlin Heidelberg
Book: Springer Handbook of Systematic Musicology
Print ISBN: 978-3-662-55002-1
Electronic ISBN: 978-3-662-55004-5
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-662-55004-5_15
