Tempo Driven Audio-to-Score Alignment Using Spectral Decomposition and Online Dynamic Time Warping

Authors:
Francisco Jose Rodriguez-Serrano, University of Jaén, Spain
Julio Jose Carabias-Orti, Universitat Pompeu Fabra, Barcelona, Spain
Pedro Vera-Candeas, University of Jaén, Spain
Damian Martinez-Munoz, University of Jaén, Spain

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2, Article No. 22, pp. 1–20. https://doi.org/10.1145/2926717

Published: 21 October 2016

Abstract

In this article, we present an online score-following framework designed for automatic accompaniment. The proposed framework is based on spectral factorization and online Dynamic Time Warping (DTW) and has two separate stages: preprocessing and alignment. In the first stage, we convert the score into a reference audio signal using MIDI synthesizer software and analyze it to obtain the spectral patterns (i.e., basis functions) associated with each score unit. In this work, a score unit represents the occurrence of concurrent or isolated notes in the score. These spectral patterns are learned from the synthetic MIDI signal using a method based on Non-negative Matrix Factorization (NMF) with beta-divergence, where the gains are initialized to the ground-truth transcription inferred from the MIDI. In the second stage, a non-iterative signal decomposition method with fixed spectral patterns per score unit is applied to the magnitude spectrogram of the input signal, resulting in a distortion matrix that can be interpreted as the cost of matching each score unit at each frame. Finally, the relation between performance time and musical score time is obtained using a strategy based on online DTW, where the optimal path is biased by the speed of interpretation. Our system has been evaluated and compared to other systems, yielding reliable results and performance.
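
The two stages summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: the function names (`learn_patterns`, `distortion_matrix`, `dtw_path`) and all parameters are hypothetical; the gains are kept fixed at the MIDI-derived transcription rather than only initialized from it; the distortion is a squared Euclidean error with a closed-form gain instead of the paper's own decomposition; and the alignment is a plain offline DTW, without the online processing or the tempo bias described in the abstract.

```python
import numpy as np

def learn_patterns(V, H, beta=1.0, n_iter=50, eps=1e-12, seed=0):
    """Learn one spectral pattern per score unit with beta-divergence NMF.

    V: (n_bins, n_frames) magnitude spectrogram of the synthesized score
    H: (n_units, n_frames) gains taken from the MIDI transcription
       (assumption: held fixed here, only the patterns W are updated)
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], H.shape[0])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # Standard multiplicative update for the basis matrix.
        num = (V * WH ** (beta - 2)) @ H.T
        den = (WH ** (beta - 1)) @ H.T + eps
        W *= num / den
    return W

def distortion_matrix(W, X, eps=1e-12):
    """Cost of explaining each frame of X with each fixed pattern in W.

    Returns an (n_units, n_frames) matrix of squared residuals
    (assumption: Euclidean error stands in for the paper's distortion).
    """
    norms = np.sum(W ** 2, axis=0) + eps
    # Closed-form, non-iterative least-squares gain per unit and frame.
    gains = np.maximum((W.T @ X) / norms[:, None], 0.0)
    resid = X[:, None, :] - W[:, :, None] * gains[None, :, :]
    return np.sum(resid ** 2, axis=0)

def dtw_path(cost):
    """Offline DTW: assign each frame a score unit, monotonically.

    Allowed steps per frame: stay in the same unit or advance by one,
    so the input needs at least as many frames as score units.
    """
    n_units, n_frames = cost.shape
    acc = np.full((n_units, n_frames), np.inf)
    acc[0, 0] = cost[0, 0]
    for t in range(1, n_frames):
        acc[0, t] = acc[0, t - 1] + cost[0, t]
        for u in range(1, n_units):
            acc[u, t] = cost[u, t] + min(acc[u, t - 1], acc[u - 1, t - 1])
    # Backtrack from the last unit at the last frame.
    path = np.empty(n_frames, dtype=int)
    u = n_units - 1
    for t in range(n_frames - 1, -1, -1):
        path[t] = u
        if t > 0 and u > 0 and acc[u - 1, t - 1] <= acc[u, t - 1]:
            u -= 1
    return path
```

In this sketch, `learn_patterns` would be run once on the synthesized score, and `dtw_path(distortion_matrix(W, X))` would then map each frame of a performance spectrogram `X` to a score unit. A real-time follower would replace `dtw_path` with an incremental variant that only looks at frames received so far.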

Supplemental Material

Available for Download

rodriguez-serrano.zip (579.4 KB)

Supplemental movie, appendix, image and software files for Tempo Driven Audio-to-Score Alignment Using Spectral Decomposition and Online Dynamic Time Warping

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing

Published in
ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2
Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers
March 2017
407 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3004291
Editor: Yu Zheng, Microsoft Research, China
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 October 2016
- Accepted: 1 April 2016
- Revised: 1 February 2016
- Received: 1 November 2015
Author Tags
Accompaniment
audio-to-score alignment
beta-divergence
dynamic time warping (DTW)
non-negative matrix factorization (NMF)
online algorithm
score following
speed of interpretation
tempo
Qualifiers
- research-article
- Research
- Refereed