Abstract
In this article, we present an online score-following framework designed for automatic accompaniment. The proposed framework is based on spectral factorization and online Dynamic Time Warping (DTW) and has two separate stages: preprocessing and alignment. In the first stage, we convert the score into a reference audio signal using MIDI synthesizer software and analyze the provided information to obtain the spectral patterns (i.e., basis functions) associated with each score unit. In this work, a score unit represents the occurrence of concurrent or isolated notes in the score. These spectral patterns are learned from the synthetic MIDI signal using a method based on Non-negative Matrix Factorization (NMF) with beta-divergence, where the gains are initialized with the ground-truth transcription inferred from the MIDI. In the second stage, a non-iterative signal decomposition method with fixed spectral patterns per score unit is applied to the magnitude spectrogram of the input signal, resulting in a distortion matrix that can be interpreted as the cost of matching each score unit at each frame. Finally, the relation between performance time and score time is obtained using a strategy based on online DTW, where the optimal path is biased by the tempo of the performance. Our system has been evaluated and compared to other systems, yielding reliable results and performance.
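The alignment stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Euclidean distance in place of the beta-divergence, uses a plain causal DTW recursion with no tempo bias, and the names `distortion_matrix` and `follow` are hypothetical. Each frame is explained by a single fixed spectral pattern with a closed-form gain, and the residual is taken as the matching cost for that score unit.

```python
import numpy as np


def distortion_matrix(patterns, spectrogram):
    """Cost of explaining each frame with each single score-unit pattern.

    patterns:    (F, K) non-negative spectral patterns, one column per score unit.
    spectrogram: (F, T) non-negative magnitude spectrogram of the performance.
    Returns a (K, T) matrix of squared Euclidean residuals (an assumption;
    the abstract refers to a beta-divergence based decomposition).
    """
    eps = 1e-12
    energy = np.sum(patterns ** 2, axis=0) + eps              # (K,)
    # Closed-form least-squares gain per (unit, frame); no iterations needed.
    gains = (patterns.T @ spectrogram) / energy[:, None]      # (K, T)
    recon = patterns[:, :, None] * gains[None, :, :]          # (F, K, T)
    return np.sum((spectrogram[:, None, :] - recon) ** 2, axis=0)


def follow(cost):
    """Causal DTW-style tracking: per frame, stay on a unit or advance by one.

    Returns the estimated score-unit index for every frame, using only
    the frames seen so far (no tempo bias in this simplified version).
    """
    n_units, n_frames = cost.shape
    acc = np.full(n_units, np.inf)
    acc[0] = cost[0, 0]                       # the performance starts at unit 0
    positions = [0]
    for t in range(1, n_frames):
        advanced = np.concatenate(([np.inf], acc[:-1]))  # cost of arriving from the previous unit
        acc = cost[:, t] + np.minimum(acc, advanced)
        positions.append(int(np.argmin(acc)))
    return positions


# Toy data: three score units over four frequency bins (illustrative values only).
toy_patterns = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0],
                         [0.0, 0.0, 0.5]])
toy_truth = [0, 0, 0, 1, 1, 2, 2, 2, 2]       # unit played at each frame
toy_spectrogram = 2.0 * toy_patterns[:, toy_truth]
toy_positions = follow(distortion_matrix(toy_patterns, toy_spectrogram))
print(toy_positions)
```

In this toy case each unit lasts a different number of frames, so the tracked path has to stretch the score timeline to fit the performance. A tempo-driven variant, as the abstract describes, would additionally bias the stay-or-advance choice by the estimated performance speed.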
Supplemental Material
Supplemental movie, appendix, image, and software files are available for "Tempo Driven Audio-to-Score Alignment Using Spectral Decomposition and Online Dynamic Time Warping".