Tempo Driven Audio-to-Score Alignment Using Spectral Decomposition and Online Dynamic Time Warping

Published: 21 October 2016

Abstract

In this article, we present an online score-following framework designed for automatic accompaniment. The proposed framework is based on spectral factorization and online Dynamic Time Warping (DTW) and has two separate stages: preprocessing and alignment. In the first stage, we convert the score into a reference audio signal using MIDI synthesizer software and analyze the provided information to obtain the spectral patterns (i.e., basis functions) associated with each score unit. In this work, a score unit represents the occurrence of concurrent or isolated notes in the score. These spectral patterns are learned from the synthetic MIDI signal using a method based on Non-negative Matrix Factorization (NMF) with beta-divergence, where the gains are initialized with the ground-truth transcription inferred from the MIDI. In the second stage, a non-iterative signal decomposition method with fixed spectral patterns per score unit is applied to the magnitude spectrogram of the input signal, resulting in a distortion matrix that can be interpreted as the cost of matching each score unit at each frame. Finally, the relation between the performance and the musical score times is obtained using a strategy based on online DTW, where the optimal path is biased by the tempo of the performance. Our system has been evaluated and compared with other systems, yielding reliable results and performance.
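
To make the alignment stage more concrete, the following is a minimal sketch under several simplifying assumptions, not the method described in the article. The spectral patterns are written by hand rather than learned with NMF, the distortion is a squared Euclidean error rather than a beta-divergence, and the path search is a plain causal dynamic-programming pass (stay on the current score unit or advance by one) without the tempo bias of the online DTW used by the authors. The function names `distortion_matrix` and `follow` are hypothetical.

```python
import math


def distortion_matrix(patterns, frames):
    """Cost of explaining each frame with one fixed pattern per score unit.

    patterns: list of U spectral patterns (lists of F non-negative floats)
    frames:   list of T magnitude spectra (lists of F non-negative floats)
    Returns a U x T list of lists. Assumption: squared Euclidean error with
    a closed-form non-negative gain, standing in for the beta-divergence.
    """
    costs = []
    for b in patterns:
        bb = sum(v * v for v in b)
        row = []
        for x in frames:
            xb = sum(xv * bv for xv, bv in zip(x, b))
            # Least-squares gain for a single fixed pattern, clipped at zero.
            gain = max(xb / bb, 0.0) if bb > 0 else 0.0
            row.append(sum((xv - gain * bv) ** 2 for xv, bv in zip(x, b)))
        costs.append(row)
    return costs


def follow(costs):
    """Causal pass over the distortion matrix, one frame at a time.

    Each column depends only on the previous one, so the estimate for
    frame t is available as soon as frame t arrives. Simplification: only
    'stay' and 'advance by one unit' moves, with no tempo model.
    """
    n_units, n_frames = len(costs), len(costs[0])
    prev = [math.inf] * n_units
    prev[0] = costs[0][0]  # the performance is assumed to start at unit 0
    positions = [0]
    for t in range(1, n_frames):
        cur = [math.inf] * n_units
        for u in range(n_units):
            best = prev[u] if u == 0 else min(prev[u], prev[u - 1])
            cur[u] = costs[u][t] + best
        positions.append(min(range(n_units), key=cur.__getitem__))
        prev = cur
    return positions


# Toy data: three score units, each with energy in a single frequency bin.
patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = [
    [2.0, 0.0, 0.0],
    [1.5, 0.1, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.2, 1.0],
    [0.0, 0.0, 2.0],
]
costs = distortion_matrix(patterns, frames)
positions = follow(costs)
print(positions)  # estimated score unit for each input frame
```

On this toy input the estimated position moves through the three score units in order as the energy shifts from one frequency bin to the next. A tempo-driven variant would additionally penalize moves that deviate from the expected rate of advance through the score.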

Published in

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2
Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers
March 2017, 407 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/3004291
Editor: Yu Zheng

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 November 2015
• Revised: 1 February 2016
• Accepted: 1 April 2016
• Published: 21 October 2016
