Published in: Journal of Intelligent Information Systems 3/2013

01.12.2013

Automatic music transcription: challenges and future directions

Authors: Emmanouil Benetos, Simon Dixon, Dimitrios Giannoulis, Holger Kirchhoff, Anssi Klapuri



Abstract

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse the limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use cases. Semi-automatic approaches are another way of achieving a more reliable transcription. In addition, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and from different musical aspects.
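The forced alignment mentioned above is commonly realised with dynamic time warping (DTW) between a feature sequence synthesised from the score and one extracted from the audio. The following is a minimal illustrative sketch, not the authors' method: the function name `dtw_align`, the cosine local cost, and the toy feature matrices are assumptions made for this example.

```python
import numpy as np

def dtw_align(score_feats, audio_feats):
    """Align two feature sequences (frames x dims) with dynamic time warping.

    Returns the optimal alignment path as a list of (score_frame, audio_frame)
    index pairs. Cosine distance between frames is used as the local cost.
    """
    n, m = len(score_feats), len(audio_feats)
    # Pairwise cosine distances between all frame pairs.
    a = score_feats / np.linalg.norm(score_feats, axis=1, keepdims=True)
    b = audio_feats / np.linalg.norm(audio_feats, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T
    # Accumulated cost with the standard step set {(1,1), (1,0), (0,1)}.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from (n, m) to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

In a score-informed training pipeline, `score_feats` would come from a synthesised or symbolic rendering of the score (e.g. chroma or a piano-roll projection) and `audio_feats` from the recording; the path then transfers note labels from score time to audio time.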


Zurück zum Zitat Leveau, P., Vincent, E., Richard, G. & Daudet, L (2008). Instrument-specific harmonic atoms for mid-level music representation. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 116–128.CrossRef Leveau, P., Vincent, E., Richard, G. & Daudet, L (2008). Instrument-specific harmonic atoms for mid-level music representation. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 116–128.CrossRef
Zurück zum Zitat Little, D., & Pardo, B. (2008). Learning musical instruments from mixtures of audio with weak labels. In 9th int. conf. on music information retrieval (p. 127). Little, D., & Pardo, B. (2008). Learning musical instruments from mixtures of audio with weak labels. In 9th int. conf. on music information retrieval (p. 127).
Zurück zum Zitat Loscos, A., Wang, Y., Boo, W. (2006). Low level descriptors for automatic violin transcription. In 7th int. conf. on music information retrieval (pp. 164–167). Loscos, A., Wang, Y., Boo, W. (2006). Low level descriptors for automatic violin transcription. In 7th int. conf. on music information retrieval (pp. 164–167).
Zurück zum Zitat Maezawa, A., Itoyama, K., Komatani, K., Ogata, T. & Okuno, H. G (2012). Automated violin fingering transcription through analysis of an audio recording. Computer Music Journal, 36(3), 57–72.CrossRef Maezawa, A., Itoyama, K., Komatani, K., Ogata, T. & Okuno, H. G (2012). Automated violin fingering transcription through analysis of an audio recording. Computer Music Journal, 36(3), 57–72.CrossRef
Zurück zum Zitat Marolt, M (2012). Automatic transcription of bell chiming recordings. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 844–853.CrossRef Marolt, M (2012). Automatic transcription of bell chiming recordings. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 844–853.CrossRef
Zurück zum Zitat Mauch, M. & Dixon, S (2010). Simultaneous estimation of chords and musical context from audio. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1280–1289.CrossRef Mauch, M. & Dixon, S (2010). Simultaneous estimation of chords and musical context from audio. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1280–1289.CrossRef
Zurück zum Zitat Mauch, M., Noland, K., Dixon, S. (2009). Using musical structure to enhance automatic chord transcription. In 10th int. society for music information retrieval conf. (pp. 231–236). Mauch, M., Noland, K., Dixon, S. (2009). Using musical structure to enhance automatic chord transcription. In 10th int. society for music information retrieval conf. (pp. 231–236).
Zurück zum Zitat McKinney, M., Moelants, D., Davies, M. & Klapuri, A (2007). Evalutation of audio beat tracking and music tempo extraction algorithms. Journal of New. Music Research, 36(1), 1–16.CrossRef McKinney, M., Moelants, D., Davies, M. & Klapuri, A (2007). Evalutation of audio beat tracking and music tempo extraction algorithms. Journal of New. Music Research, 36(1), 1–16.CrossRef
Zurück zum Zitat Müller, M., Ellis, D., Klapuri, A. & Richard, G (2011). Signal processing for music analysis. IEEE J. Selected Topics in Signal Processing, 5(6), 1088–1110.CrossRef Müller, M., Ellis, D., Klapuri, A. & Richard, G (2011). Signal processing for music analysis. IEEE J. Selected Topics in Signal Processing, 5(6), 1088–1110.CrossRef
Zurück zum Zitat Nam, J., Ngiam, J., Lee, H., Slaney, M. (2011). A classification-based polyphonic piano transcription approach using learned feature representations. In 12th int. society for music information retrieval conf. (pp. 175–180). Nam, J., Ngiam, J., Lee, H., Slaney, M. (2011). A classification-based polyphonic piano transcription approach using learned feature representations. In 12th int. society for music information retrieval conf. (pp. 175–180).
Zurück zum Zitat Nesbit, A., Hollenberg, L., Senyard, A. (2004). Towards automatic transcription of Australian aboriginal music. In 5th int. conf. on music information retrieval (pp. 326–330). Nesbit, A., Hollenberg, L., Senyard, A. (2004). Towards automatic transcription of Australian aboriginal music. In 5th int. conf. on music information retrieval (pp. 326–330).
Zurück zum Zitat Noland, K., & Sandler, M. (2006). Key estimation using a hidden markov model. In Proceedings of the 7th international conference on music information retrieval (ISMIR) (pp. 121–126). Noland, K., & Sandler, M. (2006). Key estimation using a hidden markov model. In Proceedings of the 7th international conference on music information retrieval (ISMIR) (pp. 121–126).
Zurück zum Zitat Ochiai, K., Kameoka, H., Sagayama, S. (2012). Explicit beat structure modeling for non-negative matrix factorization-based multipitch analysis. In Int. conf. audio, speech, and signal processing (pp. 133–136). Ochiai, K., Kameoka, H., Sagayama, S. (2012). Explicit beat structure modeling for non-negative matrix factorization-based multipitch analysis. In Int. conf. audio, speech, and signal processing (pp. 133–136).
Zurück zum Zitat O’Hanlon, K., Nagano, H., Plumbley, M. (2012). Structured sparsity for automatic music transcription. In IEEE international conference on audio, speech and signal processing (pp. 441–444). O’Hanlon, K., Nagano, H., Plumbley, M. (2012). Structured sparsity for automatic music transcription. In IEEE international conference on audio, speech and signal processing (pp. 441–444).
Zurück zum Zitat Oram, A., & Wilson, G. (2010). Making software: What really works, and why we believe it. O’Reilly Media, Incorporated. Oram, A., & Wilson, G. (2010). Making software: What really works, and why we believe it. O’Reilly Media, Incorporated.
Zurück zum Zitat Oudre, L., Grenier, Y., Févotte, C. (2009). Template-based chord recognition: Influence of the chord types. In 10th international society for music information retrieval conference (pp. 153–158). Oudre, L., Grenier, Y., Févotte, C. (2009). Template-based chord recognition: Influence of the chord types. In 10th international society for music information retrieval conference (pp. 153–158).
Zurück zum Zitat Özaslan, T., Serra, X., Arcos, J.L. (2012). Characterization of embellishments in Ney performances of Makam music in Turkey. In 13th int. society for music information retrieval conf. Özaslan, T., Serra, X., Arcos, J.L. (2012). Characterization of embellishments in Ney performances of Makam music in Turkey. In 13th int. society for music information retrieval conf.
Zurück zum Zitat Ozerov, A., Vincent, E. & Bimbot, F (2012). A general flexible framework for the handling of prior information in audio source separation. IEEE Trans. Audio, Speech, and Language Processing, 20(4), 1118–1133.CrossRef Ozerov, A., Vincent, E. & Bimbot, F (2012). A general flexible framework for the handling of prior information in audio source separation. IEEE Trans. Audio, Speech, and Language Processing, 20(4), 1118–1133.CrossRef
Zurück zum Zitat Papadopoulos, H., & Peeters, G. (2008). Simultaneous estimation of chord progression and downbeats from an audio file. In IEEE international conference on acoustics, speech and signal processing (pp. 121–124). Papadopoulos, H., & Peeters, G. (2008). Simultaneous estimation of chord progression and downbeats from an audio file. In IEEE international conference on acoustics, speech and signal processing (pp. 121–124).
Zurück zum Zitat Papadopoulos, H. & Peeters, G (2011). Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech and Language Processing, 19(1), 138–152.CrossRef Papadopoulos, H. & Peeters, G (2011). Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech and Language Processing, 19(1), 138–152.CrossRef
Zurück zum Zitat Peeling, P. & Godsill, S (2011). Multiple pitch estimation using non-homogeneous Poisson processes. IEEE J. Selected Topics in Signal Processing, 5(6), 1133–1143.CrossRef Peeling, P. & Godsill, S (2011). Multiple pitch estimation using non-homogeneous Poisson processes. IEEE J. Selected Topics in Signal Processing, 5(6), 1133–1143.CrossRef
Zurück zum Zitat Peeters, G. (2006). Musical key estimation of audio signal based on hidden Markov modeling of chroma vectors. In Proceedings of the 9th international conference on digital audio effects (pp. 127–131). Peeters, G. (2006). Musical key estimation of audio signal based on hidden Markov modeling of chroma vectors. In Proceedings of the 9th international conference on digital audio effects (pp. 127–131).
Zurück zum Zitat Pertusa, A., & Iñesta, J.M. (2008). Multiple fundamental frequency estimation using Gaussian smoothness. In int. conf. audio, speech, and signal processing (pp. 105–108). Pertusa, A., & Iñesta, J.M. (2008). Multiple fundamental frequency estimation using Gaussian smoothness. In int. conf. audio, speech, and signal processing (pp. 105–108).
Zurück zum Zitat Poliner, G. & Ellis, D (2007). A discriminative model for polyphonic piano transcription. EURASIP J. Advances in Signal Processing, 8, 154–162. Poliner, G. & Ellis, D (2007). A discriminative model for polyphonic piano transcription. EURASIP J. Advances in Signal Processing, 8, 154–162.
Zurück zum Zitat Poliner, G., Ellis, D., Ehmann, A., Gomez, E., Streich, S. & Ong, B (2007). Melody transcription from music audio: Approaches and evaluation. IEEE Trans. Audio, Speech, and Language Processing, 15(4), 1247–1256.CrossRef Poliner, G., Ellis, D., Ehmann, A., Gomez, E., Streich, S. & Ong, B (2007). Melody transcription from music audio: Approaches and evaluation. IEEE Trans. Audio, Speech, and Language Processing, 15(4), 1247–1256.CrossRef
Zurück zum Zitat Raczyński, S.A., Ono, N., Sagayama, S. (2009). Note detection with dynamic bayesian networks as a postanalysis step for NMF-based multiple pitch estimation techniques. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 49–52). Raczyński, S.A., Ono, N., Sagayama, S. (2009). Note detection with dynamic bayesian networks as a postanalysis step for NMF-based multiple pitch estimation techniques. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 49–52).
Zurück zum Zitat Raczynski, S.A., Vincent, E., Bimbot, F., Sagayama, S., et al. (2010). Multiple pitch transcription using DBN-based musicological models. In 2010 int. society for music information retrieval conf. (ISMIR) (pp. 363–368). Raczynski, S.A., Vincent, E., Bimbot, F., Sagayama, S., et al. (2010). Multiple pitch transcription using DBN-based musicological models. In 2010 int. society for music information retrieval conf. (ISMIR) (pp. 363–368).
Zurück zum Zitat Radicioni, D.P., & Lombardo, V. (2005) Fingering for music performance. In International computer music conference (pp. 527–530). Radicioni, D.P., & Lombardo, V. (2005) Fingering for music performance. In International computer music conference (pp. 527–530).
Zurück zum Zitat Raphael, C. (2005). A graphical model for recognizing sung melodies. In 6th international conference on music information retrieval (pp. 658–663). Raphael, C. (2005). A graphical model for recognizing sung melodies. In 6th international conference on music information retrieval (pp. 658–663).
Zurück zum Zitat Reis, G., Fonseca, N., de Vega, F.F., Ferreira, A. (2008). Hybrid genetic algorithm based on gene fragment competition for polyphonic music transcription. In Conf. applications of evolutionary computing (pp. 305–314). Reis, G., Fonseca, N., de Vega, F.F., Ferreira, A. (2008). Hybrid genetic algorithm based on gene fragment competition for polyphonic music transcription. In Conf. applications of evolutionary computing (pp. 305–314).
Zurück zum Zitat Ryynänen, M., & Klapuri, A. (2005). Polyphonic music transcription using note event modeling. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 319–322). Ryynänen, M., & Klapuri, A. (2005). Polyphonic music transcription using note event modeling. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 319–322).
Zurück zum Zitat Ryynänen, M. & Klapuri, A (2008). Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), 72–86.CrossRef Ryynänen, M. & Klapuri, A (2008). Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), 72–86.CrossRef
Zurück zum Zitat Scheirer, E. (1997). Using musical knowledge to extract expressive performance information from audio recordings. In H. Okuno, D. Rosenthal (Eds.), Readings in computational auditory scene analysis. Lawrence Erlbaum. Scheirer, E. (1997). Using musical knowledge to extract expressive performance information from audio recordings. In H. Okuno, D. Rosenthal (Eds.), Readings in computational auditory scene analysis. Lawrence Erlbaum.
Zurück zum Zitat Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jorda, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G. (2013). Roadmap for music information research. Creative Commons BY-NC-ND 3.0 license. http://mires.eecs.qmul.ac.uk. Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jorda, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G. (2013). Roadmap for music information research. Creative Commons BY-NC-ND 3.0 license. http://​mires.​eecs.​qmul.​ac.​uk.
Zurück zum Zitat Smaragdis, P., & Brown, J.C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 177–180). Smaragdis, P., & Brown, J.C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 177–180).
Zurück zum Zitat Smaragdis, P. & Mysore, G. J (2009). Separation by humming: User-guided sound extraction from monophonic mixtures. In, IEEE workshop on applications of signal processing to audio and acoustics (WASPAA). USA: New Paltz. Smaragdis, P. & Mysore, G. J (2009). Separation by humming: User-guided sound extraction from monophonic mixtures. In, IEEE workshop on applications of signal processing to audio and acoustics (WASPAA). USA: New Paltz.
Zurück zum Zitat Smaragdis, P., Raj, B. & Shashanka, M (2006). A probabilistic latent variable model for acoustic modeling. In, Neural information processing systems workshop. Canada: Whistler. Smaragdis, P., Raj, B. & Shashanka, M (2006). A probabilistic latent variable model for acoustic modeling. In, Neural information processing systems workshop. Canada: Whistler.
Zurück zum Zitat Vandewalle, P., Kovacevic, J. & Vetterli, M (2009). Reproducible research in signal processing. Signal Processing Magazine, IEEE, 26(3), 37–47.CrossRef Vandewalle, P., Kovacevic, J. & Vetterli, M (2009). Reproducible research in signal processing. Signal Processing Magazine, IEEE, 26(3), 37–47.CrossRef
Zurück zum Zitat Vincent, E., Bertin, N. & Badeau, R (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio, Speech, and Language Processing, 18(3), 528–537.CrossRef Vincent, E., Bertin, N. & Badeau, R (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio, Speech, and Language Processing, 18(3), 528–537.CrossRef
Zurück zum Zitat Wang, Y. & Zhang, B (2008). Application-specific music transcription for tutoring. IEEE MultiMedia, 15(3), 70–74.CrossRef Wang, Y. & Zhang, B (2008). Application-specific music transcription for tutoring. IEEE MultiMedia, 15(3), 70–74.CrossRef
Zurück zum Zitat Wilson, G., Aruliah, D., Brown, C.T., Hong, N.P.C., Davis, M., Guy, R.T., Haddock, S.H., Huff, K., Mitchell, I.M., Plumbley, M.D., et al. (2012). Best practices for scientific computing. arXiv preprint arXiv:1210.0530. Wilson, G., Aruliah, D., Brown, C.T., Hong, N.P.C., Davis, M., Guy, R.T., Haddock, S.H., Huff, K., Mitchell, I.M., Plumbley, M.D., et al. (2012). Best practices for scientific computing. arXiv preprint arXiv:1210.​0530.
Zurück zum Zitat Wu, J., Vincent, E., Raczynski, S., Nishimoto, T., Ono, N., Sagayama, S. (2011). Multipitch estimation by joint modeling of harmonic and transient sounds. In Int. conf. audio, speech, and signal processing (pp. 25–28). Wu, J., Vincent, E., Raczynski, S., Nishimoto, T., Ono, N., Sagayama, S. (2011). Multipitch estimation by joint modeling of harmonic and transient sounds. In Int. conf. audio, speech, and signal processing (pp. 25–28).
Zurück zum Zitat Yeh, C. (2008). Multiple fundamental frequency estimation of polyphonic recordings. Ph.D. thesis, Université Paris VI - Pierre et Marie Curie, France. Yeh, C. (2008). Multiple fundamental frequency estimation of polyphonic recordings. Ph.D. thesis, Université Paris VI - Pierre et Marie Curie, France.
Zurück zum Zitat Yoshii, K. & Goto, M (2012). A nonparametric Bayesian multipitch analyzer based on infinite latent harmonic allocation. IEEE Trans. Audio, Speech, and Language Processing, 20(3), 717–730.CrossRef Yoshii, K. & Goto, M (2012). A nonparametric Bayesian multipitch analyzer based on infinite latent harmonic allocation. IEEE Trans. Audio, Speech, and Language Processing, 20(3), 717–730.CrossRef
Metadata
Title: Automatic music transcription: challenges and future directions
Authors: Emmanouil Benetos, Simon Dixon, Dimitrios Giannoulis, Holger Kirchhoff, Anssi Klapuri
Publication date: 01.12.2013
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 3/2013
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-013-0258-3