Published in: Journal of Intelligent Information Systems 3/2013

01.12.2013

Automatic music transcription: challenges and future directions

Authors: Emmanouil Benetos, Simon Dixon, Dimitrios Giannoulis, Holger Kirchhoff, Anssi Klapuri



Abstract

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse the limitations of current methods and identify promising directions for future research. Current transcription methods use general-purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use cases. Semi-automatic approaches are another way of achieving a more reliable transcription. In addition, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large-scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and from different musical aspects.
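The forced alignment mentioned above is commonly realised with dynamic time warping (DTW) between a feature sequence synthesised from the score and one extracted from the audio. The following is a minimal illustrative sketch, not the authors' method: the function name `dtw_align`, the cosine local cost, and the toy feature matrices are assumptions made for this example.

```python
import numpy as np

def dtw_align(score_feats, audio_feats):
    """Align two feature sequences (frames x dims) with dynamic time warping.

    Returns the optimal alignment path as a list of (score_frame, audio_frame)
    index pairs. Cosine distance between frames is used as the local cost.
    """
    n, m = len(score_feats), len(audio_feats)
    # Pairwise cosine distances between all frame pairs.
    a = score_feats / np.linalg.norm(score_feats, axis=1, keepdims=True)
    b = audio_feats / np.linalg.norm(audio_feats, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T
    # Accumulated cost with the standard step set {(1,1), (1,0), (0,1)}.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from (n, m) to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

In a score-informed training pipeline, `score_feats` would come from a synthesised or symbolic rendering of the score (e.g. chroma or a piano-roll projection) and `audio_feats` from the recording; the path then transfers note labels from score time to audio time.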


Zurück zum Zitat Leveau, P., Vincent, E., Richard, G. & Daudet, L (2008). Instrument-specific harmonic atoms for mid-level music representation. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 116–128.CrossRef Leveau, P., Vincent, E., Richard, G. & Daudet, L (2008). Instrument-specific harmonic atoms for mid-level music representation. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 116–128.CrossRef
Zurück zum Zitat Little, D., & Pardo, B. (2008). Learning musical instruments from mixtures of audio with weak labels. In 9th int. conf. on music information retrieval (p. 127). Little, D., & Pardo, B. (2008). Learning musical instruments from mixtures of audio with weak labels. In 9th int. conf. on music information retrieval (p. 127).
Zurück zum Zitat Loscos, A., Wang, Y., Boo, W. (2006). Low level descriptors for automatic violin transcription. In 7th int. conf. on music information retrieval (pp. 164–167). Loscos, A., Wang, Y., Boo, W. (2006). Low level descriptors for automatic violin transcription. In 7th int. conf. on music information retrieval (pp. 164–167).
Zurück zum Zitat Maezawa, A., Itoyama, K., Komatani, K., Ogata, T. & Okuno, H. G (2012). Automated violin fingering transcription through analysis of an audio recording. Computer Music Journal, 36(3), 57–72.CrossRef Maezawa, A., Itoyama, K., Komatani, K., Ogata, T. & Okuno, H. G (2012). Automated violin fingering transcription through analysis of an audio recording. Computer Music Journal, 36(3), 57–72.CrossRef
Zurück zum Zitat Marolt, M (2012). Automatic transcription of bell chiming recordings. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 844–853.CrossRef Marolt, M (2012). Automatic transcription of bell chiming recordings. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 844–853.CrossRef
Zurück zum Zitat Mauch, M. & Dixon, S (2010). Simultaneous estimation of chords and musical context from audio. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1280–1289.CrossRef Mauch, M. & Dixon, S (2010). Simultaneous estimation of chords and musical context from audio. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1280–1289.CrossRef
Zurück zum Zitat Mauch, M., Noland, K., Dixon, S. (2009). Using musical structure to enhance automatic chord transcription. In 10th int. society for music information retrieval conf. (pp. 231–236). Mauch, M., Noland, K., Dixon, S. (2009). Using musical structure to enhance automatic chord transcription. In 10th int. society for music information retrieval conf. (pp. 231–236).
Zurück zum Zitat McKinney, M., Moelants, D., Davies, M. & Klapuri, A (2007). Evalutation of audio beat tracking and music tempo extraction algorithms. Journal of New. Music Research, 36(1), 1–16.CrossRef McKinney, M., Moelants, D., Davies, M. & Klapuri, A (2007). Evalutation of audio beat tracking and music tempo extraction algorithms. Journal of New. Music Research, 36(1), 1–16.CrossRef
Zurück zum Zitat Müller, M., Ellis, D., Klapuri, A. & Richard, G (2011). Signal processing for music analysis. IEEE J. Selected Topics in Signal Processing, 5(6), 1088–1110.CrossRef Müller, M., Ellis, D., Klapuri, A. & Richard, G (2011). Signal processing for music analysis. IEEE J. Selected Topics in Signal Processing, 5(6), 1088–1110.CrossRef
Zurück zum Zitat Nam, J., Ngiam, J., Lee, H., Slaney, M. (2011). A classification-based polyphonic piano transcription approach using learned feature representations. In 12th int. society for music information retrieval conf. (pp. 175–180). Nam, J., Ngiam, J., Lee, H., Slaney, M. (2011). A classification-based polyphonic piano transcription approach using learned feature representations. In 12th int. society for music information retrieval conf. (pp. 175–180).
Zurück zum Zitat Nesbit, A., Hollenberg, L., Senyard, A. (2004). Towards automatic transcription of Australian aboriginal music. In 5th int. conf. on music information retrieval (pp. 326–330). Nesbit, A., Hollenberg, L., Senyard, A. (2004). Towards automatic transcription of Australian aboriginal music. In 5th int. conf. on music information retrieval (pp. 326–330).
Zurück zum Zitat Noland, K., & Sandler, M. (2006). Key estimation using a hidden markov model. In Proceedings of the 7th international conference on music information retrieval (ISMIR) (pp. 121–126). Noland, K., & Sandler, M. (2006). Key estimation using a hidden markov model. In Proceedings of the 7th international conference on music information retrieval (ISMIR) (pp. 121–126).
Zurück zum Zitat Ochiai, K., Kameoka, H., Sagayama, S. (2012). Explicit beat structure modeling for non-negative matrix factorization-based multipitch analysis. In Int. conf. audio, speech, and signal processing (pp. 133–136). Ochiai, K., Kameoka, H., Sagayama, S. (2012). Explicit beat structure modeling for non-negative matrix factorization-based multipitch analysis. In Int. conf. audio, speech, and signal processing (pp. 133–136).
Zurück zum Zitat O’Hanlon, K., Nagano, H., Plumbley, M. (2012). Structured sparsity for automatic music transcription. In IEEE international conference on audio, speech and signal processing (pp. 441–444). O’Hanlon, K., Nagano, H., Plumbley, M. (2012). Structured sparsity for automatic music transcription. In IEEE international conference on audio, speech and signal processing (pp. 441–444).
Zurück zum Zitat Oram, A., & Wilson, G. (2010). Making software: What really works, and why we believe it. O’Reilly Media, Incorporated. Oram, A., & Wilson, G. (2010). Making software: What really works, and why we believe it. O’Reilly Media, Incorporated.
Zurück zum Zitat Oudre, L., Grenier, Y., Févotte, C. (2009). Template-based chord recognition: Influence of the chord types. In 10th international society for music information retrieval conference (pp. 153–158). Oudre, L., Grenier, Y., Févotte, C. (2009). Template-based chord recognition: Influence of the chord types. In 10th international society for music information retrieval conference (pp. 153–158).
Zurück zum Zitat Özaslan, T., Serra, X., Arcos, J.L. (2012). Characterization of embellishments in Ney performances of Makam music in Turkey. In 13th int. society for music information retrieval conf. Özaslan, T., Serra, X., Arcos, J.L. (2012). Characterization of embellishments in Ney performances of Makam music in Turkey. In 13th int. society for music information retrieval conf.
Zurück zum Zitat Ozerov, A., Vincent, E. & Bimbot, F (2012). A general flexible framework for the handling of prior information in audio source separation. IEEE Trans. Audio, Speech, and Language Processing, 20(4), 1118–1133.CrossRef Ozerov, A., Vincent, E. & Bimbot, F (2012). A general flexible framework for the handling of prior information in audio source separation. IEEE Trans. Audio, Speech, and Language Processing, 20(4), 1118–1133.CrossRef
Zurück zum Zitat Papadopoulos, H., & Peeters, G. (2008). Simultaneous estimation of chord progression and downbeats from an audio file. In IEEE international conference on acoustics, speech and signal processing (pp. 121–124). Papadopoulos, H., & Peeters, G. (2008). Simultaneous estimation of chord progression and downbeats from an audio file. In IEEE international conference on acoustics, speech and signal processing (pp. 121–124).
Zurück zum Zitat Papadopoulos, H. & Peeters, G (2011). Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech and Language Processing, 19(1), 138–152.CrossRef Papadopoulos, H. & Peeters, G (2011). Joint estimation of chords and downbeats from an audio signal. IEEE Transactions on Audio, Speech and Language Processing, 19(1), 138–152.CrossRef
Zurück zum Zitat Peeling, P. & Godsill, S (2011). Multiple pitch estimation using non-homogeneous Poisson processes. IEEE J. Selected Topics in Signal Processing, 5(6), 1133–1143.CrossRef Peeling, P. & Godsill, S (2011). Multiple pitch estimation using non-homogeneous Poisson processes. IEEE J. Selected Topics in Signal Processing, 5(6), 1133–1143.CrossRef
Zurück zum Zitat Peeters, G. (2006). Musical key estimation of audio signal based on hidden Markov modeling of chroma vectors. In Proceedings of the 9th international conference on digital audio effects (pp. 127–131). Peeters, G. (2006). Musical key estimation of audio signal based on hidden Markov modeling of chroma vectors. In Proceedings of the 9th international conference on digital audio effects (pp. 127–131).
Zurück zum Zitat Pertusa, A., & Iñesta, J.M. (2008). Multiple fundamental frequency estimation using Gaussian smoothness. In int. conf. audio, speech, and signal processing (pp. 105–108). Pertusa, A., & Iñesta, J.M. (2008). Multiple fundamental frequency estimation using Gaussian smoothness. In int. conf. audio, speech, and signal processing (pp. 105–108).
Zurück zum Zitat Poliner, G. & Ellis, D (2007). A discriminative model for polyphonic piano transcription. EURASIP J. Advances in Signal Processing, 8, 154–162. Poliner, G. & Ellis, D (2007). A discriminative model for polyphonic piano transcription. EURASIP J. Advances in Signal Processing, 8, 154–162.
Zurück zum Zitat Poliner, G., Ellis, D., Ehmann, A., Gomez, E., Streich, S. & Ong, B (2007). Melody transcription from music audio: Approaches and evaluation. IEEE Trans. Audio, Speech, and Language Processing, 15(4), 1247–1256.CrossRef Poliner, G., Ellis, D., Ehmann, A., Gomez, E., Streich, S. & Ong, B (2007). Melody transcription from music audio: Approaches and evaluation. IEEE Trans. Audio, Speech, and Language Processing, 15(4), 1247–1256.CrossRef
Zurück zum Zitat Raczyński, S.A., Ono, N., Sagayama, S. (2009). Note detection with dynamic bayesian networks as a postanalysis step for NMF-based multiple pitch estimation techniques. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 49–52). Raczyński, S.A., Ono, N., Sagayama, S. (2009). Note detection with dynamic bayesian networks as a postanalysis step for NMF-based multiple pitch estimation techniques. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 49–52).
Zurück zum Zitat Raczynski, S.A., Vincent, E., Bimbot, F., Sagayama, S., et al. (2010). Multiple pitch transcription using DBN-based musicological models. In 2010 int. society for music information retrieval conf. (ISMIR) (pp. 363–368). Raczynski, S.A., Vincent, E., Bimbot, F., Sagayama, S., et al. (2010). Multiple pitch transcription using DBN-based musicological models. In 2010 int. society for music information retrieval conf. (ISMIR) (pp. 363–368).
Zurück zum Zitat Radicioni, D.P., & Lombardo, V. (2005) Fingering for music performance. In International computer music conference (pp. 527–530). Radicioni, D.P., & Lombardo, V. (2005) Fingering for music performance. In International computer music conference (pp. 527–530).
Zurück zum Zitat Raphael, C. (2005). A graphical model for recognizing sung melodies. In 6th international conference on music information retrieval (pp. 658–663). Raphael, C. (2005). A graphical model for recognizing sung melodies. In 6th international conference on music information retrieval (pp. 658–663).
Zurück zum Zitat Reis, G., Fonseca, N., de Vega, F.F., Ferreira, A. (2008). Hybrid genetic algorithm based on gene fragment competition for polyphonic music transcription. In Conf. applications of evolutionary computing (pp. 305–314). Reis, G., Fonseca, N., de Vega, F.F., Ferreira, A. (2008). Hybrid genetic algorithm based on gene fragment competition for polyphonic music transcription. In Conf. applications of evolutionary computing (pp. 305–314).
Zurück zum Zitat Ryynänen, M., & Klapuri, A. (2005). Polyphonic music transcription using note event modeling. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 319–322). Ryynänen, M., & Klapuri, A. (2005). Polyphonic music transcription using note event modeling. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 319–322).
Zurück zum Zitat Ryynänen, M. & Klapuri, A (2008). Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), 72–86.CrossRef Ryynänen, M. & Klapuri, A (2008). Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), 72–86.CrossRef
Zurück zum Zitat Scheirer, E. (1997). Using musical knowledge to extract expressive performance information from audio recordings. In H. Okuno, D. Rosenthal (Eds.), Readings in computational auditory scene analysis. Lawrence Erlbaum. Scheirer, E. (1997). Using musical knowledge to extract expressive performance information from audio recordings. In H. Okuno, D. Rosenthal (Eds.), Readings in computational auditory scene analysis. Lawrence Erlbaum.
Zurück zum Zitat Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jorda, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G. (2013). Roadmap for music information research. Creative Commons BY-NC-ND 3.0 license. http://mires.eecs.qmul.ac.uk. Serra, X., Magas, M., Benetos, E., Chudy, M., Dixon, S., Flexer, A., Gómez, E., Gouyon, F., Herrera, P., Jorda, S., Paytuvi, O., Peeters, G., Schlüter, J., Vinet, H., Widmer, G. (2013). Roadmap for music information research. Creative Commons BY-NC-ND 3.0 license. http://​mires.​eecs.​qmul.​ac.​uk.
Zurück zum Zitat Smaragdis, P., & Brown, J.C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 177–180). Smaragdis, P., & Brown, J.C. (2003). Non-negative matrix factorization for polyphonic music transcription. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 177–180).
Zurück zum Zitat Smaragdis, P. & Mysore, G. J (2009). Separation by humming: User-guided sound extraction from monophonic mixtures. In, IEEE workshop on applications of signal processing to audio and acoustics (WASPAA). USA: New Paltz. Smaragdis, P. & Mysore, G. J (2009). Separation by humming: User-guided sound extraction from monophonic mixtures. In, IEEE workshop on applications of signal processing to audio and acoustics (WASPAA). USA: New Paltz.
Zurück zum Zitat Smaragdis, P., Raj, B. & Shashanka, M (2006). A probabilistic latent variable model for acoustic modeling. In, Neural information processing systems workshop. Canada: Whistler. Smaragdis, P., Raj, B. & Shashanka, M (2006). A probabilistic latent variable model for acoustic modeling. In, Neural information processing systems workshop. Canada: Whistler.
Zurück zum Zitat Vandewalle, P., Kovacevic, J. & Vetterli, M (2009). Reproducible research in signal processing. Signal Processing Magazine, IEEE, 26(3), 37–47.CrossRef Vandewalle, P., Kovacevic, J. & Vetterli, M (2009). Reproducible research in signal processing. Signal Processing Magazine, IEEE, 26(3), 37–47.CrossRef
Zurück zum Zitat Vincent, E., Bertin, N. & Badeau, R (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio, Speech, and Language Processing, 18(3), 528–537.CrossRef Vincent, E., Bertin, N. & Badeau, R (2010). Adaptive harmonic spectral decomposition for multiple pitch estimation. IEEE Trans. Audio, Speech, and Language Processing, 18(3), 528–537.CrossRef
Zurück zum Zitat Wang, Y. & Zhang, B (2008). Application-specific music transcription for tutoring. IEEE MultiMedia, 15(3), 70–74.CrossRef Wang, Y. & Zhang, B (2008). Application-specific music transcription for tutoring. IEEE MultiMedia, 15(3), 70–74.CrossRef
Zurück zum Zitat Wilson, G., Aruliah, D., Brown, C.T., Hong, N.P.C., Davis, M., Guy, R.T., Haddock, S.H., Huff, K., Mitchell, I.M., Plumbley, M.D., et al. (2012). Best practices for scientific computing. arXiv preprint arXiv:1210.0530. Wilson, G., Aruliah, D., Brown, C.T., Hong, N.P.C., Davis, M., Guy, R.T., Haddock, S.H., Huff, K., Mitchell, I.M., Plumbley, M.D., et al. (2012). Best practices for scientific computing. arXiv preprint arXiv:1210.​0530.
Zurück zum Zitat Wu, J., Vincent, E., Raczynski, S., Nishimoto, T., Ono, N., Sagayama, S. (2011). Multipitch estimation by joint modeling of harmonic and transient sounds. In Int. conf. audio, speech, and signal processing (pp. 25–28). Wu, J., Vincent, E., Raczynski, S., Nishimoto, T., Ono, N., Sagayama, S. (2011). Multipitch estimation by joint modeling of harmonic and transient sounds. In Int. conf. audio, speech, and signal processing (pp. 25–28).
Zurück zum Zitat Yeh, C. (2008). Multiple fundamental frequency estimation of polyphonic recordings. Ph.D. thesis, Université Paris VI - Pierre et Marie Curie, France. Yeh, C. (2008). Multiple fundamental frequency estimation of polyphonic recordings. Ph.D. thesis, Université Paris VI - Pierre et Marie Curie, France.
Zurück zum Zitat Yoshii, K. & Goto, M (2012). A nonparametric Bayesian multipitch analyzer based on infinite latent harmonic allocation. IEEE Trans. Audio, Speech, and Language Processing, 20(3), 717–730.CrossRef Yoshii, K. & Goto, M (2012). A nonparametric Bayesian multipitch analyzer based on infinite latent harmonic allocation. IEEE Trans. Audio, Speech, and Language Processing, 20(3), 717–730.CrossRef
Metadata
Title: Automatic music transcription: challenges and future directions
Authors: Emmanouil Benetos, Simon Dixon, Dimitrios Giannoulis, Holger Kirchhoff, Anssi Klapuri
Publication date: 01.12.2013
Publisher: Springer US
Published in: Journal of Intelligent Information Systems, Issue 3/2013
Print ISSN: 0925-9902
Electronic ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-013-0258-3