Published in: Multimedia Systems 6/2019

12-03-2019 | Special Issue Paper

Deep learning-based automatic downbeat tracking: a brief review

Authors: Bijue Jia, Jiancheng Lv, Dayiheng Liu

Abstract

As an important form of multimedia, music permeates almost everyone's life. Automatic music analysis is a significant step toward satisfying people's needs for music retrieval and music recommendation in an effortless way. Among its subtasks, downbeat tracking has been a fundamental and long-standing problem in the Music Information Retrieval (MIR) area. Despite significant research efforts, downbeat tracking remains a challenge. Previous research either focuses on feature engineering (extracting certain features by signal processing, which yields semi-automatic solutions) or suffers from limitations: it can only model music audio recordings within limited time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and become the primary approach to feature learning; combinations of traditional and deep learning methods have also achieved better performance. In this paper, we begin with a background introduction to the downbeat tracking problem. Then, we give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, data sets, and evaluation strategies. In addition, we look at the results from the annual benchmark evaluation, the Music Information Retrieval Evaluation eXchange (MIREX), as well as developments in software implementations. Although much has been achieved in automatic downbeat tracking, some problems remain. We point out these problems and conclude with possible directions and challenges for future research.
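The evaluation strategies surveyed here typically score a downbeat tracker with the standard F-measure: an estimated downbeat counts as a true positive if it falls within a small tolerance window (commonly ±70 ms) of a not-yet-matched annotation. The following is a minimal illustrative sketch, not any particular paper's implementation; the function name and greedy matching scheme are our own simplifying assumptions:

```python
def downbeat_f_measure(estimated, annotated, tolerance=0.07):
    """F-measure for downbeat tracking (times in seconds).

    An estimate is a true positive if it lies within +/- `tolerance`
    seconds of an annotation that has not already been matched.
    Greedy one-to-one matching; a simplification of standard practice.
    """
    matched = set()  # indices of annotations already claimed
    tp = 0
    for est in estimated:
        for i, ann in enumerate(annotated):
            if i not in matched and abs(est - ann) <= tolerance:
                matched.add(i)
                tp += 1
                break
    fp = len(estimated) - tp   # spurious estimates
    fn = len(annotated) - tp   # missed annotations
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, two estimates that each land within 70 ms of the first two of three annotated downbeats give precision 1.0 and recall 2/3, hence an F-measure of 0.8.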

Footnotes
1
For a more comprehensive survey of MIR, covering its background, history, fundamentals, tasks, and applications, we refer readers to the overviews in [14, 18–20].
 
2
A more complete list of data sets for MIR research is at: http://www.audiocontentanalysis.org/data-sets/.
 
6
Here we only talk about the six data sets used since 2014 because there are no comparisons for RWC classical and GTZAN.
 
11
Note that almost all researchers working on automatic downbeat tracking have participated in the MIREX Automatic Downbeat Estimation task, and the systems proposed in their papers are similar to those submitted to MIREX. The results in Table 2 are therefore representative and sufficient for analysis.
 
Literature
1.
go back to reference Lerdahl, F., Jackendoff, R.S.: A Generative Theory of Tonal Music. MIT Press, Cambridge (1985) Lerdahl, F., Jackendoff, R.S.: A Generative Theory of Tonal Music. MIT Press, Cambridge (1985)
2.
go back to reference Sigtia, S., Benetos, E., Cherla, S., Weyde, T., Garcez, A.S.d., Dixon, S.: RNN-based music language models for improving automatic music transcription. In: International Society for Music Information Retrieval Conference (2014) Sigtia, S., Benetos, E., Cherla, S., Weyde, T., Garcez, A.S.d., Dixon, S.: RNN-based music language models for improving automatic music transcription. In: International Society for Music Information Retrieval Conference (2014)
3.
go back to reference Sigtia, S., Benetos, E., Dixon, S.: An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 24(5), 927–939 (2016)CrossRef Sigtia, S., Benetos, E., Dixon, S.: An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 24(5), 927–939 (2016)CrossRef
4.
go back to reference Sturm, B.L., Santos, J.a.F., Ben-Tal, O., Korshunova, I.: Music transcription modelling and composition using deep learning (2016). arXiv:1604.08723 Sturm, B.L., Santos, J.a.F., Ben-Tal, O., Korshunova, I.: Music transcription modelling and composition using deep learning (2016). arXiv:​1604.​08723
5.
go back to reference Cogliati, A., Duan, Z., Wohlberg, B.: Context-dependent piano music transcription with convolutional sparse coding. IEEE/ACM Trans. Audio Speech Lang. Process. 24(12), 2218–2230 (2016)CrossRef Cogliati, A., Duan, Z., Wohlberg, B.: Context-dependent piano music transcription with convolutional sparse coding. IEEE/ACM Trans. Audio Speech Lang. Process. 24(12), 2218–2230 (2016)CrossRef
6.
go back to reference Oudre, L., Févotte, C., Grenier, Y.: Probabilistic template-based chord recognition. IEEE Trans. Audio Speech Lang. Process. 19(8), 2249–2259 (2011)CrossRef Oudre, L., Févotte, C., Grenier, Y.: Probabilistic template-based chord recognition. IEEE Trans. Audio Speech Lang. Process. 19(8), 2249–2259 (2011)CrossRef
7.
go back to reference Di Giorgi, B., Zanoni, M., Sarti, A., Tubaro, S.: Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony. Multidimensional Systems (nDS). In: Proceedings of the 8th International Workshop on (VDE, 2013), pp. 1–6 (2013) Di Giorgi, B., Zanoni, M., Sarti, A., Tubaro, S.: Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony. Multidimensional Systems (nDS). In: Proceedings of the 8th International Workshop on (VDE, 2013), pp. 1–6 (2013)
8.
go back to reference Maddage, N.C.: Automatic structure detection for popular music. IEEE Multimed. 13(1), 65–77 (2006)CrossRef Maddage, N.C.: Automatic structure detection for popular music. IEEE Multimed. 13(1), 65–77 (2006)CrossRef
9.
go back to reference Serra, J., Müller, M., Grosche, P., Arcos, J.L.: Unsupervised music structure annotation by time series structure features and segment similarity. IEEE Trans. Multimed. 16(5), 1229–1240 (2014)CrossRef Serra, J., Müller, M., Grosche, P., Arcos, J.L.: Unsupervised music structure annotation by time series structure features and segment similarity. IEEE Trans. Multimed. 16(5), 1229–1240 (2014)CrossRef
10.
go back to reference Panagakis, Y., Kotropoulos, C.: Elastic net subspace clustering applied to pop/rock music structure analysis. Pattern Recogn. Lett. 38, 46–53 (2014)CrossRef Panagakis, Y., Kotropoulos, C.: Elastic net subspace clustering applied to pop/rock music structure analysis. Pattern Recogn. Lett. 38, 46–53 (2014)CrossRef
11.
go back to reference Pauwels, J., Kaiser, F., Peeters, G.: Combining Harmony-Based and Novelty-Based Approaches for Structural Segmentation. In: International Society for Music Information Retrieval Conference, pp. 601–606 (2013) Pauwels, J., Kaiser, F., Peeters, G.: Combining Harmony-Based and Novelty-Based Approaches for Structural Segmentation. In: International Society for Music Information Retrieval Conference, pp. 601–606 (2013)
12.
go back to reference Müller, M.: Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, New York (2015)CrossRef Müller, M.: Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, New York (2015)CrossRef
14.
go back to reference Downie, J.S.: Music information retrieval. Ann. Rev. Inf. Sci. Technol. 37(1), 295–340 (2003)CrossRef Downie, J.S.: Music information retrieval. Ann. Rev. Inf. Sci. Technol. 37(1), 295–340 (2003)CrossRef
15.
16.
go back to reference Park, S.H., Ihm, S.Y., Jang, W.I., Nasridinov, A., Park, Y.H.: A Music Recommendation Method with Emotion Recognition Using Ranked Attributes. Springer, Berlin, Heidelberg (2015)CrossRef Park, S.H., Ihm, S.Y., Jang, W.I., Nasridinov, A., Park, Y.H.: A Music Recommendation Method with Emotion Recognition Using Ranked Attributes. Springer, Berlin, Heidelberg (2015)CrossRef
17.
go back to reference Yang, X., Dong, Y., Li, J.: Review of data features-based music emotion recognition methods. Multimed. Syst. 24(4), 365–389 (2018)CrossRef Yang, X., Dong, Y., Li, J.: Review of data features-based music emotion recognition methods. Multimed. Syst. 24(4), 365–389 (2018)CrossRef
18.
go back to reference Typke, R., Wiering, F., Veltkamp, R.C.: A survey of music information retrieval systems. In: Proc. 6th International Conference on Music Information Retrieval, pp. 153–160. Queen Mary, University of London, London (2005) Typke, R., Wiering, F., Veltkamp, R.C.: A survey of music information retrieval systems. In: Proc. 6th International Conference on Music Information Retrieval, pp. 153–160. Queen Mary, University of London, London (2005)
19.
go back to reference Casey, M.A., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., Slaney, M.: Content-based music information retrieval: Current directions and future challenges. Proc. IEEE 96(4), 668–696 (2008)CrossRef Casey, M.A., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., Slaney, M.: Content-based music information retrieval: Current directions and future challenges. Proc. IEEE 96(4), 668–696 (2008)CrossRef
20.
go back to reference Downie, J.S.: The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoust. Sci. Technol. 29(4), 247–255 (2008)CrossRef Downie, J.S.: The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoust. Sci. Technol. 29(4), 247–255 (2008)CrossRef
21.
go back to reference Goto, M., Muraoka, Y.: A beat tracking system for acoustic signals of music. In: Proceedings of the Second ACM International Conference on Multimedia, pp. 365–372. ACM, New York (1994) Goto, M., Muraoka, Y.: A beat tracking system for acoustic signals of music. In: Proceedings of the Second ACM International Conference on Multimedia, pp. 365–372. ACM, New York (1994)
22.
go back to reference Goto, M., Muraoka, Y.: A Real-Time Beat Tracking System for Audio Signals. In: ICMC (1995) Goto, M., Muraoka, Y.: A Real-Time Beat Tracking System for Audio Signals. In: ICMC (1995)
23.
go back to reference Goto, M.: An audio-based real-time beat tracking system for music with or without drum-sounds. J. New Music Res. 30(2), 159–171 (2001)CrossRef Goto, M.: An audio-based real-time beat tracking system for music with or without drum-sounds. J. New Music Res. 30(2), 159–171 (2001)CrossRef
24.
go back to reference Davies, M.E., Plumbley, M.D.: Beat tracking with a two state model [music applications]. In: IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (ICASSP'05), vol. 3, pp. iii–241. IEEE (2005) Davies, M.E., Plumbley, M.D.: Beat tracking with a two state model [music applications]. In: IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (ICASSP'05), vol. 3, pp. iii–241. IEEE (2005)
25.
go back to reference Seppänen, J., Eronen, A.J., Hiipakka, J.: Joint Beat & Tatum Tracking from Music Signals. In: ISMIR, pp. 23–28 (2006) Seppänen, J., Eronen, A.J., Hiipakka, J.: Joint Beat & Tatum Tracking from Music Signals. In: ISMIR, pp. 23–28 (2006)
26.
go back to reference Gkiokas, A., Katsouros, V., Carayannis, G., Stajylakis, T.: Music tempo estimation and beat tracking by applying source separation and metrical relations. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 421–424. IEEE (2012) Gkiokas, A., Katsouros, V., Carayannis, G., Stajylakis, T.: Music tempo estimation and beat tracking by applying source separation and metrical relations. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 421–424. IEEE (2012)
27.
go back to reference Peeters, G., Papadopoulos, H.: Simultaneous beat and downbeat-tracking using a probabilistic framework: Theory and large-scale evaluation. IEEE Trans. Audio Speech Lang. Process. 19(6), 1754–1769 (2011)CrossRef Peeters, G., Papadopoulos, H.: Simultaneous beat and downbeat-tracking using a probabilistic framework: Theory and large-scale evaluation. IEEE Trans. Audio Speech Lang. Process. 19(6), 1754–1769 (2011)CrossRef
28.
go back to reference Krebs, F., Böck, S., Widmer, G.: Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio. In: ISMIR, pp. 227–232 (2013) Krebs, F., Böck, S., Widmer, G.: Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio. In: ISMIR, pp. 227–232 (2013)
29.
go back to reference Krebs, F., Korzeniowski, F., Grachten, M., Widmer, G.: Unsupervised learning and refinement of rhythmic patterns for beat and downbeat tracking. In: 2014 Proceedings of the 22nd European, Signal Processing Conference (EUSIPCO), pp. 611–615. IEEE (2014) Krebs, F., Korzeniowski, F., Grachten, M., Widmer, G.: Unsupervised learning and refinement of rhythmic patterns for beat and downbeat tracking. In: 2014 Proceedings of the 22nd European, Signal Processing Conference (EUSIPCO), pp. 611–615. IEEE (2014)
30.
go back to reference Böck, S., Krebs, F., Widmer, G.: Joint Beat and Downbeat Tracking with Recurrent Neural Networks. In: ISMIR, pp. 255–261 (2016) Böck, S., Krebs, F., Widmer, G.: Joint Beat and Downbeat Tracking with Recurrent Neural Networks. In: ISMIR, pp. 255–261 (2016)
31.
go back to reference Goto, M., Muraoka, Y.: Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions. Speech Commun. 27(3–4), 311 (1999)CrossRef Goto, M., Muraoka, Y.: Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions. Speech Commun. 27(3–4), 311 (1999)CrossRef
32.
go back to reference Davies, M.E., Plumbley, M.D.: A spectral difference approach to downbeat extraction in musical audio. In: Proceedings of the 14th European Signal Processing Conference (EUSIPCO), pp. 1–4 (2006) Davies, M.E., Plumbley, M.D.: A spectral difference approach to downbeat extraction in musical audio. In: Proceedings of the 14th European Signal Processing Conference (EUSIPCO), pp. 1–4 (2006)
33.
go back to reference Durand, S., David, B., Richard, G.: Enhancing downbeat detection when facing different music styles. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3132–3136 (2014) Durand, S., David, B., Richard, G.: Enhancing downbeat detection when facing different music styles. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3132–3136 (2014)
34.
go back to reference Klapuri, A.P., Eronen, A.J., Astola, J.T.: Analysis of the meter of acoustic musical signals. IEEE Trans. Audio Speech Lang. Process. 14(1), 342–355 (2006)CrossRef Klapuri, A.P., Eronen, A.J., Astola, J.T.: Analysis of the meter of acoustic musical signals. IEEE Trans. Audio Speech Lang. Process. 14(1), 342–355 (2006)CrossRef
35.
go back to reference Jehan, T.: Downbeat prediction by listening and learning. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. pp. 267–270. IEEE (2005) Jehan, T.: Downbeat prediction by listening and learning. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. pp. 267–270. IEEE (2005)
36.
go back to reference Papadopoulos, H., Peeters, G.: Joint estimation of chords and downbeats from an audio signal. IEEE Trans. Audio Speech Lang. Process. 19(1), 138–152 (2011)CrossRef Papadopoulos, H., Peeters, G.: Joint estimation of chords and downbeats from an audio signal. IEEE Trans. Audio Speech Lang. Process. 19(1), 138–152 (2011)CrossRef
37.
go back to reference Gärtner, D.: Unsupervised learning of the downbeat in drum patterns. In: Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society (2014) Gärtner, D.: Unsupervised learning of the downbeat in drum patterns. In: Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society (2014)
38.
go back to reference Hockman, J., Davies, M.E., Fujinaga, I.: One in the Jungle: Downbeat Detection in Hardcore, Jungle, and Drum and Bass. In: ISMIR, pp. 169–174 (2012) Hockman, J., Davies, M.E., Fujinaga, I.: One in the Jungle: Downbeat Detection in Hardcore, Jungle, and Drum and Bass. In: ISMIR, pp. 169–174 (2012)
39.
go back to reference Srinivasamurthy, A., Holzapfel, A., Serra, X.: In search of automatic rhythm analysis methods for turkish and indian art music. J. New Music Res. 43(1), 94–114 (2014)CrossRef Srinivasamurthy, A., Holzapfel, A., Serra, X.: In search of automatic rhythm analysis methods for turkish and indian art music. J. New Music Res. 43(1), 94–114 (2014)CrossRef
40.
go back to reference Allan, H.: Bar lines and beyond-meter tracking in digital audio. Mémoire de DEA School Inf. Univ. Edinb. 27, 28 (2004) Allan, H.: Bar lines and beyond-meter tracking in digital audio. Mémoire de DEA School Inf. Univ. Edinb. 27, 28 (2004)
41.
go back to reference Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)MATH Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)MATH
42.
go back to reference Wang, X., Wang, Y.: Improving content-based and hybrid music recommendation using deep learning. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 627–636. ACM, New York (2014) Wang, X., Wang, Y.: Improving content-based and hybrid music recommendation using deep learning. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 627–636. ACM, New York (2014)
43.
go back to reference Yan, Y., Chen, M., Shyu, M.L., Chen, S.C.: Deep learning for imbalanced multimedia data classification. In: 2015 IEEE International Symposium on Multimedia (ISM), pp. 483–488. IEEE (2015) Yan, Y., Chen, M., Shyu, M.L., Chen, S.C.: Deep learning for imbalanced multimedia data classification. In: 2015 IEEE International Symposium on Multimedia (ISM), pp. 483–488. IEEE (2015)
44.
go back to reference Zou, H., Du, J.X., Zhai, C.M., Wang, J.: Deep learning and shared representation space learning based cross-modal multimedia retrieval. In: International Conference on Intelligent Computing, pp. 322–331. Springer, New York (2016)CrossRef Zou, H., Du, J.X., Zhai, C.M., Wang, J.: Deep learning and shared representation space learning based cross-modal multimedia retrieval. In: International Conference on Intelligent Computing, pp. 322–331. Springer, New York (2016)CrossRef
45.
go back to reference Nie, W., Cao, Q., Liu, A., Su, Y.: Convolutional deep learning for 3d object retrieval. Multimed. Syst. 23(3), 325–332 (2017)CrossRef Nie, W., Cao, Q., Liu, A., Su, Y.: Convolutional deep learning for 3d object retrieval. Multimed. Syst. 23(3), 325–332 (2017)CrossRef
46.
go back to reference Durand, S., Bello, J.P., David, B., Richard, G.: Downbeat tracking with multiple features and deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 409–413. IEEE (2015) Durand, S., Bello, J.P., David, B., Richard, G.: Downbeat tracking with multiple features and deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 409–413. IEEE (2015)
47.
go back to reference Durand, S., Bello, J.P., David, B., Richard, G.: Feature adapted convolutional neural networks for downbeat tracking. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 296–300. IEEE (2016) Durand, S., Bello, J.P., David, B., Richard, G.: Feature adapted convolutional neural networks for downbeat tracking. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 296–300. IEEE (2016)
48.
go back to reference Durand, S., Bello, J.P., David, B., Richard, G.: Robust downbeat tracking using an ensemble of convolutional networks. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 25(1), 76–89 (2017)CrossRef Durand, S., Bello, J.P., David, B., Richard, G.: Robust downbeat tracking using an ensemble of convolutional networks. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 25(1), 76–89 (2017)CrossRef
49.
go back to reference Krebs, F., Böck, S., Dorfer, M., Widmer, G.: Downbeat Tracking Using Beat Synchronous Features with Recurrent Neural Networks. In: ISMIR, pp. 129–135 (2016) Krebs, F., Böck, S., Dorfer, M., Widmer, G.: Downbeat Tracking Using Beat Synchronous Features with Recurrent Neural Networks. In: ISMIR, pp. 129–135 (2016)
50.
go back to reference Graves, A.: Supervised sequence labelling. Supervised sequence labelling with recurrent neural networks. Springer, Berlin, Heidelberg, pp. 5–13 (2012)CrossRef Graves, A.: Supervised sequence labelling. Supervised sequence labelling with recurrent neural networks. Springer, Berlin, Heidelberg, pp. 5–13 (2012)CrossRef
51.
go back to reference Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)CrossRef Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)CrossRef
52.
go back to reference Grosche, P., Müller, M.: Tempogram toolbox: Matlab implementations for tempo and pulse analysis of music recordings. In: Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR). Miami, FL, USA (2011) Grosche, P., Müller, M.: Tempogram toolbox: Matlab implementations for tempo and pulse analysis of music recordings. In: Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR). Miami, FL, USA (2011)
53.
go back to reference Dixon, S.: Evaluation of the audio beat tracking system beatroot. J. New Music Res. 36(1), 39–50 (2007)CrossRef Dixon, S.: Evaluation of the audio beat tracking system beatroot. J. New Music Res. 36(1), 39–50 (2007)CrossRef
54.
go back to reference Khadkevich, M., Fillon, T., Richard, G., Omologo, M.: A probabilistic approach to simultaneous extraction of beats and downbeats. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445–448. IEEE (2012) Khadkevich, M., Fillon, T., Richard, G., Omologo, M.: A probabilistic approach to simultaneous extraction of beats and downbeats. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445–448. IEEE (2012)
55.
go back to reference Holzapfel, A., Krebs, F., Srinivasamurthy, A.: Tracking the “odd”: Meter inference in a culturally diverse music corpus. In: ISMIR-International Conference on Music Information Retrieval, pp. 425–430. ISMIR (2014) Holzapfel, A., Krebs, F., Srinivasamurthy, A.: Tracking the “odd”: Meter inference in a culturally diverse music corpus. In: ISMIR-International Conference on Music Information Retrieval, pp. 425–430. ISMIR (2014)
56.
go back to reference Malm, W.P.: Music Cultures of the Pacific, the Near East, and Asia. Pearson College Division, London (1996) Malm, W.P.: Music Cultures of the Pacific, the Near East, and Asia. Pearson College Division, London (1996)
57.
go back to reference Bello, J.P., Pickens, J.: A Robust Mid-Level Representation for Harmonic Content in Music Signals. In: ISMIR, vol. 5, pp. 304–311 (2005) Bello, J.P., Pickens, J.: A Robust Mid-Level Representation for Harmonic Content in Music Signals. In: ISMIR, vol. 5, pp. 304–311 (2005)
58.
go back to reference Müller, M., Ewert, S.: Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In: Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR) (2011) Müller, M., Ewert, S.: Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In: Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR) (2011)
60.
go back to reference Pfordresher, P.Q.: The role of melodic and rhythmic accents in musical structure. Music Percept. Interdiscip. J. 20(4), 431–464 (2003)CrossRef Pfordresher, P.Q.: The role of melodic and rhythmic accents in musical structure. Music Percept. Interdiscip. J. 20(4), 431–464 (2003)CrossRef
61.
go back to reference Hannon, E.E., Snyder, J.S., Eerola, T., Krumhansl, C.L.: The role of melodic and temporal cues in perceiving musical meter. J. Exp. Psychol. Human Percept. Perform. 30(5), 956 (2004)CrossRef Hannon, E.E., Snyder, J.S., Eerola, T., Krumhansl, C.L.: The role of melodic and temporal cues in perceiving musical meter. J. Exp. Psychol. Human Percept. Perform. 30(5), 956 (2004)CrossRef
62.
go back to reference Ellis, R.J., Jones, M.R.: The role of accent salience and joint accent structure in meter perception. J. Exp. Psychol. Human Percept. Perform. 35(1), 264 (2009)CrossRef Ellis, R.J., Jones, M.R.: The role of accent salience and joint accent structure in meter perception. J. Exp. Psychol. Human Percept. Perform. 35(1), 264 (2009)CrossRef
69.
go back to reference Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011) Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
71.
go back to reference Zhou, Y., Chellappa, R.: Computation of optical flow using a neural network. In: IEEE International Conference on Neural Networks, vol. 27, pp. 71–78 (1988) Zhou, Y., Chellappa, R.: Computation of optical flow using a neural network. In: IEEE International Conference on Neural Networks, vol. 27, pp. 71–78 (1988)
72.
go back to reference Sainath, T.N., Kingsbury, B., Mohamed, A.r., Dahl, G.E., Saon, G., Soltau, H., Beran, T., Aravkin, A.Y., Ramabhadran, B.: Improvements to deep convolutional neural networks for LVCSR. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 315–320. IEEE (2013) Sainath, T.N., Kingsbury, B., Mohamed, A.r., Dahl, G.E., Saon, G., Soltau, H., Beran, T., Aravkin, A.Y., Ramabhadran, B.: Improvements to deep convolutional neural networks for LVCSR. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 315–320. IEEE (2013)
73.
go back to reference Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533 (1986)CrossRef Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533 (1986)CrossRef
74.
go back to reference Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
75.
go back to reference Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation (2014). arXiv:1406.1078 Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation (2014). arXiv:​1406.​1078
76.
go back to reference Gouyon, F., Klapuri, A., Dixon, S., Alonso, M., Tzanetakis, G., Uhle, C., Cano, P.: An experimental comparison of audio tempo induction algorithms. IEEE Trans. Audio Speech Lang. Process. 14(5), 1832–1844 (2006)CrossRef Gouyon, F., Klapuri, A., Dixon, S., Alonso, M., Tzanetakis, G., Uhle, C., Cano, P.: An experimental comparison of audio tempo induction algorithms. IEEE Trans. Audio Speech Lang. Process. 14(5), 1832–1844 (2006)CrossRef
77.
go back to reference Harte, C.: Towards automatic extraction of harmony information from music signals. Ph.D. thesis (2010) Harte, C.: Towards automatic extraction of harmony information from music signals. Ph.D. thesis (2010)
78.
go back to reference Davies, M.E., Degara, N., Plumbley, M.D.: Evaluation methods for musical audio beat tracking algorithms. Queen Mary University of London, Centre for Digital Music, Tech. Rep. C4DM-TR-09-06 (2009) Davies, M.E., Degara, N., Plumbley, M.D.: Evaluation methods for musical audio beat tracking algorithms. Queen Mary University of London, Centre for Digital Music, Tech. Rep. C4DM-TR-09-06 (2009)
79.
go back to reference Srinivasamurthy, A., Serra, X.: A supervised approach to hierarchical metrical cycle tracking from audio music recordings. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5217–5221. IEEE (2014) Srinivasamurthy, A., Serra, X.: A supervised approach to hierarchical metrical cycle tracking from audio music recordings. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5217–5221. IEEE (2014)
80.
go back to reference Srinivasamurthy, A., Holzapfel, A., Cemgil, A.T., Serra, X.: Particle filters for efficient meter tracking with dynamic bayesian networks. In: International Society for Music Information Retrieval Conference (ISMIR) (2015) Srinivasamurthy, A., Holzapfel, A., Cemgil, A.T., Serra, X.: Particle filters for efficient meter tracking with dynamic bayesian networks. In: International Society for Music Information Retrieval Conference (ISMIR) (2015)
81. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)
82. Marchand, U., Peeters, G.: Swing ratio estimation. In: Digital Audio Effects (DAFx-15) (2015)
83. Hainsworth, S.W.: Techniques for the automated analysis of musical audio. PhD thesis, University of Cambridge (2003)
84. Hainsworth, S.W., Macleod, M.D.: Particle filtering applied to musical tempo tracking. EURASIP J. Adv. Signal Process. 2004(15), 927847 (2004)
85. Giorgi, B.D., Zanoni, M., Böck, S., Sarti, A.: Multipath beat tracking. J. Audio Eng. Soc. 64(7/8), 493–502 (2016)
86. De Clercq, T., Temperley, D.: A corpus analysis of rock harmony. Popul. Music 30(1), 47–70 (2011)
87. Goto, M., Hashiguchi, H., Nishimura, T., Oka, R.: RWC Music Database: Popular, Classical and Jazz Music Databases. In: ISMIR, vol. 2, pp. 287–288 (2002)
88. Goto, M., et al.: Development of the RWC music database. In: Proceedings of the 18th International Congress on Acoustics (ICA 2004), vol. 1, pp. 553–556 (2004)
89. Goto, M.: AIST Annotation for the RWC Music Database. In: ISMIR, pp. 359–360 (2006)
90. Livshin, A., Rodet, X.: The importance of cross database evaluation in sound classification. In: ISMIR (2003)
96. Böck, S., Korzeniowski, F., Schlüter, J., Krebs, F., Widmer, G.: Madmom: A new Python audio and music signal processing library. In: Proceedings of the 2016 ACM on Multimedia Conference, pp. 1174–1178. ACM (2016)
97. Lartillot, O., Toiviainen, P.: A Matlab toolbox for musical feature extraction from audio. In: International Conference on Digital Audio Effects, pp. 237–244. Bordeaux, FR (2007)
98. Lartillot, O., Toiviainen, P.: MIR in Matlab: A toolbox for musical feature extraction. In: Proceedings of the International Conference on Music Information Retrieval (2007)
99. Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., Serra, X.: ESSENTIA: an open-source library for sound and music analysis. In: Proceedings of the 21st ACM International Conference on Multimedia, pp. 855–858. ACM (2013)
100. McFee, B., Raffel, C., Liang, D., Ellis, D.P., McVicar, M., Battenberg, E., Nieto, O.: librosa: Audio and music signal analysis in Python. In: Proceedings of the 14th Python in Science Conference, pp. 18–25 (2015)
101. Krebs, F., Holzapfel, A., Cemgil, A.T., Widmer, G.: Inferring metrical structure in music using particle filters. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 23(5), 817–827 (2015)
103. Zhang, H., Yang, Y., Luan, H., Yang, S., Chua, T.S.: Start from scratch: Towards automatically identifying, modeling, and naming visual attributes. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 187–196. ACM (2014)
104. Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964–6968. IEEE (2014)
105. Miao, Y., Gowayyed, M., Metze, F.: EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 167–174. IEEE (2015)
107. Zhang, H., Wang, M., Hong, R., Chua, T.S.: Play and rewind: Optimizing binary representations of videos by self-supervised temporal hashing. In: Proceedings of the 2016 ACM on Multimedia Conference, pp. 781–790. ACM (2016)
Metadata
Title: Deep learning-based automatic downbeat tracking: a brief review
Authors: Bijue Jia, Jiancheng Lv, Dayiheng Liu
Publication date: 12-03-2019
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems, Issue 6/2019
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-019-00607-x