Published in: Pattern Analysis and Applications 4/2019

30.03.2019 | Short paper

Hybrid hidden Markov models and artificial neural networks for handwritten music recognition in mensural notation

Authors: Jorge Calvo-Zaragoza, Alejandro H. Toselli, Enrique Vidal

Abstract

In this paper, we present a hybrid approach that combines hidden Markov models (HMMs) and artificial neural networks for handwritten music recognition in mensural notation. Previous works have shown that the task can be addressed with Gaussian-density HMMs that are trained and used in an end-to-end manner, that is, without prior segmentation of the symbols. However, the results achieved with that approach are not accurate enough to be useful in practice. In this work, we hybridize HMMs with deep multilayer perceptrons (MLPs), which leads to remarkable improvements in optical symbol modeling. Moreover, this hybrid architecture retains important advantages of HMMs, such as the ability to model variable-length symbol sequences through segmentation-free training, and the simplicity and robustness of combining optical models with N-gram language models, which provide statistical a priori information about regularities in musical symbol concatenation observed in the training data. The proposed hybrid MLP-HMM approach outperforms previous works by a wide margin, achieving symbol-level error rates of around 26%, compared with about 40% reported in previous works.
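To make the reported symbol-level error rates concrete, the following is a minimal sketch of how such a figure is commonly computed. It assumes the usual definition (edit distance between the recognized and reference symbol sequences, summed over all test lines and divided by the total number of reference symbols); the abstract does not state the exact formula. The function names and the symbol labels in the example are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def symbol_error_rate(references, hypotheses):
    """Total edit errors divided by total reference symbols (assumed definition)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    symbols = sum(len(r) for r in references)
    return errors / symbols


# Hypothetical symbol labels, for illustration only.
refs = [["clef.C", "note.semibreve", "rest.minim", "note.minim"]]
hyps = [["clef.C", "note.minim", "note.minim"]]
print(symbol_error_rate(refs, hyps))
```

Under this definition, an error rate of 26% would mean roughly one editing operation for every four reference symbols.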

Footnotes
1
This term comes from the speech recognition community. Here, “phones” refer to the music symbols.
 
2
For notational simplicity, for any sequence \(\mathbf {z}\), if \(j<1\), \(P(z_k\mid z_j\ldots \,z_{k-1})\) is assumed to denote \(P(z_k\mid z_1\ldots \,z_{k-1})\). If \(k=1\), it is just \(P(z_1\mid \lambda )\equiv P(z_1)\), where \(\lambda\) is the empty sequence.
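The history convention in footnote 2 can be sketched in a few lines. The sketch assumes that the history of an N-gram model starts at index j = k - N + 1, which the footnote itself does not spell out. Probabilities are estimated as unsmoothed relative frequencies purely for illustration; a practical language model would add smoothing. All function names are hypothetical.

```python
from collections import Counter


def history(z, k, n):
    """Conditioning context z_j..z_{k-1} of z_k (k is 1-indexed).

    Assumes j = k - n + 1; when j < 1 the history is truncated to start
    at z_1, and for k = 1 it is the empty sequence.
    """
    j = max(1, k - n + 1)
    return tuple(z[j - 1:k - 1])


def train(sequences, n):
    """Count (history, symbol) pairs and histories over the training sequences."""
    ngrams, contexts = Counter(), Counter()
    for z in sequences:
        for k in range(1, len(z) + 1):
            h = history(z, k, n)
            ngrams[(h, z[k - 1])] += 1
            contexts[h] += 1
    return ngrams, contexts


def prob(symbol, h, ngrams, contexts):
    """Unsmoothed relative-frequency estimate of P(symbol | h)."""
    return ngrams[(h, symbol)] / contexts[h] if contexts[h] else 0.0


z = ["a", "b", "c", "d"]
print(history(z, 1, 3))  # empty history for the first symbol
print(history(z, 2, 3))  # truncated history
print(history(z, 4, 3))  # full history of length n - 1
```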
 
Metadata
Title
Hybrid hidden Markov models and artificial neural networks for handwritten music recognition in mensural notation
Authors
Jorge Calvo-Zaragoza
Alejandro H. Toselli
Enrique Vidal
Publication date
30.03.2019
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2019
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-019-00807-1
