Published in: International Journal on Document Analysis and Recognition (IJDAR), Issue 4/2016

1 December 2016 | Original Paper

Document segmentation and classification into musical scores and text

Authors: Fabrizio Pedersoli, George Tzanetakis


Abstract

We propose a new algorithm for segmenting documents into regions containing musical scores and text. Such segmentation is a required step before applying optical character recognition and optical music recognition to scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) to detect the bounding boxes that contain musical score and text within a document image. The RBV procedure extracts a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that "over"-covers the input image. Each block is automatically classified as coming from either musical score or text, and it votes over its spatial domain with the posterior probability of that classification. An initial coarse segmentation is obtained by summarizing all the votes in a single image. The final segmentation is then obtained by subdividing the image into microblocks and classifying them with an N-nearest neighbor classifier that is trained on the coarse segmentation. We demonstrate the potential of the proposed method through experiments on two different datasets. The first is a challenging dataset of images collected and artificially combined and manipulated for this project. The second is a music dataset obtained by scanning two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85%. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility.
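The voting step and the area-based evaluation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the block-size range, the number of blocks, and the way blocks are clipped at the image border are all hypothetical, and `classify_block` stands in for the bag-of-visual-words classifier, which is not reproduced here.

```python
import numpy as np


def random_block_voting(image, classify_block, n_blocks=500,
                        min_size=32, max_size=128, seed=0):
    """Accumulate per-pixel votes from randomly sampled blocks.

    classify_block is a hypothetical callable: it receives a 2-D block and
    returns the posterior probability that the block shows a musical score.
    The image is assumed to be at least min_size pixels in both dimensions.
    """
    rng = np.random.default_rng(seed)
    height, width = image.shape
    votes = np.zeros((height, width), dtype=float)   # summed posteriors
    counts = np.zeros((height, width), dtype=float)  # blocks covering each pixel

    for _ in range(n_blocks):
        # Block size and position are drawn from discrete uniform distributions.
        h = int(rng.integers(min_size, min(max_size, height) + 1))
        w = int(rng.integers(min_size, min(max_size, width) + 1))
        # Assumption: top-left corners may lie outside the image so that
        # border pixels are covered as often as interior ones.
        y = int(rng.integers(-h + 1, height))
        x = int(rng.integers(-w + 1, width))
        y0, y1 = max(y, 0), min(y + h, height)
        x0, x1 = max(x, 0), min(x + w, width)

        p_score = classify_block(image[y0:y1, x0:x1])
        votes[y0:y1, x0:x1] += p_score
        counts[y0:y1, x0:x1] += 1.0

    # Mean posterior per pixel; pixels that no block covered stay at 0.5.
    mean = np.full((height, width), 0.5)
    np.divide(votes, counts, out=mean, where=counts > 0)
    return mean


def area_f_measure(pred_mask, gt_mask):
    """Precision, recall and F-measure of the overlapping area."""
    overlap = float(np.logical_and(pred_mask, gt_mask).sum())
    if overlap == 0.0:
        return 0.0, 0.0, 0.0
    precision = overlap / float(pred_mask.sum())
    recall = overlap / float(gt_mask.sum())
    return precision, recall, 2.0 * precision * recall / (precision + recall)
```

Thresholding the returned map, for example `coarse > 0.5`, gives a coarse score/text mask. The microblock refinement with the nearest neighbor classifier is left out of this sketch.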

Footnotes
1.
Width × height.
2.
Conversely, the real dataset meets the width requirement in the majority of the cases.
Metadata
Title
Document segmentation and classification into musical scores and text
Authors
Fabrizio Pedersoli
George Tzanetakis
Publication date
1 December 2016
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Document Analysis and Recognition (IJDAR) / Issue 4/2016
Print ISSN: 1433-2833
Electronic ISSN: 1433-2825
DOI
https://doi.org/10.1007/s10032-016-0271-5
