2021 | OriginalPaper | Chapter

Ensemble Size Classification in Colombian Andean String Music Recordings

Authors: Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil

Published in: Perception, Representations, Image, Sound, Music

Publisher: Springer International Publishing

Abstract

Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. Advances in machine learning techniques have opened new possibilities for information retrieval in scenarios where ground-truth data is scarce. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, pre-processing steps, and data augmentation techniques were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed-forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.
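
The reported metric, file-wise mean class accuracy, can be illustrated with a minimal sketch. The abstract does not state how patch-level outputs are combined into one decision per file, so the code below assumes that class probabilities are summed over all patches of a file and that "mean class accuracy" is the unweighted average of the per-class accuracies. The function name, argument names, and toy data are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict


def file_wise_mean_class_accuracy(patch_probs, patch_file_ids, file_labels):
    """Mean of per-class accuracies computed on file-level predictions.

    patch_probs: list of per-patch class-probability lists
    patch_file_ids: file identifier of each patch (same length as patch_probs)
    file_labels: dict mapping file identifier -> true class index
    """
    # Sum the patch probabilities of each file (assumed aggregation rule).
    summed = {}
    for probs, fid in zip(patch_probs, patch_file_ids):
        if fid not in summed:
            summed[fid] = [0.0] * len(probs)
        summed[fid] = [s + p for s, p in zip(summed[fid], probs)]

    # One prediction per file: the class with the highest summed probability.
    correct = defaultdict(int)
    total = defaultdict(int)
    for fid, sums in summed.items():
        predicted = max(range(len(sums)), key=sums.__getitem__)
        true = file_labels[fid]
        total[true] += 1
        correct[true] += int(predicted == true)

    # Average the per-class accuracies so that every class counts equally.
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# Toy example: three files, two classes (all values are made up).
patch_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
patch_file_ids = ["a", "a", "b", "c", "c"]
file_labels = {"a": 0, "b": 1, "c": 1}
score = file_wise_mean_class_accuracy(patch_probs, patch_file_ids, file_labels)
print(score)  # 0.75: class 0 scores 1/1, class 1 scores 1/2
```

Averaging per class rather than per file keeps a frequent ensemble size from dominating the score, which matters when the annotated classes are unevenly represented.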

Footnotes
4. TensorFlow (1.12): www.tensorflow.org.
6. Implementation from librosa (0.7.2): https://librosa.github.io/.
7. Implementation from scikit-learn (0.22.2): https://scikit-learn.org/.
8. For random brightness, random rotate, and grid distortion, the implementations were taken from [5]. For random erase, mixup, and SpecAugment, we used the implementations provided in the corresponding publications: [24], [23], and [16].
Literature
1. Adapa, S.: Urban sound tagging using convolutional neural networks. Technical report, DCASE2019 Challenge (2019)
2. Andrei, V., Cucu, H., Buzo, A., Burileanu, C.: Counting competing speakers in a timeframe - human versus computer. In: Interspeech Conference. ISCA, Dresden, Germany (2015)
3. Bittner, R.M., McFee, B., Salamon, J., Li, P., Bello, J.P.: Deep salience representations for F0 estimation in polyphonic music. In: 18th International Society for Music Information Retrieval Conference, Suzhou, China (2017)
4. Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: 13th International Society for Music Information Retrieval Conference, Porto, Portugal, pp. 559–564 (2012)
5. Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)
6. Cano, E., et al.: ACMUS - advancing computational musicology: semi-supervised and unsupervised segmentation and annotation of musical collections. In: Late-breaking-demo of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands (2019)
7. Diment, A., Heittola, T., Virtanen, T.: Semi-supervised learning for musical instrument recognition. In: 21st European Signal Processing Conference (EUSIPCO). IEEE, Marrakech, Morocco (2013)
8. Essid, S., Richard, G., David, B.: Efficient musical instrument recognition on solo performance music using basic features. In: 25th International AES Conference, London, UK (2004)
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (AISTATS). Society for Artificial Intelligence and Statistics, Sardinia, Italy (2010)
10. Gómez, J.S., Abeßer, J., Cano, E.: Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: 19th International Society for Music Information Retrieval Conference, Paris, France (2018)
11. Grasis, M., Abeßer, J., Dittmar, C., Lukashevich, H.: A multiple-expert framework for instrument recognition. In: International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, pp. 619–634 (2013)
12. Han, Y., Kim, J., Lee, K.: Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 208–221 (2017)
13. Kareer, S., Basu, S.: Musical polyphony estimation. In: Audio Engineering Society Convention 144, Milan, Italy (2018)
14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, USA (2015)
15. Nadar, C.R., Abeßer, J., Grollmisch, S.: Towards CNN-based acoustic modeling of seventh chords for automatic chord recognition. In: International Conference on Sound and Music Computing, Málaga, Spain (2019)
16. Park, D.S., et al.: SpecAugment: a simple augmentation method for automatic speech recognition. In: INTERSPEECH, Graz, Austria (2019)
17. Prétet, L., Hennequin, R., Royo-Letelier, J., Vaglio, A.: Singing voice separation: a study on training data. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 506–510 (2019)
18. Sayoud, H., Boumediene, T.H., Ouamour, S., Boumediene, T.H.: Proposal of a new confidence parameter estimating the number of speakers - an experimental investigation. J. Inf. Hiding Multimedia Signal Process. 1(2), 101–109 (2010)
19. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, USA, pp. 2951–2959 (2012)
20. Stöter, F.R., Chakrabarty, S., Edler, B., Habets, E.A.P.: Classification vs. regression in supervised learning for single channel speaker count estimation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, pp. 436–440. IEEE (2018)
21. Wang, Y., Getreuer, P., Hughes, T., Lyon, R.F., Saurous, R.A.: Trainable frontend for robust and far-field keyword spotting. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 5670–5674. IEEE (2017)
22. Xu, C., Li, S., Liu, G., Zhang, Y.: Crowd++: unsupervised speaker count with smartphones. In: ACM International Joint Conference on Pervasive and Ubiquitous Computing, Zurich, Switzerland, pp. 43–52. ACM (2013)
23. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations (ICLR), Vancouver, BC, Canada (2018)
24. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA (2020)
Metadata
Title
Ensemble Size Classification in Colombian Andean String Music Recordings
Authors
Sascha Grollmisch
Estefanía Cano
Fernando Mora Ángel
Gustavo López Gil
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-70210-6_4