

2021 | OriginalPaper | Chapter

Ensemble Size Classification in Colombian Andean String Music Recordings

Authors : Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil

Published in: Perception, Representations, Image, Sound, Music

Publisher: Springer International Publishing


Abstract

Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. With the advancement of machine learning techniques, new possibilities for information retrieval have become available, even in scenarios where ground-truth data is scarce. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, pre-processing steps, and data augmentation techniques were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed-forward layers with linear magnitude spectrograms as the input representation. This model will serve as a baseline for future research on ensemble size classification.
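The abstract names two technical ingredients: linear magnitude spectrograms as the input representation and file-wise mean class accuracy as the evaluation measure. The minimal numpy sketch below illustrates both under stated assumptions that the abstract does not confirm: the network outputs class probabilities for several spectrogram patches per recording, file-level decisions are obtained by averaging those patch probabilities, and "mean class accuracy" is read as the unweighted average of per-class recalls. The STFT parameters (`n_fft=2048`, `hop_length=512`) and the function names are hypothetical.

```python
import numpy as np


def linear_magnitude_spectrogram(signal, n_fft=2048, hop_length=512):
    """Linear (not log, not mel) magnitude STFT, shape (n_fft // 2 + 1, n_frames).

    Frames are taken without padding; n_fft and hop_length are illustrative values.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    # Each column is one Hann-windowed frame of the signal.
    frames = np.stack(
        [signal[i * hop_length:i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # Real FFT along the sample axis, keeping only the magnitude.
    return np.abs(np.fft.rfft(frames, axis=0))


def file_wise_mean_class_accuracy(patch_probs, patch_file_ids, file_labels):
    """Average per-class recall over files, after averaging patch probabilities per file.

    patch_probs:    (n_patches, n_classes) class probabilities from the network
    patch_file_ids: (n_patches,) index of the file each patch was cut from
    file_labels:    (n_files,) ground-truth class index of each file
    """
    patch_probs = np.asarray(patch_probs, dtype=float)
    patch_file_ids = np.asarray(patch_file_ids)
    file_labels = np.asarray(file_labels)
    n_files, n_classes = len(file_labels), patch_probs.shape[1]

    # Sum patch probabilities per file, then divide by the number of patches per file.
    file_probs = np.zeros((n_files, n_classes))
    np.add.at(file_probs, patch_file_ids, patch_probs)
    file_probs /= np.bincount(patch_file_ids, minlength=n_files)[:, None]
    file_preds = file_probs.argmax(axis=1)

    # Recall of each class present in the ground truth, then the unweighted mean.
    recalls = [
        np.mean(file_preds[file_labels == c] == c)
        for c in range(n_classes)
        if np.any(file_labels == c)
    ]
    return float(np.mean(recalls))


# Toy example: three files, two classes, five patches.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]
file_ids = [0, 0, 1, 1, 2]
labels = [0, 1, 1]
print(file_wise_mean_class_accuracy(probs, file_ids, labels))  # 0.75
```

Plain file-wise accuracy on the same toy data would be 2/3; averaging recalls per class keeps a frequent class from dominating the score. At file level, scikit-learn's `sklearn.metrics.balanced_accuracy_score(labels, file_preds)` computes the same average of per-class recalls. With librosa, `np.abs(librosa.stft(y, n_fft=2048, hop_length=512))` gives a comparable spectrogram, although librosa centers and pads the frames by default, so the frame count differs.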



https://acmus-mir.github.io/.

Detailed results: http://dcase.community/challenge2019/task-urban-sound-tagging-results.

Dataset published at: https://zenodo.org/record/3268961.

TensorFlow (1.12): www.tensorflow.org.

Implementation from https://github.com/fmfn/BayesianOptimization.

Implementation from librosa (0.7.2): https://librosa.github.io/.

Implementation from scikit-learn (0.22.2): https://scikit-learn.org/.

For random brightness, random rotation, and grid distortion, the implementations were taken from [5]. For mixup, random erasing, and SpecAugment, we used the implementations provided in the corresponding publications: [23], [24], and [16], respectively.
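Three of the augmentations named above operate directly on spectrogram patches. The numpy sketch below illustrates the core idea of mixup [23], random erasing [24], and the masking part of SpecAugment [16]. It is not the code the authors used (they relied on the publications' own implementations), the function names are hypothetical, and all parameter values (`alpha=0.2`, mask widths, erased-area range) are assumptions. Two simplifications are deliberate: SpecAugment's time warping is omitted, and random erasing uses a roughly square rectangle instead of sampling an aspect ratio.

```python
import numpy as np


def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """mixup: convex combination of two inputs and of their one-hot labels.

    The mixing weight is drawn from Beta(alpha, alpha); alpha=0.2 is illustrative.
    """
    lam = rng.beta(alpha, alpha)
    x = lam * np.asarray(x1, dtype=float) + (1.0 - lam) * np.asarray(x2, dtype=float)
    y = lam * np.asarray(y1, dtype=float) + (1.0 - lam) * np.asarray(y2, dtype=float)
    return x, y


def spec_masks(spec, rng, max_freq_width=20, max_time_width=10):
    """SpecAugment-style masking of one frequency band and one time span (no time warping).

    Masked cells are set to the spectrogram mean; the input array is not modified.
    """
    out = np.array(spec, dtype=float)  # copy of the input
    n_freq, n_time = out.shape
    fill = out.mean()
    f = rng.integers(0, min(max_freq_width, n_freq) + 1)  # band width, may be 0
    f0 = rng.integers(0, n_freq - f + 1)
    out[f0:f0 + f, :] = fill
    t = rng.integers(0, min(max_time_width, n_time) + 1)  # span length, may be 0
    t0 = rng.integers(0, n_time - t + 1)
    out[:, t0:t0 + t] = fill
    return out


def random_erase(spec, rng, p=0.5, area_range=(0.02, 0.2)):
    """Simplified random erasing: with probability p, overwrite one roughly square
    rectangle with uniform noise drawn from the spectrogram's value range."""
    out = np.array(spec, dtype=float)
    if rng.random() >= p:
        return out
    n_freq, n_time = out.shape
    area = rng.uniform(*area_range) * n_freq * n_time
    h = min(n_freq, max(1, int(np.sqrt(area))))
    w = min(n_time, max(1, int(area // h)))
    f0 = rng.integers(0, n_freq - h + 1)
    t0 = rng.integers(0, n_time - w + 1)
    out[f0:f0 + h, t0:t0 + w] = rng.uniform(out.min(), out.max(), size=(h, w))
    return out


# Usage on a random stand-in for a magnitude spectrogram patch.
rng = np.random.default_rng(0)
spec = rng.random((128, 64))
augmented = random_erase(spec_masks(spec, rng), rng, p=1.0)
x_mix, y_mix = mixup(spec, np.array([1.0, 0.0]), augmented, np.array([0.0, 1.0]), rng)
```

Because mixup also mixes the one-hot targets, the network is trained on soft labels; the masking and erasing transforms leave the label unchanged.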

1. Adapa, S.: Urban sound tagging using convolutional neural networks. Technical report, DCASE2019 Challenge (2019)

2. Andrei, V., Cucu, H., Buzo, A., Burileanu, C.: Counting competing speakers in a timeframe - human versus computer. In: Interspeech Conference, ISCA, Dresden, Germany (2015)

3. Bittner, R.M., McFee, B., Salamon, J., Li, P., Bello, J.P.: Deep salience representations for F0 estimation in polyphonic music. In: 18th International Society for Music Information Retrieval Conference, Suzhou, China (2017)

4. Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: 13th International Society for Music Information Retrieval Conference, Porto, Portugal, pp. 559–564 (2012)

5. Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)

6. Cano, E., et al.: ACMUS - advancing computational musicology: semi-supervised and unsupervised segmentation and annotation of musical collections. In: Late-breaking-demo of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands (2019)

7. Diment, A., Heittola, T., Virtanen, T.: Semi-supervised learning for musical instrument recognition. In: 21st European Signal Processing Conference (EUSIPCO), IEEE, Marrakech, Morocco (2013)

8. Essid, S., Richard, G., David, B.: Efficient musical instrument recognition on solo performance music using basic features. In: 25th International AES Conference, London, UK (2004)

9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (AISTATS), Society for Artificial Intelligence and Statistics, Sardinia, Italy (2010)

10. Gómez, J.S., Abeßer, J., Cano, E.: Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: 19th International Society for Music Information Retrieval Conference, Paris, France (2018)

11. Grasis, M., Abeßer, J., Dittmar, C., Lukashevich, H.: A multiple-expert framework for instrument recognition. In: International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, pp. 619–634 (2013)

12. Han, Y., Kim, J., Lee, K.: Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 208–221 (2017)

13. Kareer, S., Basu, S.: Musical polyphony estimation. In: Audio Engineering Society Convention 144, Milan, Italy (2018)

14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, USA (2015)

15. Nadar, C.R., Abeßer, J., Grollmisch, S.: Towards CNN-based acoustic modeling of seventh chords for automatic chord recognition. In: International Conference on Sound and Music Computing, Málaga, Spain (2019)

16. Park, D.S., et al.: SpecAugment: a simple augmentation method for automatic speech recognition. In: Interspeech, Graz, Austria (2019)

17. Prétet, L., Hennequin, R., Royo-Letelier, J., Vaglio, A.: Singing voice separation: a study on training data. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 506–510 (2019)

18. Sayoud, H., Boumediene, T.H., Ouamour, S., Boumediene, T.H.: Proposal of a new confidence parameter estimating the number of speakers - an experimental investigation. J. Inf. Hiding Multimedia Signal Process. 1(2), 101–109 (2010)

19. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, USA, pp. 2951–2959 (2012)

20. Stöter, F.R., Chakrabarty, S., Edler, B., Habets, E.A.P.: Classification vs. regression in supervised learning for single channel speaker count estimation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, pp. 436–440. IEEE (2018)

21. Wang, Y., Getreuer, P., Hughes, T., Lyon, R.F., Saurous, R.A.: Trainable frontend for robust and far-field keyword spotting. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 5670–5674. IEEE (2017)

22. Xu, C., Li, S., Liu, G., Zhang, Y.: Crowd++: unsupervised speaker count with smartphones. In: ACM International Joint Conference on Pervasive and Ubiquitous Computing, Zurich, Switzerland, pp. 43–52. ACM (2013)

23. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations (ICLR), Vancouver, BC, Canada (2018)

24. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA (2020)

Title: Ensemble Size Classification in Colombian Andean String Music Recordings
Authors: Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil
Publisher: Springer International Publishing
Book: Perception, Representations, Image, Sound, Music
Print ISBN: 978-3-030-70209-0
Electronic ISBN: 978-3-030-70210-6
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-70210-6_4
