2021 | Original Paper | Book Chapter

Ensemble Size Classification in Colombian Andean String Music Recordings

Authors: Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil

Published in: Perception, Representations, Image, Sound, Music

Publisher: Springer International Publishing

Abstract

Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. With the advancement of machine learning techniques, new possibilities for information retrieval are now available in scenarios where ground-truth data is scarce. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, as well as pre-processing steps and data augmentation techniques, were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed-forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.
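The metric named in the abstract, file-wise mean class accuracy, can be illustrated with a small sketch. The abstract does not say how patch-level network outputs are combined into one decision per file, so the averaging of patch probabilities below is an assumption, and the function name `file_wise_mean_class_accuracy` and its arguments are hypothetical. The sketch only shows the general idea: decide once per file, compute the accuracy separately for each class, then average over classes so that large classes do not dominate the score.

```python
import numpy as np


def file_wise_mean_class_accuracy(patch_probs, patch_file_ids, file_labels):
    """Mean of per-class accuracies, computed on file-level decisions.

    patch_probs:    (n_patches, n_classes) class probabilities per patch
    patch_file_ids: (n_patches,) identifier of the file each patch came from
    file_labels:    dict mapping file identifier -> true class index
    """
    patch_probs = np.asarray(patch_probs, dtype=float)
    patch_file_ids = np.asarray(patch_file_ids)

    hits_per_class = {}
    for file_id, true_label in file_labels.items():
        mask = patch_file_ids == file_id
        # Assumption: the file-level prediction is the argmax of the
        # mean patch probabilities (not confirmed by the abstract).
        predicted = int(np.argmax(patch_probs[mask].mean(axis=0)))
        hits_per_class.setdefault(true_label, []).append(predicted == true_label)

    # Accuracy per class first, then the unweighted mean over classes.
    class_accuracies = [np.mean(hits) for hits in hits_per_class.values()]
    return float(np.mean(class_accuracies))


# Toy data: three files, two classes.
probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
file_ids = ["a", "a", "b", "c", "c"]
labels = {"a": 0, "b": 1, "c": 1}
score = file_wise_mean_class_accuracy(probs, file_ids, labels)
```

In the toy data, two of three files are classified correctly, so plain file accuracy would be 2/3. Class 0 reaches an accuracy of 1.0 and class 1 reaches 0.5, so the mean class accuracy is 0.75.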

https://acmus-mir.github.io/.

Detailed results: http://dcase.community/challenge2019/task-urban-sound-tagging-results.

Dataset published at: https://zenodo.org/record/3268961.

TensorFlow (1.12): www.tensorflow.org.

Implementation from https://github.com/fmfn/BayesianOptimization.

Implementation from librosa (0.7.2): https://librosa.github.io/.

Implementation from scikit-learn (0.22.2): https://scikit-learn.org/.

The implementations of random brightness, random rotate, and grid distortion were taken from [5]. For random erase, mixup, and SpecAugment, we used the implementations provided in the corresponding publications: [23, 24], and [16].
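As an illustration of one of the augmentation techniques named in this note, the following is a minimal NumPy sketch of mixup as described in [23]: two training examples and their one-hot labels are blended with a weight drawn from a Beta(α, α) distribution. This is not the implementation used in the chapter. The function name `mixup_batch`, the default `alpha=0.2`, and the choice of one shared weight per batch are assumptions made for the example.

```python
import numpy as np


def mixup_batch(features, one_hot_labels, alpha=0.2, rng=None):
    """Blend each example in a batch with a randomly chosen partner.

    features:       (batch, ...) input array, e.g. spectrogram patches
    one_hot_labels: (batch, n_classes) one-hot encoded targets
    alpha:          Beta distribution parameter (hypothetical default)
    """
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    one_hot_labels = np.asarray(one_hot_labels, dtype=float)

    # One mixing weight for the whole batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Partner examples are obtained by shuffling the batch.
    perm = rng.permutation(len(features))

    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * one_hot_labels + (1.0 - lam) * one_hot_labels[perm]
    return mixed_x, mixed_y, lam


# Toy batch: four "spectrogram patches" of shape (3, 5), two classes.
rng = np.random.default_rng(0)
x = rng.random((4, 3, 5))
y = np.eye(2)[[0, 1, 1, 0]]
mixed_x, mixed_y, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Because the inputs and the labels are mixed with the same weight, each row of `mixed_y` still sums to one and can be used directly with a cross-entropy loss.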

1. Adapa, S.: Urban sound tagging using convolutional neural networks. Technical report, DCASE2019 Challenge (2019)

2. Andrei, V., Cucu, H., Buzo, A., Burileanu, C.: Counting competing speakers in a timeframe - human versus computer. In: Interspeech Conference. ISCA, Dresden, Germany (2015)

3. Bittner, R.M., McFee, B., Salamon, J., Li, P., Bello, J.P.: Deep salience representations for F0 estimation in polyphonic music. In: 18th International Society for Music Information Retrieval Conference, Suzhou, China (2017)

4. Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: 13th International Society for Music Information Retrieval Conference, Porto, Portugal, pp. 559–564 (2012)

5. Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)

6. Cano, E., et al.: ACMUS - advancing computational musicology: semi-supervised and unsupervised segmentation and annotation of musical collections. In: Late-breaking-demo of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands (2019)

7. Diment, A., Heittola, T., Virtanen, T.: Semi-supervised learning for musical instrument recognition. In: 21st European Signal Processing Conference (EUSIPCO). IEEE, Marrakech, Morocco (2013)

8. Essid, S., Richard, G., David, B.: Efficient musical instrument recognition on solo performance music using basic features. In: 25th International AES Conference, London, UK (2004)

9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (AISTATS). Society for Artificial Intelligence and Statistics, Sardinia, Italy (2010)

10. Gómez, J.S., Abeßer, J., Cano, E.: Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: 19th International Society for Music Information Retrieval Conference, Paris, France (2018)

11. Grasis, M., Abeßer, J., Dittmar, C., Lukashevich, H.: A multiple-expert framework for instrument recognition. In: International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, pp. 619–634 (2013)

12. Han, Y., Kim, J., Lee, K.: Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 208–221 (2017)

13. Kareer, S., Basu, S.: Musical polyphony estimation. In: Audio Engineering Society Convention 144, Milan, Italy (2018)

14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, USA (2015)

15. Nadar, C.R., Abeßer, J., Grollmisch, S.: Towards CNN-based acoustic modeling of seventh chords for automatic chord recognition. In: International Conference on Sound and Music Computing, Málaga, Spain (2019)

16. Park, D.S., et al.: SpecAugment: a simple augmentation method for automatic speech recognition. In: INTERSPEECH, Graz, Austria (2019)

17. Prétet, L., Hennequin, R., Royo-Letelier, J., Vaglio, A.: Singing voice separation: a study on training data. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 506–510 (2019)

18. Sayoud, H., Boumediene, T.H., Ouamour, S., Boumediene, T.H.: Proposal of a new confidence parameter estimating the number of speakers - an experimental investigation. J. Inf. Hiding Multimedia Signal Process. 1(2), 101–109 (2010)

19. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, USA, pp. 2951–2959 (2012)

20. Stöter, F.R., Chakrabarty, S., Edler, B., Habets, E.A.P.: Classification vs. regression in supervised learning for single channel speaker count estimation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, pp. 436–440. IEEE (2018)

21. Wang, Y., Getreuer, P., Hughes, T., Lyon, R.F., Saurous, R.A.: Trainable frontend for robust and far-field keyword spotting. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 5670–5674. IEEE (2017)

22. Xu, C., Li, S., Liu, G., Zhang, Y.: Crowd++: unsupervised speaker count with smartphones. In: ACM International Joint Conference on Pervasive and Ubiquitous Computing, Zurich, Switzerland, pp. 43–52. ACM (2013)

23. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations (ICLR), Vancouver, BC, Canada (2018)

24. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA (2020)

Title: Ensemble Size Classification in Colombian Andean String Music Recordings
Authors: Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil
Publisher: Springer International Publishing
Book: Perception, Representations, Image, Sound, Music
Print ISBN: 978-3-030-70209-0
Electronic ISBN: 978-3-030-70210-6
Copyright year: 2021
DOI: https://doi.org/10.1007/978-3-030-70210-6_4
