
2021 | OriginalPaper | Book chapter

Ensemble Size Classification in Colombian Andean String Music Recordings

Authors: Sascha Grollmisch, Estefanía Cano, Fernando Mora Ángel, Gustavo López Gil

Published in: Perception, Representations, Image, Sound, Music

Publisher: Springer International Publishing


Abstract

Reliable methods for automatic retrieval of semantic information from large digital music archives can play a critical role in musicological research and musical heritage preservation. Advances in machine learning have opened new possibilities for information retrieval in scenarios where ground-truth data is scarce. This work investigates the problem of ensemble size classification in music recordings. For this purpose, a new dataset of Colombian Andean string music was compiled and annotated by musicological experts. Different neural network architectures, as well as pre-processing steps and data augmentation techniques, were systematically evaluated and optimized. The best deep neural network architecture achieved 81.5% file-wise mean class accuracy using only feed-forward layers with linear magnitude spectrograms as input representation. This model will serve as a baseline for future research on ensemble size classification.
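The abstract does not define "file-wise mean class accuracy". One plausible reading, sketched below, is that predictions made on short patches of a recording are first combined into a single prediction per file, and accuracy is then computed separately for each class and averaged, so that rare ensemble sizes weigh as much as common ones. The majority-vote aggregation, the function names, and the toy labels are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter, defaultdict


def file_wise_predictions(patch_preds):
    """Combine patch-level predictions into one label per file.

    patch_preds: iterable of (file_id, predicted_class) pairs.
    Assumption: majority vote; the chapter may aggregate differently.
    """
    votes = defaultdict(Counter)
    for file_id, pred in patch_preds:
        votes[file_id][pred] += 1
    return {f: c.most_common(1)[0][0] for f, c in votes.items()}


def mean_class_accuracy(true_labels, predicted_labels):
    """Average the per-class accuracies so every class counts equally."""
    correct = Counter()
    total = Counter()
    for file_id, truth in true_labels.items():
        total[truth] += 1
        if predicted_labels.get(file_id) == truth:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy data: class labels 1, 2, 3 stand for ensemble sizes.
patch_preds = [
    ("a", 1), ("a", 1), ("a", 2),
    ("b", 2), ("b", 2),
    ("c", 1), ("c", 2), ("c", 2),
    ("d", 3),
]
truth = {"a": 1, "b": 2, "c": 1, "d": 3}

file_preds = file_wise_predictions(patch_preds)
score = mean_class_accuracy(truth, file_preds)
print(file_preds, round(score, 4))
```

In this toy case three of four files are correct (plain accuracy 0.75), but class 1 is only half right, so the class-averaged score is (0.5 + 1.0 + 1.0) / 3. Under this reading the metric matches what scikit-learn calls balanced accuracy.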

Footnotes
4. TensorFlow (1.12): www.tensorflow.org.
6. Implementation from librosa (0.7.2): https://librosa.github.io/.
7. Implementation from scikit-learn (0.22.2): https://scikit-learn.org/.
8. Implementations of random brightness, random rotate, and grid distortion were taken from [5]. For random erase, mixup, and SpecAugment, we used the implementations provided in the corresponding publications: [23, 24], and [16].
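As an illustration of one of the augmentation techniques named in footnote 8, the following is a minimal sketch of the mixup idea from [23]: two training examples and their one-hot labels are blended with a weight drawn from a Beta(alpha, alpha) distribution. The function name, the flat-list representation of a spectrogram patch, and the value alpha = 0.2 are assumptions chosen for illustration; the chapter's actual settings are not given on this page.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two (input, one-hot label) pairs with a Beta-distributed weight.

    x_a, x_b: equal-length lists of floats (e.g. a flattened spectrogram patch).
    y_a, y_b: equal-length one-hot label lists.
    alpha: Beta distribution parameter (hypothetical value).
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Toy example with two-element "patches" and two classes.
rng = random.Random(0)
x_mix, y_mix, lam = mixup_pair([1.0, 0.0], [1.0, 0.0],
                               [0.0, 1.0], [0.0, 1.0], rng=rng)
print(lam, x_mix, y_mix)
```

The mixed label is a soft target whose entries still sum to one, so it can be used with a standard cross-entropy loss.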
 
References
1. Adapa, S.: Urban sound tagging using convolutional neural networks. Technical report, DCASE2019 Challenge (2019)
2. Andrei, V., Cucu, H., Buzo, A., Burileanu, C.: Counting competing speakers in a timeframe - human versus computer. In: Interspeech Conference. ISCA, Dresden, Germany (2015)
3. Bittner, R.M., McFee, B., Salamon, J., Li, P., Bello, J.P.: Deep salience representations for F0 estimation in polyphonic music. In: 18th International Society for Music Information Retrieval Conference, Suzhou, China (2017)
4. Bosch, J.J., Janer, J., Fuhrmann, F., Herrera, P.: A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals. In: 13th International Society for Music Information Retrieval Conference, Porto, Portugal, pp. 559–564 (2012)
5. Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)
6. Cano, E., et al.: ACMUS - advancing computational musicology: semi-supervised and unsupervised segmentation and annotation of musical collections. In: Late-breaking-demo of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands (2019)
7. Diment, A., Heittola, T., Virtanen, T.: Semi-supervised learning for musical instrument recognition. In: 21st European Signal Processing Conference (EUSIPCO). IEEE, Marrakech, Morocco (2013)
8. Essid, S., Richard, G., David, B.: Efficient musical instrument recognition on solo performance music using basic features. In: 25th International AES Conference, London, UK (2004)
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (AISTATS). Society for Artificial Intelligence and Statistics, Sardinia, Italy (2010)
10. Gómez, J.S., Abeßer, J., Cano, E.: Jazz solo instrument classification with convolutional neural networks, source separation, and transfer learning. In: 19th International Society for Music Information Retrieval Conference, Paris, France (2018)
11. Grasis, M., Abeßer, J., Dittmar, C., Lukashevich, H.: A multiple-expert framework for instrument recognition. In: International Symposium on Computer Music Multidisciplinary Research (CMMR), Marseille, France, pp. 619–634 (2013)
12. Han, Y., Kim, J., Lee, K.: Deep convolutional neural networks for predominant instrument recognition in polyphonic music. IEEE/ACM Trans. Audio Speech Lang. Process. 25, 208–221 (2017)
13. Kareer, S., Basu, S.: Musical polyphony estimation. In: Audio Engineering Society Convention 144, Milan, Italy (2018)
14. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR), San Diego, USA (2015)
15. Nadar, C.R., Abeßer, J., Grollmisch, S.: Towards CNN-based acoustic modeling of seventh chords for automatic chord recognition. In: International Conference on Sound and Music Computing, Málaga, Spain (2019)
16. Park, D.S., et al.: SpecAugment: a simple augmentation method for automatic speech recognition. In: INTERSPEECH, Graz, Austria (2019)
17. Prétet, L., Hennequin, R., Royo-Letelier, J., Vaglio, A.: Singing voice separation: a study on training data. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 506–510 (2019)
18. Sayoud, H., Boumediene, T.H., Ouamour, S., Boumediene, T.H.: Proposal of a new confidence parameter estimating the number of speakers - an experimental investigation. J. Inf. Hiding Multimedia Signal Process. 1(2), 101–109 (2010)
19. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: 25th International Conference on Neural Information Processing Systems, Lake Tahoe, Nevada, USA, pp. 2951–2959 (2012)
20. Stöter, F.R., Chakrabarty, S., Edler, B., Habets, E.A.P.: Classification vs. regression in supervised learning for single channel speaker count estimation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Alberta, Canada, pp. 436–440. IEEE (2018)
21. Wang, Y., Getreuer, P., Hughes, T., Lyon, R.F., Saurous, R.A.: Trainable frontend for robust and far-field keyword spotting. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, pp. 5670–5674. IEEE (2017)
22. Xu, C., Li, S., Liu, G., Zhang, Y.: Crowd++: unsupervised speaker count with smartphones. In: ACM International Joint Conference on Pervasive and Ubiquitous Computing, Zurich, Switzerland, pp. 43–52. ACM (2013)
23. Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations (ICLR), Vancouver, BC, Canada (2018)
24. Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA (2020)
Metadata
Title
Ensemble Size Classification in Colombian Andean String Music Recordings
Authors
Sascha Grollmisch
Estefanía Cano
Fernando Mora Ángel
Gustavo López Gil
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-70210-6_4
