
2018 | OriginalPaper | Book chapter

Automatic Recognition of Sound Categories from Their Vocal Imitation Using Audio Primitives Automatically Found by SI-PLCA and HMM

Authors: Enrico Marchetto, Geoffroy Peeters

Published in: Music Technology with Swing

Publisher: Springer International Publishing


Abstract

In this paper we study the automatic recognition of sound categories (such as fridge, mixer or sawing sounds) from their vocal imitations. Vocal imitations consist of a temporal succession of sounds produced with vocal mechanisms that can differ greatly from those used in speech. We develop a recognition approach inspired by automatic speech recognition systems, with an acoustic model (which maps the audio signal to a set of probabilities over “phonemes”) and a language model (which represents the expected succession of “phonemes” for each sound category). Since the underlying “phonemes” of vocal imitations are not known, we propose to estimate them automatically by applying Shift-Invariant Probabilistic Latent Component Analysis (SI-PLCA) to a dataset of vocal imitations. The kernel distributions of the SI-PLCA are considered as the “phonemes” of vocal imitation, and its impulse distributions are used to compute the emission probabilities of the states of a set of Hidden Markov Models (HMMs). We evaluate the proposal on the task of automatically recognizing 12 sound categories from their vocal imitations.
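The abstract does not specify how impulse distributions become emission probabilities, so the following is only a minimal sketch under stated assumptions: one HMM per sound category, states tied one-to-one to the SI-PLCA kernels, and the emission score of state z at frame t taken as the impulse activation of kernel z at frame t, normalized across kernels. Recognition then picks the category whose HMM yields the highest forward log-likelihood. The helper names (`emissions_from_impulses`, `forward_log_likelihood`, `classify`) are hypothetical, and in practice the initial and transition probabilities would be trained per category rather than set by hand.

```python
import numpy as np


def emissions_from_impulses(activations, floor=1e-12):
    """Normalize per-frame kernel activations (Z x T) into per-frame scores over kernels.

    Assumption: state z of every HMM is tied to SI-PLCA kernel z.
    """
    activations = np.maximum(np.asarray(activations, dtype=float), floor)
    return activations / activations.sum(axis=0, keepdims=True)


def forward_log_likelihood(B, pi, A):
    """Scaled forward algorithm returning log P(observations | HMM).

    B[j, t] is the emission score of state j at frame t, pi the initial state
    distribution and A[i, j] the probability of moving from state i to state j.
    """
    _, n_frames = B.shape
    alpha = pi * B[:, 0]
    scale = alpha.sum()
    alpha = alpha / scale
    log_lik = np.log(scale)
    for t in range(1, n_frames):
        alpha = (alpha @ A) * B[:, t]
        scale = alpha.sum()
        alpha = alpha / scale  # rescale to avoid underflow on long sequences
        log_lik += np.log(scale)
    return float(log_lik)


def classify(B, models):
    """Return the category whose HMM (pi, A) gives the highest log-likelihood."""
    scores = {name: forward_log_likelihood(B, pi, A) for name, (pi, A) in models.items()}
    return max(scores, key=scores.get), scores
```

For example, `label, scores = classify(emissions_from_impulses(acts), models)` where `models` maps each of the 12 category names to a hypothetical `(pi, A)` pair. The per-category transition matrix plays the role of the “language model”: it encodes which kernel is expected to follow which.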


http://skatvg.iuav.it/.

In our CQT, harmonics are spaced by 18 bins, so the kernel size along frequency must be at least 18 bins to exploit shift-invariance.
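To illustrate what a kernel size along frequency means, here is a simplified SI-PLCA sketch in which each kernel P_K(k | z) is a spectral template of `kernel_size` CQT bins and the impulse distribution P_I(phi, t | z) places it at frequency shift phi and frame t, so that P(f, t) = sum_z P(z) sum_phi P_K(f - phi | z) P_I(phi, t | z). It assumes shift-invariance along frequency only and a random initialization; the chapter's own kernels and settings may differ, and `si_plca_freq` and `reconstruct` are hypothetical names. It requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

EPS = 1e-12  # guards divisions and logs against zeros


def reconstruct(Pz, K, I):
    """P(f, t) = sum_z P(z) sum_phi P_K(f - phi | z) P_I(phi, t | z)."""
    n_components, kernel_size = K.shape
    _, n_shifts, T = I.shape
    P = np.zeros((n_shifts + kernel_size - 1, T))
    for z in range(n_components):
        for k in range(kernel_size):
            # Kernel bin k placed at shift phi lands on frequency bin phi + k.
            P[k:k + n_shifts, :] += Pz[z] * K[z, k] * I[z]
    return P


def si_plca_freq(V, n_components, kernel_size, n_iter=50, seed=0):
    """EM for SI-PLCA with shift-invariance along frequency only (hypothetical helper)."""
    V = np.asarray(V, dtype=float)
    F, T = V.shape
    if not 1 <= kernel_size <= F:
        raise ValueError("kernel_size must be between 1 and the number of frequency bins")
    V = V / V.sum()  # treat the magnitude CQT as a joint distribution over (f, t)
    n_shifts = F - kernel_size + 1
    rng = np.random.default_rng(seed)

    # Random positive initialization, each distribution normalized.
    Pz = np.full(n_components, 1.0 / n_components)
    K = rng.random((n_components, kernel_size)) + 0.1  # kernels P_K(k | z)
    K /= K.sum(axis=1, keepdims=True)
    I = rng.random((n_components, n_shifts, T)) + 0.1  # impulses P_I(phi, t | z)
    I /= I.sum(axis=(1, 2), keepdims=True)

    history = []  # log-likelihood sum_{f,t} V log P before each update
    for _ in range(n_iter):
        P = reconstruct(Pz, K, I)
        history.append(float(np.sum(V * np.log(P + EPS))))
        Q = V / (P + EPS)
        W = sliding_window_view(Q, kernel_size, axis=0)  # W[phi, t, k] = Q[phi + k, t]
        K_acc = np.empty_like(K)
        I_acc = np.empty_like(I)
        for z in range(n_components):
            # E-step posteriors folded directly into the M-step sums.
            K_acc[z] = Pz[z] * K[z] * np.einsum("pt,ptk->k", I[z], W)
            I_acc[z] = Pz[z] * I[z] * (W @ K[z])
        Pz = K_acc.sum(axis=1)
        Pz /= Pz.sum()
        K = K_acc / (K_acc.sum(axis=1, keepdims=True) + EPS)
        I = I_acc / (I_acc.sum(axis=(1, 2), keepdims=True) + EPS)
    return Pz, K, I, history
```

A call such as `si_plca_freq(cqt_magnitude, n_components=8, kernel_size=18)` (with `cqt_magnitude` a hypothetical F x T magnitude CQT) uses the minimum kernel size stated in this footnote. Because these are EM updates, the values in `history` should not decrease, which is a convenient sanity check.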

One third of the data is used for testing and the remaining two thirds for training; each third is used for testing in turn (three-fold cross-validation).

The same subject cannot appear in both the training set and the test set.
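The evaluation protocol in the two footnotes above (three folds, no subject shared between training and test) can be sketched by partitioning subjects rather than individual recordings. This is an assumption about how the folds were built: when subjects contribute unequal numbers of imitations, each test set is only approximately one third of the data. `subject_disjoint_folds` is a hypothetical helper; scikit-learn's `GroupKFold` provides a comparable group-aware split.

```python
import numpy as np


def subject_disjoint_folds(subjects, n_folds=3, seed=0):
    """Return (train_idx, test_idx) pairs such that no subject is in both sets."""
    subjects = np.asarray(subjects)
    unique = np.unique(subjects)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)  # random assignment of subjects to folds
    folds = []
    for group in np.array_split(unique, n_folds):
        test_mask = np.isin(subjects, group)  # every imitation by these subjects
        folds.append((np.flatnonzero(~test_mask), np.flatnonzero(test_mask)))
    return folds
```

Each fold's test set is used once, so every imitation is tested exactly once across the three folds.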


Print ISBN: 978-3-030-01691-3

Electronic ISBN: 978-3-030-01692-0

Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-030-01692-0_1
