Skip to main content

2016 | OriginalPaper | Buchkapitel

Active Learning for Speech Event Detection in HCI

verfasst von : Patrick Thiam, Sascha Meudt, Friedhelm Schwenker, Günther Palm

Erschienen in: Artificial Neural Networks in Pattern Recognition

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In this work, a pool-based active learning approach combining outlier detection methods with uncertainty sampling is proposed for speech event detection. Events in this case are regarded as atypical utterances (e.g. laughter, heavy breathing) occurring sporadically during a Human Computer Interaction (HCI) scenario. The proposed approach consists in using rank aggregation to select informative speech segments which have previously been ranked using different outlier detection techniques combined with an uncertainty sampling technique. The uncertainty sampling method is based on the distance to the boundary of a Support Vector Machine with Radial Basis Function kernel trained on the available annotated samples. Extensive experimental results prove the effectiveness of the proposed approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Alam, M.J., Kenny, P., Ouellet, P., Stafylakis, T., Dumouchel, P.: Supervised/unsupervised voice activity detector for text-dependent speaker recognition on RSR2015 corpus. In: Odyssey Speaker and Language Recognition Workshop (2014) Alam, M.J., Kenny, P., Ouellet, P., Stafylakis, T., Dumouchel, P.: Supervised/unsupervised voice activity detector for text-dependent speaker recognition on RSR2015 corpus. In: Odyssey Speaker and Language Recognition Workshop (2014)
2.
Zurück zum Zitat Bergmeir, C., Benìtez, J.M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 191, 192–213 (2012)CrossRef Bergmeir, C., Benìtez, J.M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 191, 192–213 (2012)CrossRef
3.
Zurück zum Zitat Bradley, M.M., Lang, P.J.: Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry 25(1), 49–59 (1994)CrossRef Bradley, M.M., Lang, P.J.: Measuring emotion: the self-assessment manikin and the semantic differential. J. Behav. Ther. Exp. Psychiatry 25(1), 49–59 (1994)CrossRef
4.
Zurück zum Zitat Chang, W.C., Lee, C.P., Lin, C.J.: A revisit to support vector data description (SVDD). Technical reports (2013) Chang, W.C., Lee, C.P., Lin, C.J.: A revisit to support vector data description (SVDD). Technical reports (2013)
5.
Zurück zum Zitat Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)MATH Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)MATH
6.
Zurück zum Zitat Eyben, F., Weninger, F., Gross, F., Schuller, B.: Recent developments in opensmile, the Munich open-source multimedia feature extractor. In: ACM Multimedia (MM), pp. 835–838, October 2013 Eyben, F., Weninger, F., Gross, F., Schuller, B.: Recent developments in opensmile, the Munich open-source multimedia feature extractor. In: ACM Multimedia (MM), pp. 835–838, October 2013
7.
Zurück zum Zitat Gu, Q., Zhu, L., Cai, Z.: Evaluation measures of the classification performance of imbalanced data sets. In: Cai, Z., Li, Z., Kang, Z., Liu, Y. (eds.) ISICA 2009. CCIS, vol. 51, pp. 461–471. Springer, Heidelberg (2009)CrossRef Gu, Q., Zhu, L., Cai, Z.: Evaluation measures of the classification performance of imbalanced data sets. In: Cai, Z., Li, Z., Kang, Z., Liu, Y. (eds.) ISICA 2009. CCIS, vol. 51, pp. 461–471. Springer, Heidelberg (2009)CrossRef
8.
Zurück zum Zitat Hermansky, H.: Perceptual Linear Predictive (PLP) analysis of speech. J. Acoust. Soc. Am. 87(4), 1738–1752 (1990)CrossRef Hermansky, H.: Perceptual Linear Predictive (PLP) analysis of speech. J. Acoust. Soc. Am. 87(4), 1738–1752 (1990)CrossRef
9.
Zurück zum Zitat Jagan Mohan, B., Ramesh Babu, N.: Speech recognition using MFCC and DTW. In: 2014 International Conference on Advances in Electrical Engineering (ICAEE), pp. 1–4, January 2014 Jagan Mohan, B., Ramesh Babu, N.: Speech recognition using MFCC and DTW. In: 2014 International Conference on Advances in Electrical Engineering (ICAEE), pp. 1–4, January 2014
10.
Zurück zum Zitat Krothapalli, S.R., Koolagudi, S.G.: Emotion recognition using vocal tract information. In: Krothapalli, S.R., Koolagudi, S.G. (eds.) Emotion Recognition using Speech Features. SpringerBriefs in Electrical and Computer Engineering, pp. 67–78. Springer, New York (2013) Krothapalli, S.R., Koolagudi, S.G.: Emotion recognition using vocal tract information. In: Krothapalli, S.R., Koolagudi, S.G. (eds.) Emotion Recognition using Speech Features. SpringerBriefs in Electrical and Computer Engineering, pp. 67–78. Springer, New York (2013)
11.
Zurück zum Zitat Krothapalli, S.R., Koolagudi, S.G.: Speech emotion recognition: a review. In: Krothapalli, S.R., Koolagudi, S.G. (eds.) Emotion Recognition using Speech Features. SpringerBriefs in Electrical and Computer Engineering, pp. 15–34. Springer, New York (2013) Krothapalli, S.R., Koolagudi, S.G.: Speech emotion recognition: a review. In: Krothapalli, S.R., Koolagudi, S.G. (eds.) Emotion Recognition using Speech Features. SpringerBriefs in Electrical and Computer Engineering, pp. 15–34. Springer, New York (2013)
12.
Zurück zum Zitat Lin, S.: Rank aggregation methods. Wiley Interdisc. Rev. Comput. Stat. 2(5), 555–570 (2010)CrossRef Lin, S.: Rank aggregation methods. Wiley Interdisc. Rev. Comput. Stat. 2(5), 555–570 (2010)CrossRef
13.
Zurück zum Zitat Lòpez, V., Fernàndez, A., Garcìa, S., Palade, V., Herrera, F.: Strategies for learning in class imbalance problems. Pattern Recogn. 36(3), 849–851 (2003)CrossRef Lòpez, V., Fernàndez, A., Garcìa, S., Palade, V., Herrera, F.: Strategies for learning in class imbalance problems. Pattern Recogn. 36(3), 849–851 (2003)CrossRef
14.
Zurück zum Zitat Meudt, S., Bigalke, L., Schwenker, F.: Atlas - an annotation tool for HCI data utilizing machine learning methods. In: Proceedings of the 1st International Conference on Affective and Pleasurable Design (APD 2012) (Jointly with the 4th International Conference on Applied Human Factors and Ergonomics (AHFE 2012)), pp. 5347–5352 (2012) Meudt, S., Bigalke, L., Schwenker, F.: Atlas - an annotation tool for HCI data utilizing machine learning methods. In: Proceedings of the 1st International Conference on Affective and Pleasurable Design (APD 2012) (Jointly with the 4th International Conference on Applied Human Factors and Ergonomics (AHFE 2012)), pp. 5347–5352 (2012)
15.
Zurück zum Zitat Russel, J.A.: Core affect and the psychological construction of emotion. Pyschological Rev. 110(1), 145–172 (2003)CrossRef Russel, J.A.: Core affect and the psychological construction of emotion. Pyschological Rev. 110(1), 145–172 (2003)CrossRef
16.
Zurück zum Zitat Schüssel, F., Honold, F., Bubalo, N., Huckauf, A., Traue, H., Hazer-Rau, D.: In-depth analysis of multimodal interaction: an explorative paradigm. In: Kurosu, M. (ed.) HCI 2016. LNCS, vol. 9732, pp. 233–240. Springer, Heidelberg (2016)CrossRef Schüssel, F., Honold, F., Bubalo, N., Huckauf, A., Traue, H., Hazer-Rau, D.: In-depth analysis of multimodal interaction: an explorative paradigm. In: Kurosu, M. (ed.) HCI 2016. LNCS, vol. 9732, pp. 233–240. Springer, Heidelberg (2016)CrossRef
17.
Zurück zum Zitat Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1), 45–66 (2004)CrossRefMATH Tax, D.M., Duin, R.P.: Support vector data description. Mach. Learn. 54(1), 45–66 (2004)CrossRefMATH
18.
Zurück zum Zitat Thiam, P., Kächele, M., Schwenker, F., Palm, G.: Ensembles of support vector data description for active learning based annotation of affective corpora. In: 2015 IEEE Symposium Series on Computational Intelligence, pp. 1801–1807, December 2015 Thiam, P., Kächele, M., Schwenker, F., Palm, G.: Ensembles of support vector data description for active learning based annotation of affective corpora. In: 2015 IEEE Symposium Series on Computational Intelligence, pp. 1801–1807, December 2015
19.
Zurück zum Zitat Thiam, P., Meudt, S., Kächele, M., Palm, G., Schwenker, F.: Detection of emotional events utilizing support vector methods in an active learning HCI scenario. In: Proceedings of the 2014 Workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems, ERM4HCI 2014, pp. 31–36. ACM, New York (2014) Thiam, P., Meudt, S., Kächele, M., Palm, G., Schwenker, F.: Detection of emotional events utilizing support vector methods in an active learning HCI scenario. In: Proceedings of the 2014 Workshop on Emotion Representation and Modelling in Human-Computer-Interaction-Systems, ERM4HCI 2014, pp. 31–36. ACM, New York (2014)
Metadaten
Titel
Active Learning for Speech Event Detection in HCI
verfasst von
Patrick Thiam
Sascha Meudt
Friedhelm Schwenker
Günther Palm
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-46182-3_24