2015 | OriginalPaper | Chapter

19. Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech

Authors: Raymond Brueckner, Björn Schuller

Published in: Conflict and Multimodal Communication

Publisher: Springer International Publishing

Abstract

Conflict is a fundamental phenomenon that inevitably arises in inter-human communication, yet it has only recently become a subject of study in the emerging field of computational paralinguistics. Because speech is a predominant carrier of information about the valence and level of conflict, we investigate and demonstrate how deep and hierarchical neural networks, which have become the mainstream paradigm in automatic speech recognition over the last few years, can be leveraged to automatically classify and predict levels of conflict based purely on audio recordings. For this purpose, we adopt a neural network architecture that we have previously applied successfully to another paralinguistics task. On the Conflict Sub-Challenge data set of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE), we obtained the best results reported so far in the literature on both the classification and the regression task. These results demonstrate that deep neural networks are also well suited to predicting conflict levels.

Title: Be at Odds? Deep and Hierarchical Neural Networks for Classification and Regression of Conflict in Speech
Authors: Raymond Brueckner, Björn Schuller
Publisher: Springer International Publishing
Book: Conflict and Multimodal Communication
Print ISBN: 978-3-319-14080-3

Electronic ISBN: 978-3-319-14081-0

Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-14081-0_19
