Skip to main content
Erschienen in: Pattern Analysis and Applications 1/2018

23.03.2017 | Short Paper

A feature selection-based speaker clustering method for paralinguistic tasks

verfasst von: Gábor Gosztolya, László Tóth

Erschienen in: Pattern Analysis and Applications | Ausgabe 1/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In recent years, computational paralinguistics has emerged as a new topic within speech technology. It concerns extracting non-linguistic information from speech (such as emotions, the level of conflict, whether the speaker is drunk). It was shown recently that many methods applied here can be assisted by speaker clustering; for example, the features extracted from the utterances could be normalized speaker-wise instead of using a global method. In this paper, we propose a speaker clustering algorithm based on standard clustering approaches like K-means and feature selection. By applying this speaker clustering technique in two paralinguistic tasks, we were able to significantly improve the accuracy scores of several machine learning methods, and we also obtained an insight into what features could be efficiently used to separate the different speakers.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Ajmera J, Wooters C (2003) A robust speaker clustering algorithm. In: Proceedings of ASRU, pp 411–416 Ajmera J, Wooters C (2003) A robust speaker clustering algorithm. In: Proceedings of ASRU, pp 411–416
2.
Zurück zum Zitat Benbouzid D, Busa-Fekete R, Casagrande N, Collin FD, Kégl B (2012) MultiBoost: a multi-purpose boosting package. J Mach Learn Res 13:549–553MATH Benbouzid D, Busa-Fekete R, Casagrande N, Collin FD, Kégl B (2012) MultiBoost: a multi-purpose boosting package. J Mach Learn Res 13:549–553MATH
3.
Zurück zum Zitat Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New YorkCrossRefMATH Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New YorkCrossRefMATH
4.
Zurück zum Zitat Bradley P, Fayyad UM (1998) Refining initial points for K-means clustering. In: Proceedings of ICML, Madison, WI, USA, pp 91–99 Bradley P, Fayyad UM (1998) Refining initial points for K-means clustering. In: Proceedings of ICML, Madison, WI, USA, pp 91–99
5.
Zurück zum Zitat Cha SH (2007) Comprehensive survey on distance/similarity measures between probability density functions. Int J Math Models Methods Appl Sci 1(4):300–307 Cha SH (2007) Comprehensive survey on distance/similarity measures between probability density functions. Int J Math Models Methods Appl Sci 1(4):300–307
6.
Zurück zum Zitat Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27CrossRef Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27CrossRef
7.
Zurück zum Zitat Dehak N, Kenny PJ, Dehak R, Dumouchel P, Ouellet P (2010) Front end factor analysis for speaker verification. IEEE transactions on audio, speech and language processing, pp 788–798 Dehak N, Kenny PJ, Dehak R, Dumouchel P, Ouellet P (2010) Front end factor analysis for speaker verification. IEEE transactions on audio, speech and language processing, pp 788–798
8.
Zurück zum Zitat Dupuy G, Meignier S, Deléglise P, Estève Y (2014) Recent improvements on ILP-based clustering for broadcast news speaker diarization. In: Proceedings of Odyssey, pp 187–193 Dupuy G, Meignier S, Deléglise P, Estève Y (2014) Recent improvements on ILP-based clustering for broadcast news speaker diarization. In: Proceedings of Odyssey, pp 187–193
9.
Zurück zum Zitat Eyben F, Weninger F, Schuller B (2013) Affect recognition in real-life acoustic conditions - A new perspective on feature selection. In: Proceedings of Interspeech, Lyon, France, pp 2044–2048 Eyben F, Weninger F, Schuller B (2013) Affect recognition in real-life acoustic conditions - A new perspective on feature selection. In: Proceedings of Interspeech, Lyon, France, pp 2044–2048
10.
Zurück zum Zitat Eyben F, Wöllmer M, Schuller B (2010) Opensmile: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of ACM multimedia, pp 1459–1462 Eyben F, Wöllmer M, Schuller B (2010) Opensmile: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of ACM multimedia, pp 1459–1462
11.
Zurück zum Zitat Felföldi L, Kocsor A, Tóth L (2003) Classifier combination in speech recognition. Period Polytech Electr Eng 47(1):125–140MATH Felföldi L, Kocsor A, Tóth L (2003) Classifier combination in speech recognition. Period Polytech Electr Eng 47(1):125–140MATH
12.
Zurück zum Zitat Fred AL, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850CrossRef Fred AL, Jain AK (2005) Combining multiple clusterings using evidence accumulation. IEEE Trans Pattern Anal Mach Intell 27(6):835–850CrossRef
13.
Zurück zum Zitat Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier networks. In: Proceedings of AISTATS, pp 315–323 Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier networks. In: Proceedings of AISTATS, pp 315–323
14.
Zurück zum Zitat Gosztolya G (2014) Is AdaBoost competitive for phoneme classification? In: Proceedings of CINTI (IEEE), Budapest, Hungary, pp 61–66 Gosztolya G (2014) Is AdaBoost competitive for phoneme classification? In: Proceedings of CINTI (IEEE), Budapest, Hungary, pp 61–66
15.
Zurück zum Zitat Gosztolya G (2015) Conflict intensity estimation from speech using greedy forward-backward feature selection. In: Proceedings of Interspeech, Dresden, Germany, pp 1339–1344 Gosztolya G (2015) Conflict intensity estimation from speech using greedy forward-backward feature selection. In: Proceedings of Interspeech, Dresden, Germany, pp 1339–1344
16.
Zurück zum Zitat Gosztolya G, Busa-Fekete R, Tóth L (2013) Detecting autism, emotions and social signals using AdaBoost. In: Proceedings of Interspeech, Lyon, France, pp. 220–224 Gosztolya G, Busa-Fekete R, Tóth L (2013) Detecting autism, emotions and social signals using AdaBoost. In: Proceedings of Interspeech, Lyon, France, pp. 220–224
17.
Zurück zum Zitat Gosztolya G, Dombi J (2014) Applying representative uninorms for phonetic classifier combination. In: Proceedings of MDAI, Tokyo, Japan, pp 182–191 Gosztolya G, Dombi J (2014) Applying representative uninorms for phonetic classifier combination. In: Proceedings of MDAI, Tokyo, Japan, pp 182–191
18.
Zurück zum Zitat Gosztolya G, Grósz T, Busa-Fekete R, Tóth L (2014) Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks. In: Proceedings of Interspeech, Singapore, pp 452–456 Gosztolya G, Grósz T, Busa-Fekete R, Tóth L (2014) Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks. In: Proceedings of Interspeech, Singapore, pp 452–456
19.
Zurück zum Zitat Gosztolya G, Grósz T, Busa-Fekete R, Tóth L (2016) Determining native language and deception using phonetic features and classifier combination. In: Proceedings of Interspeech, p. accepted Gosztolya G, Grósz T, Busa-Fekete R, Tóth L (2016) Determining native language and deception using phonetic features and classifier combination. In: Proceedings of Interspeech, p. accepted
20.
Zurück zum Zitat Gosztolya G, Kocsor A (2005) A hierarchical evaluation methodology in speech recognition. Acta Cybern 17(2):213–224MathSciNetMATH Gosztolya G, Kocsor A (2005) A hierarchical evaluation methodology in speech recognition. Acta Cybern 17(2):213–224MathSciNetMATH
21.
Zurück zum Zitat Gosztolya G, Szilágyi L (2015) Application of fuzzy and possibilistic \(c\)-means clustering models in blind speaker clustering. Acta Polytechnica Hungarica 12(7):41–56 Gosztolya G, Szilágyi L (2015) Application of fuzzy and possibilistic \(c\)-means clustering models in blind speaker clustering. Acta Polytechnica Hungarica 12(7):41–56
22.
Zurück zum Zitat Grósz T, Busa-Fekete R, Gosztolya G, Tóth L (2015) Assessing the degree of Nativeness and Parkinson’s condition using Gaussian Processes and Deep Rectifier Neural Networks. In: Proceedings of Interspeech, pp 1339–1343 Grósz T, Busa-Fekete R, Gosztolya G, Tóth L (2015) Assessing the degree of Nativeness and Parkinson’s condition using Gaussian Processes and Deep Rectifier Neural Networks. In: Proceedings of Interspeech, pp 1339–1343
23.
Zurück zum Zitat Guan N, Tao D, Luo Z, Yuan B (2012) NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans Signal Process 60(6):2882–2898MathSciNetCrossRef Guan N, Tao D, Luo Z, Yuan B (2012) NeNMF: an optimal gradient method for nonnegative matrix factorization. IEEE Trans Signal Process 60(6):2882–2898MathSciNetCrossRef
24.
Zurück zum Zitat Gupta R, Audhkhasi K, Lee S, Narayanan SS (2013) Speech paralinguistic event detection using probabilistic time-series smoothing and masking. In: Proceedings of Interspeech, pp 173–177 Gupta R, Audhkhasi K, Lee S, Narayanan SS (2013) Speech paralinguistic event detection using probabilistic time-series smoothing and masking. In: Proceedings of Interspeech, pp 173–177
25.
Zurück zum Zitat Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18CrossRef Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10–18CrossRef
26.
Zurück zum Zitat Han KJ, Narayanan SS (2008) Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling. In: Proceedings of Interspeech, pp 20–23 Han KJ, Narayanan SS (2008) Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling. In: Proceedings of Interspeech, pp 20–23
27.
Zurück zum Zitat Hand D, Mannila H, Smyth P (2001) Principles of data mining. MIT Press, Cambridge Hand D, Mannila H, Smyth P (2001) Principles of data mining. MIT Press, Cambridge
28.
Zurück zum Zitat Hantke S, Weninger F, Kurle R, Ringeval F, Batliner A, Mousa AED, Schuller B (2016) I hear you eat and speak: automatic recognition of Eating Condition and food type, use-cases, and impact on ASR performance. PLoS One 1–24 Hantke S, Weninger F, Kurle R, Ringeval F, Batliner A, Mousa AED, Schuller B (2016) I hear you eat and speak: automatic recognition of Eating Condition and food type, use-cases, and impact on ASR performance. PLoS One 1–24
29.
Zurück zum Zitat Kaya H, Özkaptan T, Salah AA, Gürgen F (2014) Canonical correlation analysis and local fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction. In: Proceedings of Interspeech, Singapore, pp 442–446 Kaya H, Özkaptan T, Salah AA, Gürgen F (2014) Canonical correlation analysis and local fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction. In: Proceedings of Interspeech, Singapore, pp 442–446
30.
Zurück zum Zitat Manning C, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, CambridgeCrossRefMATH Manning C, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, CambridgeCrossRefMATH
31.
Zurück zum Zitat Neuberger T, Beke A (2013) Automatic laughter detection in spontaneous speech using GMM–SVM method. In: Proceedings of TSD, pp 113–120 Neuberger T, Beke A (2013) Automatic laughter detection in spontaneous speech using GMM–SVM method. In: Proceedings of TSD, pp 113–120
32.
Zurück zum Zitat Plessis B, Sicsu A, Heutte L, Menu E, Lecolinet E, Debon O, Moreau JV (1993) A multi-classifier combination strategy for the recognition of handwritten cursive words. In: Proceedings of ICDAR, pp 642–645 Plessis B, Sicsu A, Heutte L, Menu E, Lecolinet E, Debon O, Moreau JV (1993) A multi-classifier combination strategy for the recognition of handwritten cursive words. In: Proceedings of ICDAR, pp 642–645
33.
Zurück zum Zitat Räsänen O, Pohjalainen J (2013) Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. In: Proceedings of Interspeech, Lyon, France, pp 210–214 Räsänen O, Pohjalainen J (2013) Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. In: Proceedings of Interspeech, Lyon, France, pp 210–214
34.
Zurück zum Zitat Schapire R, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336CrossRefMATH Schapire R, Singer Y (1999) Improved boosting algorithms using confidence-rated predictions. Mach Learn 37(3):297–336CrossRefMATH
35.
Zurück zum Zitat Schölkopf B, Platt J, Shawe-Taylor J, Smola A, Williamson R (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471CrossRefMATH Schölkopf B, Platt J, Shawe-Taylor J, Smola A, Williamson R (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7):1443–1471CrossRefMATH
36.
Zurück zum Zitat Schuller B, Steidl S, Batliner A, Epps J, Eyben F, Ringeval F, Marchi E, Zhang Y (2014) The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. In: Proceedings of Interspeech, pp 427–431 Schuller B, Steidl S, Batliner A, Epps J, Eyben F, Ringeval F, Marchi E, Zhang Y (2014) The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. In: Proceedings of Interspeech, pp 427–431
37.
Zurück zum Zitat Schuller B, Steidl S, Batliner A, Hantke S, Hönig F, Orozco-Arroyave JR, Nöth E, Zhang Y, Weninger F (2015) The INTERSPEECH 2015 computational paralinguistics challenge: Nativeness, Parkinson’s & Eating Condition. In: Proceedings of Interspeech, pp 478–482 Schuller B, Steidl S, Batliner A, Hantke S, Hönig F, Orozco-Arroyave JR, Nöth E, Zhang Y, Weninger F (2015) The INTERSPEECH 2015 computational paralinguistics challenge: Nativeness, Parkinson’s & Eating Condition. In: Proceedings of Interspeech, pp 478–482
38.
Zurück zum Zitat Schuller B, Steidl S, Batliner A, Vinciarelli A, Scherer K, Ringeval F, Chetouani M, Weninger F, Eyben F, Marchi E, Salamin H, Polychroniou A, Valente F, Kim S (2013) The Interspeech 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In: Proceedings of Interspeech, Lyon, France, pp 148–152 Schuller B, Steidl S, Batliner A, Vinciarelli A, Scherer K, Ringeval F, Chetouani M, Weninger F, Eyben F, Marchi E, Salamin H, Polychroniou A, Valente F, Kim S (2013) The Interspeech 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In: Proceedings of Interspeech, Lyon, France, pp 148–152
39.
Zurück zum Zitat Sculley D (2010) Web-scale k-means clustering. In: Proceedings of WWW, Raleigh, North Carolina, USA, pp 1177–1178 Sculley D (2010) Web-scale k-means clustering. In: Proceedings of WWW, Raleigh, North Carolina, USA, pp 1177–1178
40.
Zurück zum Zitat van Segbroeck M, Travadi R, Vaz C, Kim J, Black MP, Potamianos A, Narayanan SS (2014) Classification of Cognitive Load from speech using an i-vector framework. In: Proceedings of Interspeech, Singapore, pp 671–675 van Segbroeck M, Travadi R, Vaz C, Kim J, Black MP, Potamianos A, Narayanan SS (2014) Classification of Cognitive Load from speech using an i-vector framework. In: Proceedings of Interspeech, Singapore, pp 671–675
41.
Zurück zum Zitat Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kans Sci Bull 28(1):1409–1438 Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kans Sci Bull 28(1):1409–1438
42.
Zurück zum Zitat Steinhaus H (1956) Sur la division des corp materiels en parties. Bull Acad Pol Sci C1 III. (IV):801–804MathSciNetMATH Steinhaus H (1956) Sur la division des corp materiels en parties. Bull Acad Pol Sci C1 III. (IV):801–804MathSciNetMATH
43.
Zurück zum Zitat Stroop JR (1935) Studies of interference in serial verbal reactions. J Exp Psychol 18(6):643–662CrossRef Stroop JR (1935) Studies of interference in serial verbal reactions. J Exp Psychol 18(6):643–662CrossRef
44.
Zurück zum Zitat Szilágyi L, Szilágyi SM (2014) Generalization rules for the suppressed fuzzy \(c\)-means clustering algorithm. Neurocomputing 139:298–309CrossRef Szilágyi L, Szilágyi SM (2014) Generalization rules for the suppressed fuzzy \(c\)-means clustering algorithm. Neurocomputing 139:298–309CrossRef
45.
Zurück zum Zitat Todd SC, Tóth MT, Busa-Fekete R (2009) A MATLAB program for cluster analysis using graph theory. Comput Geosci 36(6):1205–1213CrossRef Todd SC, Tóth MT, Busa-Fekete R (2009) A MATLAB program for cluster analysis using graph theory. Comput Geosci 36(6):1205–1213CrossRef
46.
Zurück zum Zitat Tóth L (2014) Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition. In: Proceedings of ICASSP, pp 190–194 Tóth L (2014) Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition. In: Proceedings of ICASSP, pp 190–194
47.
Zurück zum Zitat Tóth SL, Sztahó D, Vicsi K (2012) Speech emotion perception by human and machine. In: Proceedings of COST action, Patras, Greece, pp 213–224 Tóth SL, Sztahó D, Vicsi K (2012) Speech emotion perception by human and machine. In: Proceedings of COST action, Patras, Greece, pp 213–224
48.
Zurück zum Zitat Yap TF (2012) Speech production under Cognitive Load: effects and classification. Ph.D. thesis, University of New South Wales Yap TF (2012) Speech production under Cognitive Load: effects and classification. Ph.D. thesis, University of New South Wales
49.
Zurück zum Zitat Yu K, Jiang X, Bunke H (2012) Partially supervised speaker clustering. IEEE Trans Pattern Anal Mach Intell 34(5):959–971CrossRef Yu K, Jiang X, Bunke H (2012) Partially supervised speaker clustering. IEEE Trans Pattern Anal Mach Intell 34(5):959–971CrossRef
Metadaten
Titel
A feature selection-based speaker clustering method for paralinguistic tasks
verfasst von
Gábor Gosztolya
László Tóth
Publikationsdatum
23.03.2017
Verlag
Springer London
Erschienen in
Pattern Analysis and Applications / Ausgabe 1/2018
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-017-0612-0

Weitere Artikel der Ausgabe 1/2018

Pattern Analysis and Applications 1/2018 Zur Ausgabe

Industrial and Commercial Application

Efficient visual code localization with neural networks