Emotion variability between training and testing utterances is one of the largest challenges in speaker recognition. A common situation is that the training data consist of neutral speech while the testing data are a mixture of neutral and emotional speech. In this paper, we experimentally analyze the performance of a GMM-based verification system on utterances from this situation. The analysis reveals two findings. First, verification performance improves as the emotion ratio (the proportion of emotional speech in the testing utterance) decreases. Second, the scores of neutral features against the speaker's own model are distributed higher than the other three score types: neutral speech against the models of other speakers, and non-neutral speech against both the speaker's own model and the models of other speakers. Based on these findings, we propose a scores selection method that reduces the emotion ratio of the testing utterance by eliminating non-neutral features. The method is applicable to a GMM-based recognition system and does not require labeling the emotional state during testing. Experiments are carried out on the MASC Corpus, and scores selection improves system performance, reducing the EER from 13.52% to 10.17%.
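The score selection idea can be illustrated with a short sketch. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes diagonal-covariance GMMs and scores each frame by its log-likelihood under the claimed speaker's model, with no background-model normalization. It treats "scores selection" as keeping a hypothetical fraction `keep_ratio` of the highest frame scores before averaging. The rationale is that neutral frames scored against the true speaker's model tend to fall in the upper part of the score distribution. The abstract does not give the paper's actual selection criterion, so the helper names `frame_log_likelihood`, `select_scores`, and `utterance_score` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation of one diagonal-covariance GMM component:
# (mixture weight, per-dimension means, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def frame_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x | GMM) for one feature frame, combined with log-sum-exp for stability."""
    component_logs = []
    for weight, mean, var in gmm:
        ll = math.log(weight)
        for xi, mu, v in zip(x, mean, var):
            # Log density of a 1-D Gaussian, summed over independent dimensions.
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        component_logs.append(ll)
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(l - peak) for l in component_logs))


def select_scores(frame_scores: Sequence[float], keep_ratio: float) -> List[float]:
    """Keep the highest-scoring fraction of frames (assumed selection criterion)."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(len(frame_scores) * keep_ratio))
    return sorted(frame_scores, reverse=True)[:n_keep]


def utterance_score(frames: Sequence[Sequence[float]],
                    speaker_gmm: List[Component],
                    keep_ratio: float = 1.0) -> float:
    """Average per-frame log-likelihood over the selected frames only."""
    scores = [frame_log_likelihood(x, speaker_gmm) for x in frames]
    kept = select_scores(scores, keep_ratio)
    return sum(kept) / len(kept)


# Toy example: a 1-D single-component model, two well-matched frames
# and two outlying frames standing in for mismatched (e.g. emotional) speech.
model = [(1.0, [0.0], [1.0])]
test_frames = [[0.0], [0.0], [3.0], [4.0]]
print(utterance_score(test_frames, model, keep_ratio=1.0))
print(utterance_score(test_frames, model, keep_ratio=0.5))
```

With `keep_ratio=0.5`, only the two well-matched frames contribute, so the utterance score rises from about -4.04 to about -0.92. In a real system, `keep_ratio` (or whatever threshold the paper actually uses) would need to be tuned on development data. Discarding low-scoring frames also affects impostor trials, so the effect on EER has to be measured rather than assumed.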
- Scores Selection for Emotional Speaker Recognition
- Springer Berlin Heidelberg