Published in: Artificial Life and Robotics 2/2020

23-01-2020 | Original Article

The long short-term memory based on i-vector extraction for conversational speech gender identification approach

Authors: Barlian Henryranu Prasetio, Hiroki Tamura, Koichi Tanno


Abstract

Stress changes a speaker's voice characteristics. Emotional stress alters a person's speech pattern such that it is distributed non-normally along the temporal dimension. Consequently, methods that identify the gender of a non-stressed speaker are no longer effective for recognizing the gender of a speaker under stressful conditions. To address this issue, we propose a new gender identification framework. We leverage i-vector extraction to capture gender information from each speech segment, and a long short-term memory (LSTM) network then dynamically handles the temporal context features and learns long-term dependencies from the input. We evaluate the effectiveness of the proposed method, in terms of accuracy and the number of iterations needed to saturate, by comparing it with baseline methods on identifying the speaker's gender from conversations of different durations. By learning the gender information encoded in long-term dependencies, the proposed method outperforms the baselines and correctly identifies the speaker's gender in all conversation types.
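The abstract describes a two-stage pipeline: segment-level i-vector extraction followed by an LSTM that models long-term temporal dependencies for gender classification. Since the full text is not included here, the sketch below is only a minimal illustration of that idea in PyTorch; the i-vector dimensionality (400), hidden size, class labels, and all names are assumptions, and the i-vectors themselves are treated as precomputed inputs rather than reproduced from the authors' front end.

```python
# Minimal sketch (not the authors' code). Assumes gender-labelled i-vectors
# have already been extracted per speech segment, e.g. by a GMM-UBM /
# total-variability front end; dimensions and layer sizes are illustrative.
import torch
import torch.nn as nn


class IVectorLSTMGenderClassifier(nn.Module):
    """LSTM over a sequence of segment-level i-vectors -> gender logits."""

    def __init__(self, ivector_dim: int = 400, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(ivector_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)  # assumed classes: male / female

    def forward(self, ivectors: torch.Tensor) -> torch.Tensor:
        # ivectors: (batch, num_segments, ivector_dim)
        _, (h_n, _) = self.lstm(ivectors)   # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])     # (batch, 2) class logits


# Usage on dummy data: a batch of 8 conversations, 20 segments each.
model = IVectorLSTMGenderClassifier()
dummy = torch.randn(8, 20, 400)
logits = model(dummy)
print(logits.shape)  # torch.Size([8, 2])
```

Summarizing a conversation with the final hidden state gives a fixed-size representation regardless of sequence length, which is consistent with the abstract's claim of handling conversations of different durations.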


Metadata
Title
The long short-term memory based on i-vector extraction for conversational speech gender identification approach
Authors
Barlian Henryranu Prasetio
Hiroki Tamura
Koichi Tanno
Publication date
23-01-2020
Publisher
Springer Japan
Published in
Artificial Life and Robotics / Issue 2/2020
Print ISSN: 1433-5298
Electronic ISSN: 1614-7456
DOI
https://doi.org/10.1007/s10015-020-00582-x
