Published in: Neural Computing and Applications 6/2018

17.08.2016 | Original Article

Speaker recognition with hybrid features from a deep belief network

Authors: Hazrat Ali, Son N. Tran, Emmanouil Benetos, Artur S. d’Avila Garcez



Abstract

Learning representations from audio data has shown advantages over handcrafted features such as mel-frequency cepstral coefficients (MFCCs) in many audio applications. In most representation learning approaches, connectionist systems have been used to learn and extract latent features from fixed-length data. In this paper, we propose an approach that combines learned features with MFCC features for the speaker recognition task and that can be applied to audio scripts of different lengths. In particular, we study the use of features from different levels of a deep belief network (DBN) for quantizing the audio data into vectors of audio word counts. These vectors represent audio scripts of different lengths, which makes it easier to train a classifier on them. Our experiments show that audio word count vectors generated from a mixture of DBN features at different layers perform better than the MFCC features. Combining the audio word count vectors with the MFCC features yields a further improvement.
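The quantization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the quantizer, so a plain k-means codebook is assumed here, random arrays stand in for frame-level DBN activations and MFCCs, and the function names (`learn_codebook`, `word_count_vector`, `hybrid_vector`) as well as the mean-pooling of MFCC frames are hypothetical choices made only for illustration.

```python
import numpy as np

def assign_words(frames, centroids):
    """Index of the nearest centroid (audio word) for every frame."""
    # Squared Euclidean distances, shape (n_frames, n_words).
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def learn_codebook(frames, n_words, n_iter=20, seed=0):
    """Assumed quantizer: plain k-means over frame-level feature vectors."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(frames), size=n_words, replace=False)
    centroids = frames[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(frames, centroids)
        for k in range(n_words):
            members = frames[labels == k]
            if len(members) > 0:  # leave an empty cluster where it is
                centroids[k] = members.mean(axis=0)
    return centroids

def word_count_vector(frames, centroids):
    """Histogram of audio words; its length does not depend on n_frames."""
    labels = assign_words(frames, centroids)
    return np.bincount(labels, minlength=len(centroids)).astype(float)

def hybrid_vector(learned_frames, mfcc_frames, centroids):
    """Concatenate normalised word counts with mean-pooled MFCCs (assumed pooling)."""
    counts = word_count_vector(learned_frames, centroids)
    counts /= counts.sum()
    return np.concatenate([counts, mfcc_frames.mean(axis=0)])

# Stand-in data: two utterances of different lengths (120 and 75 frames).
rng = np.random.default_rng(42)
utt_a = rng.normal(size=(120, 16))   # placeholder for learned features
utt_b = rng.normal(size=(75, 16))
codebook = learn_codebook(np.vstack([utt_a, utt_b]), n_words=8)
vec_a = hybrid_vector(utt_a, rng.normal(size=(120, 13)), codebook)
vec_b = hybrid_vector(utt_b, rng.normal(size=(75, 13)), codebook)
print(vec_a.shape, vec_b.shape)  # both utterances map to the same length
```

Because every utterance is reduced to one histogram over the same codebook, recordings with different numbers of frames yield vectors of equal length, which is what allows a standard classifier to be trained on them. Features taken from several DBN layers could be concatenated per frame before quantization, though how the paper mixes layers is not specified in the abstract.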


Footnotes
1
Kinnunen and Li [2] present a useful survey of MFCCs and other features, such as supervectors, for speaker recognition.
 
2
Besides the work reviewed in this section, a more recent study [3] presents a deep neural network approach to the speaker recognition task.
 
3
The i-vector is a recently developed feature set for representing speech data in low dimension [8]; it has attracted the attention of the machine learning community through the NIST i-vector challenge [9, 10].
 
4
A useful tutorial on SVMs is given by Burges [17].
 
5
The dataset can be requested via email.
 
6
Earlier experiments with this dataset for speech recognition applications are reported in [20, 21].
 
References
1.
Mohamed AR, Dahl GE, Hinton G (2012) Acoustic modeling using deep belief networks. IEEE Trans Audio Speech Lang Process 20(1):14–22
2.
Kinnunen T, Li H (2010) An overview of text-independent speaker recognition: from features to supervectors. Speech Commun 52(1):12–40
3.
Richardson F, Reynolds D, Dehak N (2015) Deep neural network approaches to speaker and language recognition. IEEE Signal Process Lett 22(10):1671–1675
4.
Lee H, Pham P, Largman Y, Ng AY (2009) Unsupervised feature learning for audio classification using convolutional deep belief networks. In: Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A (eds) Advances in neural information processing systems. NIPS, Abu Dhabi, pp 1096–1104
6.
Senoussaoui M, Dehak N, Kenny P, Dehak R, Dumouchel P (2012) First attempt of Boltzmann machines for speaker verification. In: Odyssey 2012: the speaker and language recognition workshop. ACM, pp 1064–1071
7.
Ghahabi O, Hernando J (2014) Deep belief networks for i-vector based speaker recognition. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1700–1704
8.
Dehak N, Kenny P, Dehak R, Dumouchel P, Ouellet P (2011) Front-end factor analysis for speaker verification. IEEE Trans Audio Speech Lang Process 19(4):788–798
10.
Ali H, d’Avila Garcez A, Tran S, Zhou X, Iqbal K (2014) Unimodal late fusion for NIST i-vector challenge on speaker detection. Electron Lett 50(15):1098–1100
11.
Molau S, Pitz M, Schluter R, Ney H (2001) Computing mel-frequency cepstral coefficients on the power spectrum. In: Proceedings of 2001 IEEE international conference on acoustics, speech, and signal processing, vol 1, pp 73–76
12.
13.
Le Roux N, Bengio Y (2008) Representational power of restricted Boltzmann machines and deep belief networks. Neural Comput 20(6):1631–1649
14.
Deng L, Yu D (2014) Deep learning: methods and applications. NOW Publishers, Breda
15.
Freund Y, Haussler D (1994) Unsupervised learning of distributions on binary vectors using two layer networks. University of California at Santa Cruz, Santa Cruz, Tech. Rep
16.
Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800
17.
Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2(2):121–167
19.
Ali H, Ahmad N, Yahya KM, Farooq O (2012) A medium vocabulary Urdu isolated words balanced corpus for automatic speech recognition. In: 2012 international conference on electronics computer technology (ICECT 2012), pp 473–476
20.
Ali H, Ahmad N, Zhou X, Ali M, Manjotho A (2014) Linear discriminant analysis based approach for automatic speech recognition of Urdu isolated words. In: Communication technologies, information security and sustainable development, ser. communications in computer and information science, vol 414. Springer International Publishing, pp 24–34
21.
Ali H, Ahmad N, Zhou X, Iqbal K, Ali SM (2014) DWT features performance analysis for automatic speech recognition of Urdu. SpringerPlus 3(1):204
Metadata
Title
Speaker recognition with hybrid features from a deep belief network
Authors
Hazrat Ali
Son N. Tran
Emmanouil Benetos
Artur S. d’Avila Garcez
Publication date
17.08.2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 6/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-016-2501-7
