Published in: Neural Computing and Applications 3/2012

1 April 2012 | Original Article

A neural predictive coding feature extraction scheme in DCT domain for phoneme recognition

Authors: Mahmood Yousefi Azar, Farbod Razzazi


Abstract

Nonlinear feature extraction from speech signals has been a major research focus in recent years. In this paper, phoneme feature extraction using the neural predictive coding (NPC) model is generalized to a combination of the time and DCT domains. Two main ideas are proposed and evaluated. First, a frame-wise DCT-based NPC feature extractor is proposed to reduce the computational complexity of the system. This approach applies a DCT pre-feature extractor to remove unwanted extra data, and the extracted features are the outputs of the hidden layer. It is shown that this pre-processing stage can improve both computational efficiency and accuracy. In the second approach, DCT-domain features play a complementary role in classic NPC modeling: the approach uses the residual of the predicted signal in the DCT domain. The experiments were conducted on the voiced plosive phonemes of the TIMIT database. In simulations, the combined method achieved a 70.3% recognition rate on the /b/, /d/, and /g/ phonemes, which is higher than the results of traditional NPC approaches.
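
The frame-wise DCT pre-feature extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the frame length, hop size, number of retained coefficients, the orthonormal DCT-II variant, and the function names `dct_ii`, `split_frames`, and `dct_prefeatures`. The NPC network itself is not shown; in the scheme described above, it would take such coefficients as input.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of a 1-D sequence (direct O(N^2) form)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        # Orthonormal scaling, so the transform preserves signal energy.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping an incomplete tail."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def dct_prefeatures(signal, frame_len=64, hop=32, keep=16):
    """Frame-wise DCT that keeps only the first `keep` coefficients per frame.

    Assumed, hypothetical parameter values; the paper's settings are not
    stated in the abstract.
    """
    return [dct_ii(f)[:keep] for f in split_frames(signal, frame_len, hop)]


# Toy input: a low-frequency sinusoid, not TIMIT data.
signal = [math.sin(2 * math.pi * 3 * t / 256) for t in range(256)]
features = dct_prefeatures(signal)
print(len(features), len(features[0]))
```

Truncating each frame to its leading DCT coefficients shrinks the input dimension seen by any later predictor, which is one plausible reading of how a pre-processing stage could lower computational cost. Whether the authors truncate in this way is not stated in the abstract.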


Metadata
Title
A neural predictive coding feature extraction scheme in DCT domain for phoneme recognition
Authors
Mahmood Yousefi Azar
Farbod Razzazi
Publication date
1 April 2012
Publisher
Springer-Verlag
Published in
Neural Computing and Applications / Issue 3/2012
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-010-0450-0
