Published in: International Journal of Speech Technology 1/2013

1 March 2013

Phoneme recognition using zerocrossing interval distribution of speech patterns and ANN

Authors: R. K. Sunil Kumar, V. L. Lajish


Abstract

The speech signal is modeled using the zerocrossing interval distribution of the signal in the time domain. The distributions of these parameters are studied over five vowels of Malayalam (one of the most popular Indian languages). We found that the distribution patterns are nearly identical for repeated utterances of the same vowel and vary from vowel to vowel. These distribution patterns are used to recognize the vowels with a multilayer feedforward artificial neural network. After analyzing the distribution patterns and the vowel recognition results, we conclude that the zerocrossing interval distribution parameters can be used effectively for speech phone classification and recognition. The robustness of this parameter to noise is also studied by adding white Gaussian noise at different signal-to-noise ratios. The computational complexity of the proposed technique is also lower than that of the conventional spectral techniques, including FFT and cepstral methods, used in the parameterization of speech signals.
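
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It measures the gaps between successive zero crossings, bins them into a normalized histogram that could serve as a feature vector, and adds white Gaussian noise at a chosen signal-to-noise ratio. The function names, the bin count, the maximum interval, and the synthetic 200 Hz tone standing in for a vowel frame are all assumptions made for illustration.

```python
import math
import random


def zero_crossing_intervals(signal):
    """Return the gaps (in samples) between successive sign changes."""
    crossings = []
    for i in range(1, len(signal)):
        # A crossing occurs where consecutive samples differ in sign
        # (an exact zero is treated as non-negative).
        if (signal[i - 1] < 0) != (signal[i] < 0):
            crossings.append(i)
    return [b - a for a, b in zip(crossings, crossings[1:])]


def interval_distribution(intervals, num_bins, max_interval):
    """Normalized histogram of interval lengths (hypothetical feature vector)."""
    counts = [0] * num_bins
    for length in intervals:
        # Map lengths 1..max_interval onto the bins; clamp longer ones
        # into the last bin.
        idx = min((length - 1) * num_bins // max_interval, num_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise scaled to the requested SNR in dB.

    Assumes a non-empty signal.
    """
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


# Synthetic stand-in for a vowel frame (assumption): 200 Hz tone at 8 kHz.
# The half-sample offset keeps samples away from exact zeros.
fs, f0 = 8000, 200
tone = [math.sin(2 * math.pi * f0 * (n + 0.5) / fs) for n in range(800)]

clean_intervals = zero_crossing_intervals(tone)
clean_features = interval_distribution(clean_intervals, num_bins=10, max_interval=40)

noisy = add_awgn(tone, snr_db=10, rng=random.Random(0))
noisy_features = interval_distribution(zero_crossing_intervals(noisy), 10, 40)
```

For the clean tone every interval equals half the period (20 samples), so the histogram collapses into a single bin. Noise introduces spurious short intervals and spreads the distribution, which is the kind of effect a noise study at different signal-to-noise ratios would examine. In a recognizer, vectors such as `clean_features` would be the inputs to the neural network classifier.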

Metadata
Title
Phoneme recognition using zerocrossing interval distribution of speech patterns and ANN
Authors
R. K. Sunil Kumar
V. L. Lajish
Publication date
1 March 2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9169-x
