Published in: International Journal of Speech Technology 2/2014

01-06-2014

Methods for applying VAD in Kazakh speech recognition systems

Authors: Maxat N. Kalimoldayev, Keylan Alimhan, Orken J. Mamyrbayev


Abstract

This article considers the voice activity detection (VAD) algorithm and its use in a Kazakh speech recognition system. The paper presents a mathematical model of VAD and methods for detecting voice data, including pauses between sentences, words, and individual sounds. The VAD algorithm is adapted to the recognition of Kazakh speech by taking into account the basic properties of the Kazakh language. Research on voice activity detection in Kazakh speech is being conducted for the first time. The results of the spectral analysis are shown in a figure.
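
To illustrate the general idea of detecting pauses with VAD, the following is a minimal sketch of a generic short-time-energy detector. It is not the mathematical model presented in the paper, which is not reproduced on this page. The function names, the 160-sample frame length, the 8 kHz sampling rate, and the threshold ratio are all assumptions chosen for illustration.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_speech(samples, frame_len=160, threshold_ratio=0.1):
    """One boolean per frame: True where the frame energy exceeds
    a fraction (threshold_ratio, an assumed value) of the peak frame energy."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        return []
    threshold = threshold_ratio * max(energies)
    return [e > threshold for e in energies]

def find_pauses(flags, min_frames):
    """(start_frame, end_frame_exclusive) for silent runs of at least min_frames."""
    pauses, start = [], None
    for i, is_speech in enumerate(flags + [True]):  # sentinel closes a trailing pause
        if not is_speech and start is None:
            start = i
        elif is_speech and start is not None:
            if i - start >= min_frames:
                pauses.append((start, i))
            start = None
    return pauses

# Synthetic test signal at an assumed 8 kHz rate: tone, silence, tone.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
silence = [0.0] * 480
signal = tone + silence + tone

flags = detect_speech(signal)
pauses = find_pauses(flags, min_frames=2)
print(pauses)
```

Varying `min_frames` is one simple way to separate short gaps (between sounds) from longer ones (between words or sentences). A threshold relative to the peak energy is only adequate for clean recordings; noisy input would need a noise-floor estimate or additional features.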

Metadata
Title
Methods for applying VAD in Kazakh speech recognition systems
Authors
Maxat N. Kalimoldayev
Keylan Alimhan
Orken J. Mamyrbayev
Publication date
01-06-2014
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2014
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-013-9220-6
