
International Journal of Speech Technology

Published in:

01-06-2014

Methods for applying VAD in Kazakh speech recognition systems

Authors: Maxat N. Kalimoldayev, Keylan Alimhan, Orken J. Mamyrbayev

Published in: International Journal of Speech Technology | Issue 2/2014


Abstract

This article examines the voice activity detection (VAD) algorithm and its use in a Kazakh speech recognition system. The paper presents a mathematical model of VAD and methods for detecting voice data, including pauses between sentences, between words, and between individual sounds. The VAD algorithm is adapted to the recognition of Kazakh speech by taking into account the basic properties of the Kazakh language. Research on voice activity detection for Kazakh speech is being conducted for the first time. The results of the spectral analysis are shown in a figure.
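The abstract does not give the authors' mathematical model, so the following is only a minimal sketch of a generic short-time-energy VAD, not the method from the paper. It illustrates the general idea of marking frames as speech or pause and then locating pauses of a minimum length. All names (`frame_energies`, `detect_voice`, `find_pauses`, `synthetic_tone`) and all parameter values (8 kHz sampling rate, 160-sample frames, a threshold of four times the estimated noise floor) are assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Return the short-time energy (mean squared amplitude) of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_voice(samples, frame_len=160, hop=80, noise_frames=5, factor=4.0):
    """Flag each frame as speech (True) or pause (False).

    Assumption: the first `noise_frames` frames contain no speech, so their
    mean energy serves as the noise floor; a frame counts as speech when its
    energy exceeds `factor` times that floor.
    """
    energies = frame_energies(samples, frame_len, hop)
    if not energies:
        return []
    lead = energies[:noise_frames]
    noise_floor = sum(lead) / len(lead)
    threshold = factor * max(noise_floor, 1e-12)
    return [energy > threshold for energy in energies]


def find_pauses(flags, min_frames=3):
    """Return (first_frame, last_frame) pairs for pauses of at least min_frames."""
    pauses = []
    run_start = None
    # A trailing True acts as a sentinel that closes a pause at the end.
    for i, is_speech in enumerate(flags + [True]):
        if not is_speech and run_start is None:
            run_start = i
        elif is_speech and run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start, i - 1))
            run_start = None
    return pauses


def synthetic_tone(n, amplitude, freq=200.0, rate=8000):
    """Hypothetical test signal: n samples of a sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


# Quiet, loud, quiet, loud, quiet: two "words" separated by pauses.
signal = (synthetic_tone(800, 0.01) + synthetic_tone(1600, 0.5)
          + synthetic_tone(800, 0.01) + synthetic_tone(1600, 0.5)
          + synthetic_tone(800, 0.01))
flags = detect_voice(signal)
pauses = find_pauses(flags)
print(pauses)  # [(0, 8), (30, 38), (60, 68)]
```

Raising `min_frames` would keep only longer pauses, which is one simple way to separate pauses between sentences from shorter pauses between words or sounds. A practical detector would usually combine energy with further features such as the zero-crossing rate or spectral measures.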



Title: Methods for applying VAD in Kazakh speech recognition systems
Authors: Maxat N. Kalimoldayev, Keylan Alimhan, Orken J. Mamyrbayev
Publication date: 01-06-2014
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2014
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-013-9220-6
