Published in: International Journal of Speech Technology 1/2019

30-11-2018

Continuous Tamil Speech Recognition technique under non stationary noisy environments

Authors: M. Kalamani, M. Krishnamoorthi, R. S. Valarmathi

Abstract

In recent years, demand for Continuous Speech Recognition systems for the Tamil language has grown considerably. This work proposes an efficient Continuous Tamil Speech Recognition (CTSR) technique for non-stationary noisy environments. It consists of two stages: speech enhancement and modelling. In the enhancement stage, a modified Modulation Magnitude Estimation based Spectral Subtraction algorithm with Chi-Square Distribution based Noise Estimation (SS–NE) is proposed to enhance noisy Tamil speech under various non-stationary noise conditions. The enhanced signal is then segmented using a combination of its short-time signal energy and spectral centroid features, so that speech segments can be extracted from the continuous speech. Twenty-six mel frequency cepstral coefficients (MFCCs) per frame were found to be optimal and are used as the acoustic feature vector for each frame. Fuzzy C-Means (FCM) clustering groups the extracted feature vectors into discrete symbols; the evaluation results indicate that the optimal number of clusters C is 5. Finally, Tamil speech from various speakers is recognized using an Expectation Maximization Gaussian Mixture Model (EM-GMM) with 16 component densities, operating on continuous measurements of the labelled features from FCM clustering, in order to reduce the word error rate (WER). The simulation results show that, compared with existing algorithms under different noisy environments, the proposed FCM with EM-GMM model for CTSR improves recognition accuracy by 1.2 to 4.4% and reduces the WER by 1.6 to 5.47%.
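
To make the clustering step concrete, the following is a minimal sketch of textbook Fuzzy C-Means, not the authors' implementation. It assumes squared Euclidean distance and a fuzzifier m = 2, neither of which is stated in the abstract; the function names `fuzzy_c_means` and `hard_labels` are hypothetical. Each input point stands in for one per-frame feature vector (26 MFCCs in the paper), and the hard label derived from the memberships plays the role of the discrete symbol.

```python
import random


def fuzzy_c_means(points, c=5, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook FCM on a list of equal-length float vectors.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    c=5 mirrors the abstract; m=2.0 is an assumed fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership**m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w) or 1e-12  # guard against an empty cluster
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from squared Euclidean distances.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim))
                  for k in range(c)]
            if min(d2) == 0.0:
                # The point coincides with one or more centres.
                zeros = [k for k in range(c) if d2[k] == 0.0]
                new = [1.0 / len(zeros) if k in zeros else 0.0
                       for k in range(c)]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                s = sum(inv)
                new = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


def hard_labels(memberships):
    """Map each point to the index of its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Calling `fuzzy_c_means(frames, c=5)` on a list of feature vectors and then `hard_labels` on the returned memberships yields one symbol index per frame. The abstract indicates that the EM-GMM stage works on continuous measurements of the labelled features, so the soft memberships may matter as much as the hard labels; how the two are combined is not specified in the abstract.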

Metadata
Title
Continuous Tamil Speech Recognition technique under non stationary noisy environments
Authors
M. Kalamani
M. Krishnamoorthi
R. S. Valarmathi
Publication date
30-11-2018
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-09580-8
