Published in: International Journal of Speech Technology 3/2013

1 September 2013

Wavelet based sub-band parameters for classification of unaspirated Hindi stop consonants in initial position of CV syllables

Authors: R. P. Sharma, O. Farooq, I. Khan

Abstract

This paper proposes a new feature extraction technique using wavelet based sub-band parameters (WBSP) for classification of unaspirated Hindi stop consonants. The extracted acoustic parameters show marked deviation from the values reported for English and other languages, as Hindi has distinguishing manner-based features. Since acoustic parameters are difficult to extract automatically for speech recognition, Mel Frequency Cepstral Coefficient (MFCC) based features are usually used. MFCC are based on the short-time Fourier transform (STFT), which assumes the speech signal to be stationary over a short period. This assumption is violated in particular in the case of stop consonants.
In WBSP, following the acoustic study, the features derived from different segments of the CV syllable are given different weighting factors, with the middle segment weighted most heavily. The wavelet transform is applied to split the signal into 8 sub-bands of different bandwidths, and the variation of energy across the sub-bands is also taken into account. WBSP gives improved classification scores. The number of filters used for feature extraction in WBSP (8) is smaller than the number used for MFCC (24). Its classification performance has been compared with four other techniques using a linear classifier. Further, principal component analysis (PCA) has also been applied to reduce dimensionality.
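The sub-band energy idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the decomposition tree, or the weighting factors. The sketch assumes a 7-level Haar discrete wavelet transform (which yields 8 octave bands of different bandwidths), three equal-length segments per syllable, and hypothetical weights (0.25, 0.5, 0.25) that merely give the middle segment the largest weight. The function names are illustrative.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def subband_energies(signal, levels=7):
    """Energy in each of the levels + 1 octave sub-bands of a dyadic Haar DWT.

    Order: detail bands from highest to lowest frequency, then the final
    approximation band. The length must be a multiple of 2**levels.
    """
    if len(signal) == 0 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a positive multiple of 2**levels")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def wbsp_like_features(syllable, weights=(0.25, 0.5, 0.25), levels=7):
    """Weighted sub-band energies from consecutive equal-length segments.

    The weights are hypothetical; they only mimic the idea that the middle
    segment of the CV syllable receives the largest weighting factor.
    """
    seg_len = len(syllable) // len(weights)
    seg_len -= seg_len % (2 ** levels)  # truncate so each segment fits the DWT
    if seg_len == 0:
        raise ValueError("syllable too short for the requested number of levels")
    features = []
    for k, w in enumerate(weights):
        segment = syllable[k * seg_len:(k + 1) * seg_len]
        features.extend(w * e for e in subband_energies(segment, levels))
    return features
```

With these assumptions, each syllable yields 3 × 8 = 24 values, which could then be passed to PCA and a linear classifier. For a sampling rate fs, the nominal bands of such a dyadic split are fs/4 to fs/2, fs/8 to fs/4, and so on down to the lowest band.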

Metadata
Title
Wavelet based sub-band parameters for classification of unaspirated Hindi stop consonants in initial position of CV syllables
Authors
R. P. Sharma
O. Farooq
I. Khan
Publication date
1 September 2013
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2013
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9185-x
