Published in: International Journal of Speech Technology 2/2012

01-06-2012

Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification

Authors: A. Muthamizh Selvan, R. Rajesh

Abstract

Gender (male/female) classification plays a vital role in developing robust Tamil automatic speech recognition (ASR) applications, because speakers' vocal tracts differ widely. Various features, including formants (F1, F2, F3, F4), zero crossings, and Mel-frequency cepstral coefficients (MFCCs), have appeared in the literature for speech and signal classification and recognition. Recently, Dalal et al. proposed the Histogram of Oriented Gradients (HOG), a feature extracted from an image for efficient detection and classification of objects. We extend HOG and apply it to the spectrogram of the speech signal; the resulting feature is called the Spectral Histogram of Oriented Gradients (SHOGs). Tamil language male/female speaker classification using SHOGs features shows a good improvement in classification rate compared with the other features. The results of combining various other features with SHOGs are also promising.
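The core idea in the abstract, computing HOG-style descriptors on a spectrogram rather than on a photograph, can be illustrated with a short sketch. This is not the authors' implementation: the function names `spectrogram` and `hog_features` are hypothetical, and the frame length, hop size, cell size, number of orientation bins, and per-cell normalisation are all assumptions chosen for illustration. The test signal is synthetic.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency x time) from a Hann-windowed STFT.

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log1p(magnitude)


def hog_features(image, cell=8, n_bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    g_freq, g_time = np.gradient(image)  # derivatives along axis 0 and axis 1
    magnitude = np.hypot(g_freq, g_time)
    angle = np.mod(np.arctan2(g_freq, g_time), np.pi)  # unsigned orientation
    # Clamp guards against an angle that rounds to exactly pi.
    bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)

    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bins[sl].ravel(),
                                     weights=magnitude[sl].ravel(),
                                     minlength=n_bins)
    # Simplification: L2-normalise each cell on its own. The original HOG
    # descriptor normalises over overlapping blocks of cells instead.
    norm = np.linalg.norm(hist, axis=2, keepdims=True)
    return (hist / (norm + 1e-9)).ravel()


# Synthetic one-second signal at an assumed 8 kHz sampling rate.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(8000)
features = hog_features(spectrogram(signal))
```

The resulting vector could then be passed to any classifier, alone or concatenated with other features such as MFCCs. The abstract does not say which classifier or which parameter settings the paper uses, so none is assumed here.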

Literature
Al-Haddad, S. A. R., Samad, S. A., Hussain, A., & Ishak, K. A. (2008). Isolated Malay digit recognition using pattern recognition fusion of dynamic time warping and hidden Markov models. American Journal of Applied Sciences, 5(6), 714–720.
Anusuya, M. A., & Katti, S. K. (2009). Speech recognition by machine: a review. International Journal of Computer Science and Information Security, 6(3), 181–205.
Boril, H., & Hansen, J. H. L. (2010). Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environments. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1379–1393.
Cherif, M., Korba, A., Messadeg, D., Djemili, R., & Bourouba, H. (2008). Robust speech recognition using perceptual wavelet denoising and mel-frequency product spectrum cepstral coefficient features. Informatica, 32, 283–288.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Conference on computer vision and pattern recognition (CVPR).
Dharanipragada, S., Yapanel, U. H., & Rao, B. D. (2007). Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 224–234.
Frankel, J., & King, S. (2007). Speech recognition using linear dynamic models. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 246–256.
Gläser, C., Heckmann, M., Joublin, F., & Goerick, C. (2010). Combining auditory preprocessing and Bayesian estimation for robust formant tracking. IEEE Transactions on Audio, Speech, and Language Processing, 18(2), 224–236.
Jankowski, C. R. Jr., Hoang-Doan, H. V., & Lippmann, R. P. (1995). A comparison of signal processing front ends for automatic word recognition. IEEE Transactions on Speech and Audio Processing, 3(4), 286–293.
Jia, H.-X., & Zhang, Y.-J. (2007). Fast human detection by boosting histograms of oriented gradients. In Proc. IEEE fourth international conference on image and graphics (pp. 683–688).
Kolossa, D., Fernandez Astudillo, R., Hoffmann, E., & Orglmeister, R. (2010). Independent component analysis and time-frequency masking for speech recognition in multitalker conditions. EURASIP Journal on Audio, Speech, and Music Processing, 2010, 651420, pp. 1–13.
Lee, C.-H., Han, C.-C., & Chuang, C.-C. (2008). Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1541–1550.
Levy, C., Linares, G., & Bonastre, J.-F. (2009). Compact acoustic models for embedded speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2009, 806186, pp. 1–13.
Maier, A., Haderlein, T., Stelzle, F., Noth, E., Nkenke, E., Rosanowski, F., Schutzenberger, A., & Schuster, M. (2010). Automatic speech recognition systems for the evaluation of voice and speech disorders in head and neck cancer. EURASIP Journal on Audio, Speech, and Music Processing, 2010, 926951, pp. 1–7.
Morales, N., Torre Toledano, D., Hansen, J. H. L., & Garrido, J. (2009). Feature compensation techniques for ASR on band-limited speech. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 758–774.
Morales-Cordovilla, J. A., Peinado, A. M., Sánchez, V., & González, J. A. (2011). Feature extraction based on pitch-synchronous averaging for robust speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(3), 640–651.
Muda, L., Begam, M., & Elamvazuthi, I. (2010). Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. Journal of Computing, 2(3), 138–143.
Muthamizh Selvan, A., & Rajesh, R. (2011). Word classification using neural network. In Proc. of international conference on advances in computing and communications (ACC 2011), Part III (pp. 497–502). Berlin: Springer. CCIS 192.
Panagiotakis, C., & Tziritas, G. (2005). A speech/music discriminator based on RMS and zero-crossings. IEEE Transactions on Multimedia, 7(1), 155–166.
Park, H., Takiguchi, T., & Ariki, Y. (2009). Integrated phoneme subspace method for speech feature extraction. EURASIP Journal on Audio, Speech, and Music Processing, 2009, 690451, pp. 1–6.
Pikrakis, A., Giannakopoulos, T., & Theodoridis, S. (2008). A speech/music discriminator of radio recordings based on dynamic programming and Bayesian networks. IEEE Transactions on Multimedia, 10(5), 846–857.
Rajesh, R., Rajeev, K., Gopakumar, V., Suchithra, K., & Lekhesh, V. P. (2011). On experimenting with pedestrian classification using neural network. In Proc. of 3rd international conference on electronics computer technology (ICECT) (pp. 107–111).
Scheirer, E., & Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator. International Conference on Acoustics, Speech, and Signal Processing Proceedings (ICASSP), 2, 1331–1334.
Tomasi, C., & Manduchi, R. (1997). Bilateral filtering for gray and color images. In Proc. IEEE int. conference on computer vision.
Wang, N., Ching, P. C., Zheng, N., & Lee, T. (2011). Robust speaker recognition using denoised vocal source and vocal tract features. IEEE Transactions on Audio, Speech, and Language Processing, 19(1), 196–205.
Yin, H., Nadeu, C., & Hohmann, V. (2009). Pitch and formant based order adaptation of the fractional Fourier transform and its application to speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2009, 304579, pp. 1–14.
Yu, D., Deng, L., Droppo, J., Wu, J., Gong, Y., & Acero, A. (2008). A minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition. In Proc. int. conference on acoustics, speech and signal processing (ICASSP) (pp. 4041–4044).
Zhang, T., & Jay Kuo, C. C. (2001). Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4), 441–457.
Metadata
Title
Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification
Authors
A. Muthamizh Selvan
R. Rajesh
Publication date
01-06-2012
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-012-9138-4
