
International Journal of Speech Technology

Publication date: 01-06-2012

Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification

Authors: A. Muthamizh Selvan, R. Rajesh

Published in: International Journal of Speech Technology | Issue 2/2012


Abstract

Gender (male/female) classification plays a vital role in developing robust Automatic Speech Recognition (ASR) applications for Tamil, owing to the diversity in the vocal tracts of speakers. Various features, including formants (F1, F2, F3, F4), zero crossings, and Mel-Frequency Cepstral Coefficients (MFCCs), have appeared in the literature, especially for speech/signal classification and recognition. Recently, Dalal and Triggs (2005) proposed a feature called the Histogram of Oriented Gradients (HOG) for extracting features from an image for efficient detection and classification of objects. We extend HOG and apply it to the spectrogram of a speech signal; the resulting feature is called the Spectral Histogram of Oriented Gradients (SHOG). The results of Tamil-language male/female speaker classification using SHOG features show a good improvement in classification rate compared with the other features. The results of combining various features with SHOGs are also promising.
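The idea of applying HOG to a spectrogram can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the frame length, cell size, number of orientation bins, or normalisation scheme, so every parameter value below is an assumption, as are the function names `harmonic_tone`, `spectrogram`, and `shog`. The sketch treats a log-magnitude spectrogram as an image, computes gradients along time and frequency, and accumulates magnitude-weighted orientation histograms per cell. Per-cell L2 normalisation is used here as a simplification of the overlapping-block normalisation in Dalal and Triggs (2005).

```python
import cmath
import math


def harmonic_tone(f0, sample_rate=8000, n_samples=800, n_harmonics=5):
    """Synthetic stand-in for voiced speech: a sum of harmonics of pitch f0 (Hz)."""
    return [
        sum(math.sin(2 * math.pi * h * f0 * n / sample_rate) / h
            for h in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def spectrogram(signal, frame_len=64, hop=32):
    """Log-magnitude spectrogram from a naive DFT with a Hann window.

    Returns a list of rows indexed as image[frequency_bin][frame].
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_freq = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(n_freq):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(math.log1p(abs(acc)))
        frames.append(mags)
    # Transpose so that rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*frames)]


def shog(image, cell=8, n_bins=9):
    """HOG-style descriptor of a spectrogram (assumed settings, see lead-in).

    For each non-overlapping cell, builds a histogram of unsigned gradient
    orientations weighted by gradient magnitude, then L2-normalises it.
    """
    rows, cols = len(image), len(image[0])
    features = []
    for r0 in range(0, rows - cell + 1, cell):
        for c0 in range(0, cols - cell + 1, cell):
            hist = [0.0] * n_bins
            for r in range(r0, r0 + cell):
                for c in range(c0, c0 + cell):
                    # Central differences, clamped at the image border.
                    gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
                    gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
                    magnitude = math.hypot(gx, gy)
                    angle = math.atan2(gy, gx) % math.pi  # unsigned, in [0, pi)
                    b = min(int(angle / math.pi * n_bins), n_bins - 1)
                    hist[b] += magnitude
            norm = math.sqrt(sum(h * h for h in hist)) + 1e-9
            features.extend(h / norm for h in hist)
    return features


# Two synthetic "speakers" with different pitch (values chosen for illustration).
low_pitch = shog(spectrogram(harmonic_tone(120.0)))
high_pitch = shog(spectrogram(harmonic_tone(220.0)))
print(len(low_pitch), len(high_pitch))
```

In practice the resulting feature vectors would be passed to a classifier trained on labelled male and female recordings; the classifier used in the paper is not stated in the abstract, so none is shown here.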


Al-Haddad, S. A. R., Samad, S. A., Hussain, A., & Ishak, K. A. (2008). Isolated Malay digit recognition using pattern recognition fusion of dynamic time warping and hidden Markov models. American Journal of Applied Sciences, 5(6), 714–720.

Anusuya, M. A., & Katti, S. K. (2009). Speech recognition by machine: a review. International Journal of Computer Science and Information Security, 6(3), 181–205.

Boril, H., & Hansen, J. H. L. (2010). Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environments. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1379–1393.

Cherif, M., Korba, A., Messadeg, D., Djemili, R., & Bourouba, H. (2008). Robust speech recognition using perceptual wavelet denoising and mel-frequency product spectrum cepstral coefficient features. Informatica, 32, 283–288.

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Conference on computer vision and pattern recognition (CVPR).

Dharanipragada, S., Yapanel, U. H., & Rao, B. D. (2007). Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 224–234.

Frankel, J., & King, S. (2007). Speech recognition using linear dynamic models. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 246–256.

Gläser, C., Heckmann, M., Joublin, F., & Goerick, C. (2010). Combining auditory preprocessing and Bayesian estimation for robust formant tracking. IEEE Transactions on Audio, Speech, and Language Processing, 18(2), 224–236.

Jankowski, C. R. Jr., Hoang-Doan, H. V., & Lippmann, R. P. (1995). A comparison of signal processing front ends for automatic word recognition. IEEE Transactions on Speech and Audio Processing, 3(4), 286–293.

Jia, H.-X., & Zhang, Y.-J. (2007). Fast human detection by boosting histograms of oriented gradients. In Proc. IEEE fourth international conference on image and graphics (pp. 683–688).

Kolossa, D., Fernandez Astudillo, R., Hoffmann, E., & Orglmeister, R. (2010). Independent component analysis and time-frequency masking for speech recognition in multitalker conditions. EURASIP Journal on Audio, Speech, and Music Processing, 2010, 651420, pp. 1–13.

Lee, C.-H., Han, C.-C., & Chuang, C.-C. (2008). Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients. IEEE Transactions on Audio, Speech, and Language Processing, 16(8), 1541–1550.

Levy, C., Linares, G., & Bonastre, J.-F. (2009). Compact acoustic models for embedded speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2009, 806186, pp. 1–13.

Maier, A., Haderlein, T., Stelzle, F., Noth, E., Nkenke, E., Rosanowski, F., Schutzenberger, A., & Schuster, M. (2010). Automatic speech recognition systems for the evaluation of voice and speech disorders in head and neck cancer. EURASIP Journal on Audio, Speech, and Music Processing, 2010, 926951, pp. 1–7.

Morales, N., Torre Toledano, D., Hansen, J. H. L., & Garrido, J. (2009). Feature compensation techniques for ASR on band-limited speech. IEEE Transactions on Audio, Speech, and Language Processing, 17(4), 758–774.

Morales-Cordovilla, J. A., Peinado, A. M., Sánchez, V., & González, J. A. (2011). Feature extraction based on pitch-synchronous averaging for robust speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, 19(3), 640–651.

Muda, L., Begam, M., & Elamvazuthi, I. (2010). Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. Journal of Computing, 2(3), 138–143.

Muthamizh Selvan, A., & Rajesh, R. (2011). Word classification using neural network. In Proc. of international conference on advances in computing and communications (ACC 2011), Part III (pp. 497–502). Berlin: Springer. CCIS 192.

Panagiotakis, C., & Tziritas, G. (2005). A speech/music discriminator based on RMS and zero-crossings. IEEE Transactions on Multimedia, 7(1), 155–166.

Park, H., Takiguchi, T., & Ariki, Y. (2009). Integrated phoneme subspace method for speech feature extraction. EURASIP Journal on Audio, Speech, and Music Processing, 2009, 690451, pp. 1–6.

Pikrakis, A., Giannakopoulos, T., & Theodoridis, S. (2008). A speech/music discriminator of radio recordings based on dynamic programming and Bayesian networks. IEEE Transactions on Multimedia, 10(5), 846–857.

Rajesh, R., Rajeev, K., Gopakumar, V., Suchithra, K., & Lekhesh, V. P. (2011). On experimenting with pedestrian classification using neural network. In Proc. of 3rd international conference on electronics computer technology (ICECT) (pp. 107–111).

Scheirer, E., & Slaney, M. (1997). Construction and evaluation of a robust multifeature speech/music discriminator. International Conference on Acoustics, Speech, and Signal Processing Proceedings (ICASSP), 2, 1331–1334.

Tomasi, C., & Manduchi, R. (1997). Bilateral filtering for gray and color images. In Proc. IEEE int. conference on computer vision.

Wang, N., Ching, P. C., Zheng, N., & Lee, T. (2011). Robust speaker recognition using denoised vocal source and vocal tract features. IEEE Transactions on Audio, Speech, and Language Processing, 19(1), 196–205.

Yin, H., Nadeu, C., & Hohmann, V. (2009). Pitch and formant based order adaptation of the fractional Fourier transform and its application to speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2009, 304579, pp. 1–14.

Yu, D., Deng, L., Droppo, J., Wu, J., Gong, Y., & Acero, A. (2008). A minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition. In Proc. int. conference on acoustics, speech and signal processing (ICASSP) (pp. 4041–4044).

Zhang, T., & Jay Kuo, C. C. (2001). Audio content analysis for online audiovisual data segmentation and classification. IEEE Transactions on Speech and Audio Processing, 9(4), 441–457.

Title: Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification
Authors: A. Muthamizh Selvan, R. Rajesh
Publication date: 01-06-2012
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 2/2012
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-012-9138-4

