Source and system features for phone recognition

Authors: K. E. Manjunath, K. Sreenivasa Rao

Published in: International Journal of Speech Technology | Issue 2/2015

Published: 1 June 2015


Abstract

In this work, we explore excitation source features in addition to vocal tract system features to improve the performance of phone recognition systems (PRSs). The excitation source information is derived by processing the linear prediction residual of the speech signal. The vocal tract information is captured using Mel-frequency cepstral coefficient (MFCC) features. The PRSs are developed using hidden Markov models on the TIMIT and Bengali speech databases. Tandem PRSs are developed using the phone posteriors obtained from feedforward neural networks. The robustness of the proposed excitation source features is demonstrated using speech samples corrupted by white and babble noise. The results show that tandem PRSs developed using the combination of excitation source and vocal tract system features outperform conventional tandem systems developed using system features alone. PRSs developed using the combination of excitation source and vocal tract features are also more robust to noise than PRSs developed using vocal tract features alone.
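
To make the excitation source idea concrete, the following is a minimal Python sketch of how a linear prediction (LP) residual can be obtained from a single speech frame: LP coefficients are estimated with the autocorrelation method (Levinson-Durbin recursion), and the frame is then passed through the resulting inverse filter. The function names `lp_coefficients` and `lp_residual`, the default order of 10, and the use of NumPy are assumptions made for illustration; the abstract does not state the LP order, the framing, or the further processing applied to the residual.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a[0..order] (with a[0] = 1) for one frame.

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    Convention (an assumption of this sketch): A(z) = 1 + sum_k a[k] z^-k,
    so the prediction error is e[n] = s[n] + sum_k a[k] * s[n - k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy at order 0
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        # Reflection coefficient for order i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Inverse-filter a frame with its own LP coefficients to get the residual."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # FIR filtering with A(z); keep the first len(frame) output samples
    return np.convolve(frame, a)[: len(frame)]
```

Applying `lp_residual` to each windowed frame of a signal would yield a residual from which source features could then be computed; how the paper does this is not described in the abstract.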



Title: Source and system features for phone recognition
Authors: K. E. Manjunath, K. Sreenivasa Rao
Publication date: 1 June 2015
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 2/2015
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-014-9266-0
