
Published in: International Journal of Speech Technology

20.05.2016

Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition

Authors: Khalid M. O. Nahar, Mohammed Abu Shquier, Wasfi G. Al-Khatib, Husni Al-Muhtaseb, Moustafa Elshafei

Issue: 3/2016

Abstract

In an attempt to increase the Arabic phoneme recognition rate, we introduce a novel hybrid recognition algorithm that combines learning vector quantization (LVQ) with a hidden Markov model (HMM). The hybrid algorithm is used to recognize Arabic phonemes in continuous open-vocabulary speech. A recorded corpus of Modern Standard Arabic, drawn from different TV news broadcasts, was used for training and testing. We employ a data-driven approach to generate training feature vectors that embed the correlation information of neighboring frames. Next, we generate the phoneme codebooks using the K-means splitting algorithm and then train the generated codebooks using the LVQ algorithm. We achieved a performance of 98.49 % during independent classification training and 90 % during dependent classification training. When the trained LVQ codebooks were used for Arabic utterance transcription, the phoneme recognition rate was 72 % with LVQ alone. We then combined the LVQ codebooks with a single-state HMM using an enhanced Viterbi algorithm that incorporates phoneme bigrams. The hybrid LVQ/HMM algorithm achieved an Arabic phoneme recognition rate of 89 %.
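The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the textbook LVQ1 update rule, squared Euclidean distance, a linearly decaying learning rate, a uniform initial state distribution, and a plain Viterbi pass in which each phoneme is one state and the transition weights are log bigram probabilities. The function names (`lvq1_train`, `frame_scores`, `viterbi_bigram`) and all parameter values are hypothetical, and the "enhanced" parts of the paper's Viterbi algorithm are not reproduced.

```python
import numpy as np

def lvq1_train(codebook, labels, X, y, lr=0.05, epochs=10):
    """Assumed LVQ1 rule: pull the nearest codeword toward a sample of the
    same class, push it away from a sample of a different class."""
    cb = codebook.astype(float).copy()  # leave the caller's codebook untouched
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for x, target in zip(X, y):
            k = np.argmin(np.sum((cb - x) ** 2, axis=1))  # nearest codeword
            sign = 1.0 if labels[k] == target else -1.0
            cb[k] += sign * alpha * (x - cb[k])
    return cb

def frame_scores(codebook, labels, X, n_classes):
    """Score of class c for each frame: negative squared distance to the
    closest codeword labelled c. Every class needs at least one codeword."""
    d = np.sum((X[:, None, :] - codebook[None, :, :]) ** 2, axis=2)  # (T, K)
    scores = np.empty((X.shape[0], n_classes))
    for c in range(n_classes):
        scores[:, c] = -d[:, labels == c].min(axis=1)
    return scores

def viterbi_bigram(scores, log_bigram):
    """Best class sequence given frame scores (T, C) and log bigram
    weights (C, C), where log_bigram[i, j] weights the step i -> j."""
    T, C = scores.shape
    delta = scores[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_bigram  # cand[i, j]: arrive in j from i
        back[t] = np.argmax(cand, axis=0)
        delta = cand.max(axis=0) + scores[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):  # follow back-pointers from the last frame
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In this sketch, `labels` must be a NumPy integer array with one entry per codeword. The bigram term is what lets the decoder override an isolated frame whose nearest codeword belongs to the wrong phoneme, which is one plausible reading of why the hybrid outperforms LVQ alone.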


Except for the first 3 frames and the last 3 frames in the feature matrix.
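One way to read this note together with the abstract's "frame neighboring correlation information" is that each training vector is a frame concatenated with its neighbors, so frames without a full window are left out. The sketch below assumes a symmetric window of 3 frames on each side; that window size, the concatenation itself, and the name `stack_context` are assumptions, not details confirmed by the source.

```python
import numpy as np

def stack_context(features, context=3):
    """Concatenate each frame with `context` neighbors on each side.

    `features` has shape (T, D). Frames that lack a full window, i.e. the
    first and last `context` frames, get no stacked vector of their own.
    """
    T, D = features.shape
    width = 2 * context + 1
    if T < width:
        return np.empty((0, width * D))  # too short for even one window
    windows = [
        features[t - context:t + context + 1].reshape(-1)
        for t in range(context, T - context)
    ]
    return np.stack(windows)
```

For a feature matrix of 10 frames with 2 coefficients each, this yields 4 stacked vectors of length 14, one for each of frames 3 to 6.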


Title: Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition
Authors: Khalid M. O. Nahar
Mohammed Abu Shquier
Wasfi G. Al-Khatib
Husni Al-Muhtaseb
Moustafa Elshafei
Publication date: 20.05.2016
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 3/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-016-9337-5
