Published in: International Journal of Speech Technology 3/2016

20.05.2016

Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition

Authors: Khalid M. O. Nahar, Mohammed Abu Shquier, Wasfi G. Al-Khatib, Husni Al-Muhtaseb, Moustafa Elshafei

Abstract

In an attempt to increase the recognition rate of Arabic phonemes, we introduce a novel hybrid recognition algorithm that combines learning vector quantization (LVQ) with a hidden Markov model (HMM). The hybrid algorithm is used to recognize Arabic phonemes in continuous open-vocabulary speech. A recorded corpus of modern standard Arabic, drawn from different TV news broadcasts, was used for training and testing. We employ a data-driven approach to generate training feature vectors that embed the correlation between neighboring frames. Next, we generate the phoneme codebooks using the K-means splitting algorithm and then train the generated codebooks using the LVQ algorithm. We achieved a performance of 98.49 % during independent classification training and 90 % during dependent classification training. When the trained LVQ codebooks were used for Arabic utterance transcription, the phoneme recognition rate was 72 % using LVQ only. We then combined the LVQ codebooks with a single-state HMM using an enhanced Viterbi algorithm that incorporates phoneme bigrams. The hybrid LVQ/HMM algorithm achieved an Arabic phoneme recognition rate of 89 %.
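The two core stages named in the abstract, LVQ codebook training and Viterbi decoding with phoneme bigrams over single-state models, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the basic LVQ1 update rule, a linearly decaying learning rate, negative squared Euclidean distance as the per-frame score, and at least one codeword per phoneme class. The function names and parameters (`lvq1_train`, `frame_scores`, `viterbi_bigram`, `lr`, `epochs`) are hypothetical, and the "enhanced" aspects of the paper's Viterbi algorithm are not reproduced.

```python
import numpy as np

def lvq1_train(codebook, labels, X, y, lr=0.05, epochs=10):
    """LVQ1 (assumed variant): move the nearest codeword toward a sample
    of the same class and away from a sample of a different class."""
    codebook = codebook.astype(float).copy()
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # assumed linear decay schedule
        for x, target in zip(X, y):
            winner = np.argmin(np.sum((codebook - x) ** 2, axis=1))
            sign = 1.0 if labels[winner] == target else -1.0
            codebook[winner] += sign * rate * (x - codebook[winner])
    return codebook

def frame_scores(codebook, labels, X, n_classes):
    """Score each frame against each phoneme class as the negative squared
    distance to that class's closest codeword (assumed scoring rule)."""
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K)
    scores = np.empty((X.shape[0], n_classes))
    for c in range(n_classes):
        scores[:, c] = -d[:, labels == c].min(axis=1)
    return scores

def viterbi_bigram(scores, log_bigram):
    """Viterbi over one state per phoneme. log_bigram[i, j] is the log
    probability of moving from phoneme i to phoneme j, self-loops included."""
    T, C = scores.shape
    delta = scores[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_bigram  # rows: from, columns: to
        back[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + scores[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In this sketch the bigram matrix acts as a smoothing prior: a single frame whose nearest codeword belongs to the wrong phoneme is overridden when switching phonemes costs more than the frame-level evidence gains. That is one plausible reading of why the abstract reports a higher rate for the hybrid than for LVQ alone.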

Footnotes
1. Except for the first 3 frames and the last 3 frames in the feature matrix.
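The text this footnote belongs to is not part of this page, so the following is only a guess at what it describes. If the feature vectors that "embed the frame neighboring correlation information" are built from a window of three frames on either side of each frame, the first and last three frames have no complete window and would be skipped. A minimal sketch under that assumption, with a hypothetical function name `stack_context` and plain concatenation standing in for whatever combination the paper actually uses:

```python
import numpy as np

def stack_context(features, width=3):
    """Concatenate each frame with its `width` left and right neighbours.
    Frames without a full window (the first and last `width`) are dropped."""
    T, D = features.shape
    if T < 2 * width + 1:
        return np.empty((0, (2 * width + 1) * D))
    windows = [
        features[t - width : t + width + 1].reshape(-1)
        for t in range(width, T - width)
    ]
    return np.stack(windows)
```

For a feature matrix of T frames with D coefficients each, this yields T - 6 vectors of length 7 * D when `width` is 3.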
 
Metadata
Title
Arabic phonemes recognition using hybrid LVQ/HMM model for continuous speech recognition
Authors
Khalid M. O. Nahar
Mohammed Abu Shquier
Wasfi G. Al-Khatib
Husni Al-Muhtaseb
Moustafa Elshafei
Publication date
20.05.2016
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-016-9337-5
