Arabian Journal for Science and Engineering

22-08-2019 | Research Article - Electrical Engineering

Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract

Authors: Muhammad Javed, Mirza Muhammad Ali Baig, Saad Ahmed Qazi

Published in: Arabian Journal for Science and Engineering | Issue 3/2020

Abstract

Automatic speech segmentation identifies the boundaries of phonemes in a given utterance. This paper presents a strategy driven by cosine distance similarity scores for identifying phoneme boundaries. The proposed strategy helps in selecting an appropriate feature extraction technique for speech segmentation applications. After assessing various state-of-the-art speech processing techniques, a novel combination of forward and inverse characteristics of the vocal tract (FICV) is developed. The proposed technique is evaluated on a Classical Arabic dataset. Extensive experiments compare the proposed technique with state-of-the-art techniques, including hidden Markov model-based forced alignment procedures. The results show that the proposed technique has a total error rate of 14.48%, while its accuracy is 85.2% within a 10 ms alignment error. Compared with the existing state-of-the-art technique, the proposed technique performs better by 12.29% and 22.73% in terms of error rate and alignment accuracy, respectively, which indicates the potential of FICV for speech segmentation.
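To make the idea of boundary detection from cosine distance scores concrete, the following is a minimal sketch and not the authors' FICV method. It assumes per-frame feature vectors are already available (the feature type, the 10 ms hop, the threshold, and the simple peak-picking rule are all illustrative assumptions), marks a boundary wherever the cosine distance between consecutive frames peaks, and then scores detections against reference boundaries within a 10 ms tolerance, in the spirit of the accuracy figure quoted above.

```python
import numpy as np

def cosine_distance_curve(features):
    """Cosine distance between each pair of consecutive frames.

    features: array of shape (n_frames, n_dims).
    Returns an array of length n_frames - 1.
    """
    a, b = features[:-1], features[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Guard against all-zero frames (e.g. digital silence).
    sim = num / np.maximum(den, 1e-12)
    return 1.0 - sim

def pick_boundaries(dist, threshold):
    """Indices i where dist[i] is a local maximum at or above the threshold.

    A boundary at index i lies between frame i and frame i + 1.
    The threshold is a hypothetical tuning parameter.
    """
    peaks = []
    for i in range(len(dist)):
        left = dist[i - 1] if i > 0 else -np.inf
        right = dist[i + 1] if i < len(dist) - 1 else -np.inf
        if dist[i] >= threshold and dist[i] > left and dist[i] >= right:
            peaks.append(i)
    return peaks

def fraction_within_tolerance(detected_s, reference_s, tol_s=0.010):
    """Share of reference boundaries with a detection within tol_s seconds."""
    detected = np.asarray(detected_s, dtype=float)
    if len(reference_s) == 0:
        return 0.0
    hits = sum(
        1 for r in reference_s
        if detected.size and np.min(np.abs(detected - r)) <= tol_s
    )
    return hits / len(reference_s)

# Toy input: three frames of one "phoneme", then three of another.
toy_features = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
toy_dist = cosine_distance_curve(toy_features)
toy_boundaries = pick_boundaries(toy_dist, threshold=0.5)

hop_s = 0.010  # assumed frame hop of 10 ms
toy_times = [(i + 1) * hop_s for i in toy_boundaries]
toy_score = fraction_within_tolerance([0.030, 0.120], [0.035, 0.100])
```

On the toy input, the only boundary found lies between the third and fourth frames, and the tolerance score is 0.5 because one of the two reference boundaries has a detection within 10 ms. A practical system would likely smooth the distance curve and choose the threshold on held-out data.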

Title: Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract
Authors: Muhammad Javed, Mirza Muhammad Ali Baig, Saad Ahmed Qazi
Publication date: 22-08-2019
Publisher: Springer Berlin Heidelberg
Published in: Arabian Journal for Science and Engineering / Issue 3/2020
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI: https://doi.org/10.1007/s13369-019-04065-5
