Arabian Journal for Science and Engineering

22.08.2019 | Research Article - Electrical Engineering

Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract

Authors: Muhammad Javed, Mirza Muhammad Ali Baig, Saad Ahmed Qazi

Published in: Arabian Journal for Science and Engineering | Issue 3/2020

Abstract

Automatic speech segmentation is the task of identifying phoneme boundaries in a given utterance. This paper presents a strategy, driven by cosine-distance similarity scores, for identifying phoneme boundaries; the strategy also helps in selecting an appropriate feature extraction technique for speech segmentation applications. After assessing various state-of-the-art speech processing techniques, a novel combination of forward and inverse characteristics of the vocal tract (FICV) is developed. The proposed technique is evaluated on a Classical Arabic dataset, and extensive experiments compare it with state-of-the-art techniques, including hidden Markov model (HMM)-based forced alignment procedures. The results show that the proposed technique has a total error rate of 14.48% and an accuracy of 85.2% within a 10 ms alignment tolerance. Compared with the existing state-of-the-art technique, the proposed technique outperforms it by 12.29% in error rate and by 22.73% in alignment accuracy, which signifies the potential of FICV for speech segmentation.
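The abstract describes boundary detection driven by cosine-distance scores between feature frames. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a sequence of per-frame feature vectors (the FICV features are not reproduced here, so any frame-level features can stand in), a fixed 10 ms frame hop, and a simple threshold-plus-local-maximum peak picker. All helper names are hypothetical. It also computes the percentage of reference boundaries matched by a detected boundary within 10 ms, which is one common reading of "accuracy within 10 ms alignment error"; the paper's exact matching rule may differ.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for vectors pointing the same way, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: all-zero (silent) frames carry no boundary evidence
    return 1.0 - dot / (norm_a * norm_b)


def boundary_curve(frames: Sequence[Sequence[float]]) -> List[float]:
    """Cosine distance between each pair of consecutive feature frames."""
    return [cosine_distance(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def pick_boundaries(curve: Sequence[float], threshold: float) -> List[int]:
    """Local maxima of the curve above a fixed threshold (hypothetical peak-picking rule).

    Index i marks a candidate boundary between frame i and frame i + 1.
    """
    peaks = []
    for i, d in enumerate(curve):
        left = curve[i - 1] if i > 0 else -math.inf
        right = curve[i + 1] if i + 1 < len(curve) else -math.inf
        if d > threshold and d >= left and d > right:
            peaks.append(i)
    return peaks


def frames_to_ms(indices: Sequence[int], hop_ms: int = 10) -> List[int]:
    """Convert boundary indices to times (ms) at the start of frame i + 1."""
    return [(i + 1) * hop_ms for i in indices]


def accuracy_within_tolerance(detected_ms: Sequence[int],
                              reference_ms: Sequence[int],
                              tol_ms: int = 10) -> float:
    """Percentage of reference boundaries with some detected boundary within tol_ms."""
    if not reference_ms:
        return 0.0
    hits = sum(1 for r in reference_ms
               if any(abs(d - r) <= tol_ms for d in detected_ms))
    return 100.0 * hits / len(reference_ms)


if __name__ == "__main__":
    # Toy 2-D "features": three stationary segments of three frames each.
    frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 1.0]] * 3
    curve = boundary_curve(frames)
    boundaries_ms = frames_to_ms(pick_boundaries(curve, threshold=0.1), hop_ms=10)
    print(boundaries_ms)  # [30, 60]
    print(accuracy_within_tolerance(boundaries_ms, [28, 65], tol_ms=10))  # 100.0
```

With these toy frames the distance curve peaks only where the feature vector changes direction, yielding candidate boundaries at 30 ms and 60 ms. On real speech the fixed threshold would likely need tuning on held-out data, or replacement by an adaptive peak picker.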

Title: Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract
Authors: Muhammad Javed, Mirza Muhammad Ali Baig, Saad Ahmed Qazi
Publication date: 22.08.2019
Publisher: Springer Berlin Heidelberg
Published in: Arabian Journal for Science and Engineering / Issue 3/2020
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI: https://doi.org/10.1007/s13369-019-04065-5
