Published in: Arabian Journal for Science and Engineering 3/2020

22 August 2019 | Research Article - Electrical Engineering

Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract

Authors: Muhammad Javed, Mirza Muhammad Ali Baig, Saad Ahmed Qazi

Abstract

Automatic speech segmentation identifies the boundaries of phonemes in a given utterance. This paper presents a strategy, driven by cosine distance similarity scores, for identifying phoneme boundaries. The proposed strategy helps in selecting an appropriate feature extraction technique for speech segmentation applications. After an assessment of various state-of-the-art speech processing techniques, a novel combination of forward and inverse characteristics of the vocal tract (FICV) is developed. The proposed technique is evaluated on a Classical Arabic dataset. Extensive experiments compare the proposed technique with state-of-the-art techniques, including hidden Markov model-based forced alignment procedures. The results show that the proposed technique has a total error rate of 14.48%, while its accuracy is 85.2% within a 10 ms alignment error. Compared with the existing state-of-the-art technique, the proposed technique improves error rate and alignment accuracy by 12.29% and 22.73%, respectively, which indicates the potential of FICV for speech segmentation.
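
The general idea of scoring boundaries with cosine distance, and of measuring accuracy within a 10 ms tolerance, can be illustrated with a minimal sketch. It is not the authors' FICV method: the abstract does not specify the features, the threshold, or the peak-picking rule, so the function names (`cosine_distance`, `boundary_candidates`, `hit_rate`), the default threshold of 0.5, and the toy feature vectors below are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        # Assumption: an all-zero (silent) frame is treated as "no change".
        return 0.0
    return 1.0 - dot / norm


def boundary_candidates(frames: Sequence[Sequence[float]],
                        threshold: float = 0.5) -> List[int]:
    """Return frame indices where a new segment is hypothesized to start.

    A boundary is placed at frame i when the distance between frames
    i-1 and i is a local peak and is at least `threshold` (hypothetical value).
    """
    # dist[k] is the distance between frame k and frame k + 1.
    dist = [cosine_distance(frames[i - 1], frames[i])
            for i in range(1, len(frames))]
    boundaries = []
    for k, value in enumerate(dist):
        left = dist[k - 1] if k > 0 else -1.0
        right = dist[k + 1] if k + 1 < len(dist) else -1.0
        if value >= threshold and value > left and value >= right:
            boundaries.append(k + 1)
    return boundaries


def hit_rate(detected_ms: Sequence[float],
             reference_ms: Sequence[float],
             tolerance_ms: float = 10.0) -> float:
    """Fraction of reference boundaries matched within the tolerance."""
    if not reference_ms:
        return 0.0
    hits = sum(
        1 for ref in reference_ms
        if any(abs(ref - det) <= tolerance_ms for det in detected_ms)
    )
    return hits / len(reference_ms)


if __name__ == "__main__":
    # Toy 2-dimensional "features": the spectrum changes between frames 1 and 2.
    toy_frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print(boundary_candidates(toy_frames))            # [2]
    print(hit_rate([20.0, 55.0], [25.0, 80.0]))       # 0.5
```

In this sketch, adjacent frames with similar spectral shape give a distance near 0, while an abrupt change gives a distance near 1, so peaks in the distance sequence serve as boundary hypotheses. The same scores could be computed for several feature types to compare how sharply each one marks known boundaries, which is one plausible reading of how such scores guide feature selection.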

Metadata
Title
Unsupervised Phonetic Segmentation of Classical Arabic Speech Using Forward and Inverse Characteristics of the Vocal Tract
Authors
Muhammad Javed
Mirza Muhammad Ali Baig
Saad Ahmed Qazi
Publication date
22 August 2019
Publisher
Springer Berlin Heidelberg
Published in
Arabian Journal for Science and Engineering / Issue 3/2020
Print ISSN: 2193-567X
Electronic ISSN: 2191-4281
DOI
https://doi.org/10.1007/s13369-019-04065-5
