
2017 | OriginalPaper | Book chapter

Phonetic Segmentation Using Knowledge from Visual and Perceptual Domain

Authors: Bhavik Vachhani, Chitralekha Bhat, Sunil Kopparapu

Published in: Text, Speech, and Dialogue

Publisher: Springer International Publishing


Abstract

Accurate and automatic phonetic segmentation is crucial for several speech-based applications, such as phone-level articulation analysis and error detection, speech synthesis, annotation, speech recognition and emotion recognition. In this paper we examine the effectiveness of visual features, obtained by processing the spectrogram of a speech utterance as an image, for phonetic segmentation. Further, we propose a mechanism that combines knowledge from the visual and perceptual domains for automatic phonetic segmentation; this process can be considered analogous to manual phonetic segmentation. The technique was evaluated on the TIMIT American English corpus. Experimental results show significant improvements in phonetic segmentation, especially at the lower tolerances of 5, 10 and 15 ms; on TIMIT, an absolute improvement of 8.29% is observed at a 10 ms tolerance.
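The tolerance figures in the abstract refer to boundary-detection accuracy. The sketch below assumes the definition commonly used in phonetic segmentation work: a reference boundary counts as detected if some predicted boundary lies within ±t ms of it, each predicted boundary may be matched at most once, and accuracy is the percentage of reference boundaries detected. The function name `boundary_accuracy`, the greedy nearest-match rule and the toy boundary times are illustrative assumptions, not the chapter's evaluation code or data; the authors' exact matching protocol may differ.

```python
from typing import Sequence


def boundary_accuracy(reference: Sequence[float],
                      predicted: Sequence[float],
                      tolerance_ms: float) -> float:
    """Percentage of reference boundaries matched by a prediction within +/- tolerance_ms.

    Hypothetical helper for illustration. All times are in milliseconds.
    Matching is greedy: reference boundaries are visited in time order and each
    takes the closest still-unused predicted boundary inside the tolerance window.
    """
    if not reference:
        raise ValueError("reference must contain at least one boundary")
    preds = sorted(predicted)
    used = [False] * len(preds)  # each predicted boundary may be matched only once
    hits = 0
    for ref in sorted(reference):
        best_idx = None
        best_dist = None
        for i, p in enumerate(preds):
            if used[i]:
                continue
            dist = abs(p - ref)
            if dist <= tolerance_ms and (best_dist is None or dist < best_dist):
                best_idx, best_dist = i, dist
        if best_idx is not None:
            used[best_idx] = True
            hits += 1
    return 100.0 * hits / len(reference)


if __name__ == "__main__":
    # Toy boundary times (ms), invented for illustration only.
    ref = [100, 250, 400, 620]
    hyp = [104, 262, 395, 700]
    for tol in (5, 10, 15):
        print(f"{tol} ms: {boundary_accuracy(ref, hyp, tol):.1f}%")
```

On the toy data this prints 50.0% at 5 ms and 10 ms and 75.0% at 15 ms, since the 12 ms error on the second boundary only falls inside the widest window. An absolute improvement, such as the 8.29% reported at 10 ms, is a difference in percentage points between two such accuracies. TIMIT label files give boundaries as sample indices at 16 kHz, so they would need dividing by 16 to obtain milliseconds before calling a function like this.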


Metadata
Title
Phonetic Segmentation Using Knowledge from Visual and Perceptual Domain
Authors
Bhavik Vachhani
Chitralekha Bhat
Sunil Kopparapu
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-64206-2_44