2015 | OriginalPaper | Chapter

Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System

Authors: Olga Khomitsevich, Pavel Chistikov, Tatiana Krivosheeva, Natalia Epimakhova, Irina Chernykh

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

We propose a system for automatic punctuation prediction in recognized speech using prosodic, word-based and grammatical features. An SVM classifier is trained on prosodic features, and a CRF classifier is trained on a large text dataset using word-based features. The probabilities from the two classifiers are then fused to produce a joint decision on comma and period placement, followed by a second classification pass for question mark detection. Training the two classifiers separately lets us avoid data sparseness for the lexical classifier and increases the overall robustness of the system. This approach works well for Russian and could be applied to other inflected languages. The system was tested on different speech styles. On manual transcripts, we achieved an F-score of 50–71 % for periods, 46–66 % for commas, 19–47 % for question marks, and 77–87 % for “mark/no mark” classification. The results for recognizer output are 46–66 % for periods, 43–60 % for commas, 10–38 % for question marks, and 64–80 % for “mark/no mark”.
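
The fusion and two-pass scheme described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the probabilities are fused or how the second pass is wired, so everything here is an assumption made for illustration: the log-linear interpolation rule, the `weight` and `threshold` parameters, the label names, and the choice to re-examine only predicted periods in the second pass. The per-boundary probabilities are taken as given; in the paper they would come from the SVM (prosodic) and CRF (lexical) classifiers.

```python
import math

# Hypothetical label set for the first pass (comma / period / no mark).
LABELS = ("none", "comma", "period")


def fuse(prosodic, lexical, weight=0.5):
    """Fuse two distributions over LABELS by log-linear interpolation.

    Assumption: the paper's actual fusion rule is not given in the
    abstract; this is one common late-fusion choice.
    """
    eps = 1e-12  # guards against log(0)
    scores = {
        lab: weight * math.log(prosodic[lab] + eps)
        + (1.0 - weight) * math.log(lexical[lab] + eps)
        for lab in LABELS
    }
    # Renormalise so the fused scores form a probability distribution.
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: v / total for lab, v in exps.items()}


def first_pass(prosodic_seq, lexical_seq, weight=0.5):
    """Pick the most probable fused label at every word boundary."""
    marks = []
    for pros, lex in zip(prosodic_seq, lexical_seq):
        fused = fuse(pros, lex, weight)
        marks.append(max(LABELS, key=fused.get))
    return marks


def second_pass(marks, question_probs, threshold=0.5):
    """Relabel predicted periods as question marks.

    Assumption: a separate question classifier supplies one probability
    per boundary, and only boundaries already marked as sentence ends
    are reconsidered.
    """
    return [
        "question" if mark == "period" and prob > threshold else mark
        for mark, prob in zip(marks, question_probs)
    ]
```

Calling `first_pass` on two aligned sequences of per-boundary distributions yields one label per boundary, and `second_pass` then converts some of the predicted periods into question marks. Keeping the two classifiers separate until this late fusion step is what allows the lexical model to be trained on text alone, as the abstract notes.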

Metadata
Title
Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System
Authors
Olga Khomitsevich
Pavel Chistikov
Tatiana Krivosheeva
Natalia Epimakhova
Irina Chernykh
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_20
