2015 | Original Paper | Book Chapter

Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System

Authors: Olga Khomitsevich, Pavel Chistikov, Tatiana Krivosheeva, Natalia Epimakhova, Irina Chernykh

Published in: Speech and Computer

Publisher: Springer International Publishing

Abstract

We propose a system for automatic punctuation prediction in recognized speech using prosodic, word and grammatical features. An SVM classifier is trained using prosody, and a CRF classifier is trained on a large text dataset using word-based features. The probabilities are then fused to produce a joint decision on comma and period placement, with a second classification pass for question mark detection. Training two classifiers separately enables us to avoid data sparseness for the lexical classifier, and to increase the overall robustness of the system. This works well for Russian and could be applied to other inflected languages. The system was tested on different speech styles. On manual transcripts, we achieved an F-score of 50–71 % for periods, 46–66 % for commas, 19–47 % for question marks, and 77–87 % for “mark/no mark” classification. The results for recognizer output are 46–66 % for periods, 43–60 % for commas, 10–38 % for questions, and 64–80 % for “mark/no mark”.
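
The abstract does not state how the two classifiers' probabilities are fused or how the second pass is triggered. The following is a minimal Python sketch under stated assumptions: a weighted linear interpolation of per-boundary class probabilities, followed by a hypothetical question scorer applied to each period-terminated segment. The names `fuse`, `first_pass`, `second_pass`, `weight`, and `threshold` are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, List, Sequence

# Classes decided in the first pass, one decision per word boundary.
LABELS = ("none", "comma", "period")

Distribution = Dict[str, float]


def fuse(prosodic: Sequence[Distribution],
         lexical: Sequence[Distribution],
         weight: float = 0.5) -> List[Distribution]:
    """Linearly interpolate two per-boundary distributions.

    `weight` is the share given to the prosodic classifier (assumed value;
    in practice it would be tuned on held-out data).
    """
    return [
        {lab: weight * p[lab] + (1.0 - weight) * q[lab] for lab in LABELS}
        for p, q in zip(prosodic, lexical)
    ]


def first_pass(fused: Sequence[Distribution]) -> List[str]:
    """Pick the most probable label at each boundary."""
    return [max(LABELS, key=lambda lab: dist[lab]) for dist in fused]


def second_pass(words: Sequence[str],
                marks: Sequence[str],
                question_score: Callable[[Sequence[str]], float],
                threshold: float = 0.5) -> List[str]:
    """Relabel a period as a question mark when the segment it closes
    scores at or above `threshold` under a (hypothetical) question scorer."""
    result = list(marks)
    start = 0
    for i, mark in enumerate(marks):
        if mark == "period":
            if question_score(words[start:i + 1]) >= threshold:
                result[i] = "question"
            start = i + 1
    return result
```

Here `marks[i]` is the label for the boundary after `words[i]`. Keeping the two classifiers behind plain probability dictionaries mirrors the abstract's point that they are trained separately: either one can be replaced without touching the fusion step.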

Metadata
Title
Combining Prosodic and Lexical Classifiers for Two-Pass Punctuation Detection in a Russian ASR System
Authors
Olga Khomitsevich
Pavel Chistikov
Tatiana Krivosheeva
Natalia Epimakhova
Irina Chernykh
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-23132-7_20
