2017 | OriginalPaper | Book chapter

Text Punctuation: An Inter-annotator Agreement Study

Written by: Marek Boháč, Michal Rott, Vojtěch Kovář

Published in: Text, Speech, and Dialogue

Publisher: Springer International Publishing

Abstract

Spoken language is hard to annotate accurately. One of the most ambiguous tasks is inserting punctuation marks into a transcription of spoken language. The punctuation chosen often depends on how the annotator understands the content of the transcription. This understanding may differ between annotators, because spoken language often lacks the clear structure inherent to written language, owing to the spontaneity of the utterance or to skipping between ideas.
We therefore suspect that inserting commas into spoken language transcriptions is a highly ambiguous task with low inter-annotator agreement (IAA). Low IAA also means that using Gold Truth (GT) annotations to evaluate automatic algorithms is questionable, as already discussed in [7, 8].
In this paper we analyze the IAA within a group of annotators and propose methods to increase it. We also propose and evaluate a reformulation of classical GT annotations for cases where multiple annotations are available.
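
The abstract does not say which agreement measure the authors use. As an illustration only, the sketch below assumes Cohen's kappa computed over binary "comma after this word" decisions and averaged over all annotator pairs. The function names and the toy sentences are hypothetical and are not taken from the paper or its data.

```python
from itertools import combinations


def comma_slots(text):
    """One boolean per whitespace-separated word: True if a comma follows it."""
    return [token.endswith(",") for token in text.split()]


def cohen_kappa(a, b):
    """Cohen's kappa for two equally long sequences of binary decisions."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and cover the same words")
    n = len(a)
    # Share of slots where both annotators made the same decision.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each annotator's comma rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        # Both annotators made the same constant choice everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotated_texts):
    """Average kappa over all annotator pairs (assumes identical word sequences)."""
    slots = [comma_slots(text) for text in annotated_texts]
    pairs = list(combinations(slots, 2))
    if not pairs:
        raise ValueError("at least two annotations are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical annotations of the same transcript by three annotators.
    annotations = [
        "well, I think, that it works",
        "well, I think that it works",
        "well I think that it works",
    ]
    print(mean_pairwise_kappa(annotations))
```

A chance-corrected measure is used here because most word boundaries carry no comma, so raw percentage agreement would look high even for annotators who rarely agree on where the commas go.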

Footnotes
1
All the data used are accessible at http://nlp.ite.tul.cz/punctuation.
 
References
1. Boháč, M., Blavka, K., Kuchařová, M., Škodová, S.: Post-processing of the recognized speech for web presentation of large audio archive. In: 2012 35th International Conference on Telecommunications and Signal Processing (TSP), pp. 441–445, July 2012
2. Boháč, M., Nouza, J., Blavka, K.: Investigation on most frequent errors in large-scale speech recognition applications. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2012. LNCS, vol. 7499, pp. 520–527. Springer, Heidelberg (2012). doi:10.1007/978-3-642-32790-2_63
3. Kolář, J., Švec, J., Psutka, J.: Automatic punctuation annotation in Czech broadcast news speech. In: 9th Conference Speech and Computer (2004)
4. Kovář, V.: Partial grammar checking for Czech using the SET parser. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2014. LNCS (LNAI), vol. 8655, pp. 308–314. Springer, Cham (2014). doi:10.1007/978-3-319-10816-2_38
5. Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis as pattern matching: the SET parsing system. In: Proceedings of 4th Language and Technology Conference, Wydawnictwo Poznańskie, Poznań, Poland, pp. 978–983 (2009)
6. Kovář, V., Machura, J., Zemková, K., Rott, M.: Evaluation and improvements in punctuation detection for Czech. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) TSD 2016. LNCS, vol. 9924, pp. 287–294. Springer, Cham (2016). doi:10.1007/978-3-319-45510-5_33
7. Kovář, V.: Evaluating natural language processing tasks with low inter-annotator agreement: the case of corpus applications. In: Recent Advances in Slavonic Natural Language Processing, RASLAN 2016, pp. 127–134 (2016)
8. Kovář, V., Jakubíček, M., Horák, A.: On evaluation of natural language processing tasks - is gold standard evaluation methodology a good solution? In: Proceedings of the ICAART 2016, vol. 2, pp. 540–545. SCITEPRESS (2016)
9. Mihajlik, P., Fegyó, T., Németh, B., Tüske, Z., Trón, V.: Towards automatic transcription of large spoken archives in agglutinating languages – Hungarian ASR for the MALACH Project. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS, vol. 4629, pp. 342–349. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74628-7_45
10. Nouza, J., Červa, P., Ždánský, J., et al.: Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive. In: INTERSPEECH 2014, pp. 964–968 (2014)
11. Petkevič, V.: Kontrola české gramatiky (český grammar checker). Studie z aplikované lingvistiky-Studies in Applied Linguistics 5(2), 48–66 (2014)
Metadata
Title
Text Punctuation: An Inter-annotator Agreement Study
Written by
Marek Boháč
Michal Rott
Vojtěch Kovář
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-64206-2_14