2016 | Original Paper | Book Chapter

Collaborator Effort Optimisation in Multimodal Crowdsourcing for Transcribing Historical Manuscripts

Authors: Emilio Granell, Carlos-D. Martínez-Hinarejos

Published in: Advances in Speech and Language Technologies for Iberian Languages

Publisher: Springer International Publishing

Abstract

Crowdsourcing is a powerful tool for large-scale transcription at relatively low cost: the transcription effort is distributed among a set of collaborators, so the supervision effort required from professional transcribers can be dramatically reduced. Nevertheless, collaborators are a scarce resource, which makes optimisation important for getting the maximum benefit from their efforts. This work studies the optimisation of the collaborators' workload in a multimodal crowdsourcing platform where speech dictation of handwritten text lines is used as the transcription source. The experiments explore how this optimisation makes it possible to obtain similar results while reducing both the number of collaborators and the number of text lines they have to read.

Metadata
Title
Collaborator Effort Optimisation in Multimodal Crowdsourcing for Transcribing Historical Manuscripts
Authors
Emilio Granell
Carlos-D. Martínez-Hinarejos
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49169-1_23