DOI: 10.1145/3025453.3025640
research-article
Honorable Mention

Respeak: A Voice-based, Crowd-powered Speech Transcription System

Published: 02 May 2017

ABSTRACT

Speech transcription is an expensive service with high turnaround time for audio files containing languages spoken in developing countries and regional accents of well-represented languages. We present Respeak, a voice-based, crowd-powered system that capitalizes on the strengths of crowdsourcing and automatic speech recognition (instead of typing) to transcribe such audio files. We created Respeak and optimized its design through a series of cognitive experiments. We deployed it with 25 university students in India who completed 5464 micro-transcription tasks, transcribing 55 minutes of widely varied audio content and collectively earning USD 46 in mobile airtime. The Respeak engine aligned the transcripts generated by five randomly selected users to transcribe Hindi and Indian English audio files with word error rates (WER) of 8.6% and 15.2%, respectively. The cost of speech transcription was USD 0.83 per minute with a turnaround time of 39.8 hours, substantially less than industry standards. Using a mixed-methods analysis of cognitive experiments, system performance, and qualitative interviews, we evaluate Respeak's design, user experience, strengths, and weaknesses. Our findings suggest that Respeak improves the quality of speech transcription while enhancing the earning potential of low-income populations in resource-constrained settings.
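The abstract reports accuracy as word error rate (WER). As a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words), and not necessarily the scoring code used by the authors, WER could be computed as follows. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch only; tokenizes on whitespace, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 8.6% corresponds to roughly 9 word-level errors per 100 reference words.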

Supplemental Material

pn1920p.mp4 (MP4, 2 MB)

    Published in

      CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
      May 2017
      7138 pages
      ISBN: 9781450346559
      DOI: 10.1145/3025453

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
      Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
