DOI: 10.1145/2884781.2884824
research-article

Too long; didn't watch!: extracting relevant fragments from software development video tutorials

Published: 14 May 2016

ABSTRACT

When knowledgeable colleagues are not available, developers resort to offline and online resources, e.g., tutorials, mailing lists, and Q&A websites. These, however, need to be found, read, and understood, which takes its toll in terms of time and mental energy. More immediate and accessible resources are the video tutorials found on the web, which in recent years have seen a steep increase in popularity. Nonetheless, videos are an intrinsically noisy data source, and finding the right piece of information might be even more cumbersome than using the previously mentioned resources.

We present CodeTube, an approach that mines video tutorials found on the web and enables developers to query their contents. The video tutorials are split into coherent fragments, so that only fragments related to the query are returned. These are complemented with information from additional sources, such as Stack Overflow discussions. The results of two studies to assess CodeTube indicate that video tutorials, if appropriately processed, represent a useful, yet still under-utilized source of information for software development.


        Published in

        ICSE '16: Proceedings of the 38th International Conference on Software Engineering
        May 2016
        1235 pages
        ISBN: 9781450339001
        DOI: 10.1145/2884781

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 276 of 1,856 submissions, 15%
