ABSTRACT
When knowledgeable colleagues are not available, developers resort to offline and online resources, e.g., tutorials, mailing lists, and Q&A websites. These, however, need to be found, read, and understood, which takes its toll in terms of time and mental energy. Video tutorials found on the web are a more immediate and accessible resource, and their popularity has increased steeply in recent years. Nonetheless, videos are an intrinsically noisy data source, and finding the right piece of information in them might be even more cumbersome than using the previously mentioned resources.
We present CodeTube, an approach that mines video tutorials found on the web and enables developers to query their contents. The video tutorials are split into coherent fragments so that only the fragments related to a query are returned. These fragments are complemented with information from additional sources, such as Stack Overflow discussions. The results of two studies assessing CodeTube indicate that video tutorials, if appropriately processed, represent a useful yet still under-utilized source of information for software development.
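The abstract does not say how a query is matched against fragments. Purely as an illustration of the "return only fragments related to the query" step, the sketch below assumes each fragment has already been reduced to text (for example a speech transcript or code read off the screen) and ranks fragments by plain TF-IDF cosine similarity. The `Fragment` type, the `search` function, and the `min_score` threshold are hypothetical names chosen for this example; this is not CodeTube's actual pipeline.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical video fragment: a time span plus the text extracted from it."""
    video_id: str
    start_s: int
    end_s: int
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; an identifier such as setOnClickListener
    # becomes the single token "setonclicklistener".
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    n = len(docs)
    # Document frequency: in how many fragments each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, so terms present in every fragment keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: cnt * idf[t] for t, cnt in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(fragments: list[Fragment], query: str,
           min_score: float = 0.1) -> list[tuple[float, Fragment]]:
    """Return (score, fragment) pairs at or above min_score, best match first."""
    vectors, idf = tfidf_vectors([tokenize(f.text) for f in fragments])
    # Query terms that occur in no fragment carry no weight and are dropped.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q_vec, v), f) for v, f in zip(vectors, fragments)]
    hits = [(s, f) for s, f in scored if s >= min_score]
    return sorted(hits, key=lambda sf: sf[0], reverse=True)


if __name__ == "__main__":
    demo = [
        Fragment("v1", 0, 95, "create a new android project and open the layout editor"),
        Fragment("v1", 95, 240, "add a button and call setOnClickListener to handle the click event"),
        Fragment("v2", 0, 180, "read a json file with gson and map it to a java class"),
    ]
    for score, frag in search(demo, "button click listener"):
        print(f"{frag.video_id} [{frag.start_s}-{frag.end_s}s] score={score:.2f}")
```

In the demo, only the second fragment of `v1` should be returned, because it is the only one that shares query terms. A real system would need more than bag-of-words matching (for instance, splitting identifiers like `setOnClickListener` into words), which is one reason this toy ranking should not be read as the paper's method.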
Too Long; Didn't Watch! Extracting Relevant Fragments from Software Development Video Tutorials