Too long; didn't watch!: extracting relevant fragments from software development video tutorials

Authors:
Luca Ponzanelli, Università della Svizzera Italiana (USI), Switzerland
Gabriele Bavota, Free University of Bozen-Bolzano, Italy
Andrea Mocci, Università della Svizzera Italiana (USI), Switzerland
Massimiliano Di Penta, University of Sannio, Italy
Rocco Oliveto, University of Molise, Italy
Mir Hasan, Florida State University
Barbara Russo, Free University of Bozen-Bolzano, Italy
Sonia Haiduc, Florida State University
Michele Lanza, Università della Svizzera Italiana (USI), Switzerland

ICSE '16: Proceedings of the 38th International Conference on Software Engineering, May 2016, Pages 261–272
https://doi.org/10.1145/2884781.2884824

Published: 14 May 2016

ABSTRACT

When knowledgeable colleagues are not available, developers resort to offline and online resources, e.g., tutorials, mailing lists, and Q&A websites. These, however, need to be found, read, and understood, which takes its toll in terms of time and mental energy. More immediate and accessible resources are video tutorials found on the web, which in recent years have seen a steep increase in popularity. Nonetheless, videos are an intrinsically noisy data source, and finding the right piece of information might be even more cumbersome than using the previously mentioned resources.

We present CodeTube, an approach that mines video tutorials found on the web and enables developers to query their contents. The video tutorials are split into coherent fragments, so that only the fragments related to the query are returned. These are complemented with information from additional sources, such as Stack Overflow discussions. The results of two studies to assess CodeTube indicate that video tutorials, if appropriately processed, represent a useful, yet still under-utilized source of information for software development.
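
The abstract does not say how a query is matched against video fragments. Purely as an illustration, the sketch below assumes each fragment is represented by its transcript text and ranks fragments by TF-IDF cosine similarity, a standard information-retrieval technique that is not necessarily what CodeTube uses. The function name `rank_fragments` and the fragment identifiers are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_fragments(query, fragments):
    """Rank fragments by TF-IDF cosine similarity to the query.

    `fragments` is a list of (fragment_id, transcript_text) pairs
    (an assumed representation). Returns (fragment_id, score) pairs
    with a positive score, best match first.
    """
    docs = [Counter(tokenize(text)) for _, text in fragments]
    n = len(docs)
    # Document frequency: number of fragments that contain each term.
    df = Counter(term for doc in docs for term in doc)

    def weigh(counts):
        # Smoothed IDF stays positive even for terms found in every fragment.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scored = [(fid, cosine(query_vec, weigh(doc)))
              for (fid, _), doc in zip(fragments, docs)]
    # Keep only fragments that share at least one term with the query.
    return sorted(((fid, s) for fid, s in scored if s > 0),
                  key=lambda pair: pair[1], reverse=True)
```

Calling `rank_fragments("parse json", fragments)` returns only the fragments whose transcript shares a term with the query, which mirrors the idea of returning fragments rather than whole videos.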

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation
  2. Software notations and tools
    1. Software maintenance tools

Published in
ICSE '16: Proceedings of the 38th International Conference on Software Engineering
May 2016
1235 pages
ISBN: 9781450339001
DOI: 10.1145/2884781
General Chair: Laura Dillon, Michigan State University
Program Chairs: Willem Visser, Stellenbosch University, South Africa; Laurie Williams, North Carolina State University
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
mining unstructured data
recommender systems
Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
