Published in: Empirical Software Engineering, Issue 6/2020

16 September 2020

A feature location approach for mapping application features extracted from crowd-based screencasts to source code

Authors: Parisa Moslehi, Bram Adams, Juergen Rilling

Abstract

Crowd-based multimedia documents such as screencasts have emerged as a source for documenting requirements, workflows, and implementation issues of open source and agile software projects. For example, users can show and narrate how they manipulate an application’s GUI to perform a certain function, or a bug reporter can visually explain how to trigger a bug or a security vulnerability. Unfortunately, the streaming nature of programming screencasts and their binary format limit how developers can interact with a screencast’s content. In this research, we present an automated approach for mining the multimedia content found in screencasts and linking it to the relevant software artifacts and, more specifically, to source code. We apply LDA-based mining approaches that take as input a set of screencast artifacts, such as GUI text and spoken words, to make the screencast content accessible and searchable to users and to link it to the relevant source code artifacts. To evaluate the applicability of our approach, we report on results from case studies that we conducted on existing WordPress and Mozilla Firefox screencasts. We found that our automated approach can significantly speed up the feature location process. For WordPress, we find that our approach using screencast speech and GUI text can successfully link relevant source code files within the top 10 hits of the result set with a median Reciprocal Rank (RR) of 50% (rank 2) and 100% (rank 1). In the case of Firefox, our approach can identify relevant source code directories within the top 100 hits using screencast speech and GUI text with a median RR of 20%, meaning that the first true positive is ranked fifth or better in at least half of the cases. Also, source code related to the frontend implementation, which handles high-level or GUI-related aspects of an application, is located with higher accuracy.
We also found that term frequency rebalancing can further improve the linking results for less noisy scenarios or when locating less technical implementations of scenarios. Comparing the original and weighted screencast data sources (speech, GUI text, and speech combined with GUI text) that yield the highest median RR values in both case studies shows that speech data is an important information source, one that can yield an RR of 100%.
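
The Reciprocal Rank figures quoted above can be read more easily with a small worked example. The following is a minimal sketch, not the authors' evaluation code: the file names and ranked result lists are invented for illustration, and scoring a query with no relevant hit as 0 is an assumption. It shows how an RR of 50% corresponds to a first true positive at rank 2, 100% to rank 1, and 20% to rank 5.

```python
from statistics import median


def reciprocal_rank(ranked_hits, relevant):
    """Return 1/rank of the first relevant hit, or 0.0 if none is found.

    Treating a miss as 0.0 is an assumption made for this sketch.
    """
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranked results for three screencast queries.
# All file names are invented for illustration only.
queries = [
    (["a.php", "post.php", "c.php"], {"post.php"}),                      # first hit at rank 2
    (["media.php", "x.php"], {"media.php"}),                             # first hit at rank 1
    (["u.php", "v.php", "w.php", "y.php", "theme.php"], {"theme.php"}),  # first hit at rank 5
]

# One RR value per query, then the median across queries.
rr_values = [reciprocal_rank(hits, rel) for hits, rel in queries]
median_rr = median(rr_values)

print(rr_values)
print(median_rr)
```

Because the median is taken over per-query RR values, a median RR of 20% says that at least half of the queries place their first relevant result at rank 5 or better.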

Footnotes
8
Portals such as https://www.wikipedia.org/ and https://stackoverflow.com/ contain crowd-based textual documentation.

Metadata
Title
A feature location approach for mapping application features extracted from crowd-based screencasts to source code
Authors
Parisa Moslehi
Bram Adams
Juergen Rilling
Publication date
16 September 2020
Publisher
Springer US
Published in
Empirical Software Engineering, Issue 6/2020
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-020-09874-z