Empirical Software Engineering

Publication date: 04-07-2018

How do developers utilize source code from Stack Overflow?

Authors: Yuhao Wu, Shaowei Wang, Cor-Paul Bezemer, Katsuro Inoue

Published in: Empirical Software Engineering | Issue 2/2019

Abstract

Technical question and answer (Q&A) platforms, such as Stack Overflow, allow users to ask and answer questions about a wide variety of programming topics. These platforms accumulate a large amount of knowledge, including hundreds of thousands of lines of source code. Developers can benefit from the source code that is attached to the questions and answers on Q&A platforms by copying or learning from (parts of) it. By understanding how developers utilize source code from Q&A platforms, we can provide insights that researchers can use to improve next-generation Q&A platforms and help developers reuse source code quickly and easily. In this paper, we first conduct an exploratory study on 289 files from 182 open-source projects, which contain source code that has an explicit reference to a Stack Overflow post. Our goal is to understand how developers utilize code from Q&A platforms and to reveal barriers that may make code reuse more difficult. In 31.5% of the studied files, developers needed to modify source code from Stack Overflow to make it work in their own projects. The degree of required modification varied from simply renaming variables to rewriting the whole algorithm. Developers sometimes chose to implement an algorithm from scratch based on the descriptions in Stack Overflow answers, even if an implementation was readily available in the post. In 35.5% of the studied files, developers used Stack Overflow posts as an information source for later reference. To further understand the barriers to reusing code and to obtain suggestions for improving the code reuse process on Q&A platforms, we conducted a survey with 453 open-source developers who are also on Stack Overflow. We found that the top 3 barriers that make it difficult for developers to reuse code from Stack Overflow are: (1) too much code modification required to fit in their projects, (2) incomprehensive code, and (3) low code quality.
We summarized and analyzed all survey responses and identified that developers suggest improvements for future Q&A platforms along the following dimensions: code quality, information enhancement & management, data organization, license, and the human factor. For instance, developers suggest improving code quality by adding an integrated validator that can test source code online and a mechanism for detecting outdated code. Our findings can be used as a roadmap for researchers and developers to improve code reuse.

Title: How do developers utilize source code from Stack Overflow?
Authors: Yuhao Wu, Shaowei Wang, Cor-Paul Bezemer, Katsuro Inoue
Publication date: 04-07-2018
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 2/2019
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-018-9634-5
