Abstract
Software developers search the Web for various kinds of source code for diverse reasons. In a previous study, we found that searches varied along two dimensions: the size of the search target (e.g., block, subsystem, or system) and the motivation for the search (e.g., reference example or as-is reuse). Would each of these kinds of searches require different search technologies? To answer this question, we conducted an experiment with 36 participants to evaluate three diverse approaches (general-purpose information retrieval, source code search, and component reuse), as represented by five Web sites (Google, Koders, Krugle, Google Code Search, and SourceForge). The independent variables were search engine, size of search target, and motivation for search. The dependent variable was the participants' judgement of the relevance of the first ten hits. We found that it was easier to find reference examples than components for as-is reuse and that participants obtained the best results using a general-purpose information retrieval site. However, we also found an interaction effect: code-specific search engines worked better in searches for subsystems, but Google worked better on searches for blocks. These results can be used to guide the creation of new tools for retrieving source code from the Web.
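As a rough illustration of the dependent variable, relevance judgements over the first ten hits can be summarized as precision at ten and averaged per experimental cell. The sketch below is only one plausible way to do this and is not the analysis used in the study: the function names, the engine labels, and the toy judgements are all hypothetical, and the motivation factor is left out for brevity.

```python
from collections import defaultdict
from statistics import mean


def precision_at_k(judgements, k=10):
    """Fraction of the first k hits judged relevant.

    Hits missing from a short result list count as not relevant.
    """
    return sum(judgements[:k]) / k


def mean_by_cell(trials):
    """Average precision at ten for each (engine, target size) cell."""
    cells = defaultdict(list)
    for engine, size, judgements in trials:
        cells[(engine, size)].append(precision_at_k(judgements))
    return {cell: mean(scores) for cell, scores in cells.items()}


# Made-up judgements for illustration only; not data from the study.
toy_trials = [
    ("engine_a", "block", [True] * 6 + [False] * 4),
    ("engine_a", "block", [True] * 4 + [False] * 6),
    ("engine_b", "subsystem", [True] * 3 + [False] * 7),
]
cell_means = mean_by_cell(toy_trials)
```

Comparing such cell means across engines and target sizes is where an interaction effect like the one reported above would become visible.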
Supplemental Material
The proof is given in an electronic appendix, available online in the ACM Digital Library.
Index Terms
- How Well Do Search Engines Support Code Retrieval on the Web?