research-article

How Well Do Search Engines Support Code Retrieval on the Web?

Published: 01 December 2011
Abstract

Software developers search the Web for various kinds of source code for diverse reasons. In a previous study, we found that searches varied along two dimensions: the size of the search target (e.g., block, subsystem, or system) and the motivation for the search (e.g., reference example or as-is reuse). Would each of these kinds of searches require different search technologies? To answer this question, we conducted an experiment with 36 participants to evaluate three diverse approaches (general-purpose information retrieval, source code search, and component reuse), as represented by five Web sites (Google, Koders, Krugle, Google Code Search, and SourceForge). The independent variables were search engine, size of search target, and motivation for search. The dependent variable was the participants' judgement of the relevance of the first ten hits. We found that it was easier to find reference examples than components for as-is reuse and that participants obtained the best results using a general-purpose information retrieval site. However, we also found an interaction effect: code-specific search engines worked better in searches for subsystems, but Google worked better on searches for blocks. These results can be used to guide the creation of new tools for retrieving source code from the Web.

            • Published in

              ACM Transactions on Software Engineering and Methodology, Volume 21, Issue 1
              December 2011
              205 pages
              ISSN: 1049-331X
              EISSN: 1557-7392
              DOI: 10.1145/2063239

              Copyright © 2011 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 December 2011
              • Accepted: 1 April 2010
              • Revised: 1 March 2010
              • Received: 1 June 2008

              Qualifiers

              • research-article
              • Research
              • Refereed
