How Well Do Search Engines Support Code Retrieval on the Web?

Authors:
Susan Elliott Sim (University of California, Irvine)
Medha Umarji (University of Maryland, Baltimore County)
Sukanya Ratanotayanon (University of California, Irvine)
Cristina V. Lopes (University of California, Irvine)

ACM Transactions on Software Engineering and Methodology, Volume 21, Issue 1, Article No. 4, pp 1–25. https://doi.org/10.1145/2063239.2063243

Published: 01 December 2011

Abstract

Software developers search the Web for various kinds of source code for diverse reasons. In a previous study, we found that searches varied along two dimensions: the size of the search target (e.g., block, subsystem, or system) and the motivation for the search (e.g., reference example or as-is reuse). Would each of these kinds of searches require different search technologies? To answer this question, we conducted an experiment with 36 participants to evaluate three diverse approaches (general-purpose information retrieval, source code search, and component reuse), as represented by five Web sites (Google, Koders, Krugle, Google Code Search, and SourceForge). The independent variables were search engine, size of search target, and motivation for search. The dependent variable was the participants' judgement of the relevance of the first ten hits. We found that it was easier to find reference examples than components for as-is reuse and that participants obtained the best results using a general-purpose information retrieval site. However, we also found an interaction effect: code-specific search engines worked better in searches for subsystems, but Google worked better on searches for blocks. These results can be used to guide the creation of new tools for retrieving source code from the Web.

Supplemental Material

Available for Download

pdf

a4-sim_appendix.pdf (36.6 KB)

The proof is given in an electronic appendix, available online in the ACM Digital Library.

Published in

ACM Transactions on Software Engineering and Methodology Volume 21, Issue 1
December 2011
205 pages
ISSN: 1049-331X
EISSN: 1557-7392
DOI: 10.1145/2063239

Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 December 2011
- Accepted: 1 April 2010
- Revised: 1 March 2010
- Received: 1 June 2008
Author Tags
Empirical study
open source
opportunistic development
search archetypes
Qualifiers
- research-article
- Research
- Refereed
