DOI: 10.1145/1882291.1882316
research-article

Leveraging usage similarity for effective retrieval of examples in code repositories

Published: 07 November 2010

ABSTRACT

Developers often learn to use APIs (Application Programming Interfaces) by looking at existing examples of API usage. Code repositories contain many instances of such usage of APIs. However, conventional information retrieval techniques fail to perform well in retrieving API usage examples from code repositories. This paper presents Structural Semantic Indexing (SSI), a technique to associate words to source code entities based on similarities of API usage. The heuristic behind this technique is that entities (classes, methods, etc.) that show similar uses of APIs are semantically related because they do similar things. We evaluate the effectiveness of SSI in code retrieval by comparing three SSI based retrieval schemes with two conventional baseline schemes. We evaluate the performance of the retrieval schemes by running a set of 20 candidate queries against a repository containing 222,397 source code entities from 346 jars belonging to the Eclipse framework. The results of the evaluation show that SSI is effective in improving the retrieval of examples in code repositories.
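The heuristic in the abstract can be illustrated with a small sketch. The abstract does not say how usage similarity is measured or how words are shared between entities, so everything here is an assumption made for illustration: the Jaccard coefficient over sets of called APIs, the `k` nearest neighbours, the `threshold` cutoff, and all entity names and data are hypothetical and are not the paper's method or data.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ssi_terms(entities, k=2, threshold=0.5):
    """Map each entity to a bag of words: its own terms plus the terms of
    up to k entities whose API usage is most similar (assumed scheme)."""
    index = {}
    for name, (terms, apis) in entities.items():
        # Rank every other entity by similarity of the APIs it calls.
        scored = sorted(
            ((jaccard(apis, other_apis), other)
             for other, (_, other_apis) in entities.items() if other != name),
            reverse=True,
        )
        bag = Counter(terms)
        for score, other in scored[:k]:
            if score >= threshold:
                bag.update(entities[other][0])  # borrow the neighbour's words
        index[name] = bag
    return index


def search(index, query):
    """Rank entities by how often the query words occur in their bag."""
    words = query.lower().split()
    scores = {name: sum(bag[w] for w in words) for name, bag in index.items()}
    hits = [name for name, score in scores.items() if score > 0]
    return sorted(hits, key=lambda name: (-scores[name], name))


# Hypothetical entities: (own terms, set of APIs the entity calls).
entities = {
    "ArchiveWriter.save": (
        ["archive", "writer", "save"],
        {"ZipOutputStream.putNextEntry", "ZipOutputStream.write",
         "ZipOutputStream.closeEntry"},
    ),
    "Exporter.zipFolder": (
        ["exporter", "zip", "folder"],
        {"ZipOutputStream.putNextEntry", "ZipOutputStream.write",
         "ZipOutputStream.closeEntry", "File.listFiles"},
    ),
    "Logger.log": (["logger", "log"], {"PrintStream.println"}),
}

baseline = ssi_terms(entities, k=0)   # own terms only
enriched = ssi_terms(entities)        # own terms plus borrowed terms

print(search(baseline, "zip"))  # only the entity whose name mentions "zip"
print(search(enriched, "zip"))  # also the entity that uses the same APIs
```

In this toy data, `ArchiveWriter.save` never mentions "zip" in its own name, yet it becomes retrievable for that query because it calls nearly the same APIs as `Exporter.zipFolder`. That is the intuition the abstract describes, not a reproduction of the evaluated retrieval schemes.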


Published in

FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
November 2010, 302 pages
ISBN: 9781605587912
DOI: 10.1145/1882291

Copyright © 2010 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 17 of 128 submissions, 13%
