ABSTRACT
Developers often learn to use APIs (Application Programming Interfaces) by studying existing examples of API usage. Code repositories contain many such examples, yet conventional information retrieval techniques perform poorly at retrieving them. This paper presents Structural Semantic Indexing (SSI), a technique that associates words with source code entities based on similarities in API usage. The heuristic behind the technique is that entities (classes, methods, etc.) that use APIs in similar ways are semantically related because they do similar things. We evaluate the effectiveness of SSI in code retrieval by comparing three SSI-based retrieval schemes with two conventional baseline schemes, running a set of 20 candidate queries against a repository containing 222,397 source code entities from 346 jars belonging to the Eclipse framework. The results show that SSI is effective in improving the retrieval of examples in code repositories.
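The heuristic can be illustrated with a minimal sketch. It is not the paper's implementation: the Jaccard similarity measure, the `top_k` and `min_sim` parameters, the function names, and the toy entities are all assumptions chosen for illustration. Each entity carries the set of API calls it makes and the words drawn from its own name; an entity then borrows words from the entities whose API usage is most similar to its own, so that a query term can match it even when the term never appears in its own text.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of API names (assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ssi_terms(entities, top_k=2, min_sim=0.3):
    """Build a term index in which each entity keeps its own words and
    borrows words from up to top_k entities with similar API usage."""
    index = {}
    for name, info in entities.items():
        # Score every other entity by the overlap of its API usage.
        scored = [
            (jaccard(info["apis"], other["apis"]), other_name)
            for other_name, other in entities.items()
            if other_name != name
        ]
        # Keep only sufficiently similar entities, most similar first.
        neighbours = sorted(
            [pair for pair in scored if pair[0] >= min_sim], reverse=True
        )[:top_k]

        terms = Counter(info["words"])
        for _, other_name in neighbours:
            terms.update(entities[other_name]["words"])
        index[name] = terms
    return index


# Hypothetical entities: names, API calls, and words are invented.
entities = {
    "FileExporter.save": {
        "apis": {"FileWriter.write", "FileWriter.close", "File.exists"},
        "words": ["file", "exporter", "save"],
    },
    "ReportDumper.flush": {
        "apis": {"FileWriter.write", "FileWriter.close"},
        "words": ["report", "dumper", "flush"],
    },
    "Dialog.open": {
        "apis": {"Shell.open", "Display.getCurrent"},
        "words": ["dialog", "open"],
    },
}

index = ssi_terms(entities)
for name, terms in index.items():
    print(name, sorted(terms))
```

In this toy data, `ReportDumper.flush` shares two of three API calls with `FileExporter.save`, so it also becomes retrievable under the word "save", while `Dialog.open` shares no API usage with either and keeps only its own words.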
Index Terms
- Leveraging usage similarity for effective retrieval of examples in code repositories