ABSTRACT
To improve software productivity when constructing new software systems, developers often reuse existing class libraries or frameworks by invoking their APIs. Those APIs, however, are often complex and poorly documented, which makes them hard to use in new client code. To learn how those APIs are used, developers may search the Web with a general search engine for relevant documents or code examples, or use a source code search engine to search open source repositories for source files that use the same APIs. However, the number of returned source files is often large, and it is difficult for developers to learn API usages from so many results. To help developers understand API usages and write API client code more effectively, we have developed an API usage mining framework and its supporting tool, called MAPO (for Mining API usages from Open source repositories). Given a query that describes a method, class, or package of an API, MAPO leverages existing source code search engines to gather relevant source files and then mines them. The mining yields a short list of frequent API usages for developers to inspect. MAPO currently consists of five components: a code search engine, a source code analyzer, a sequence preprocessor, a frequent sequence miner, and a frequent sequence postprocessor. We have examined the effectiveness of MAPO using a set of queries. The preliminary results show that the framework is practical for providing informative and succinct API usage patterns.
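The mining and post-processing steps described above can be illustrated with a minimal sketch. It is not MAPO's implementation: the abstract does not say which mining algorithm is used, so a brute-force enumeration of short ordered subsequences stands in for the frequent sequence miner, and the function names, the `min_support` and `max_len` parameters, and the sample call sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(call in it for call in pattern)


def mine_frequent_usages(call_sequences, min_support=2, max_len=3):
    """Brute-force stand-in for a frequent sequence miner (an assumption).

    Enumerates every ordered subsequence of up to max_len calls and keeps
    those that occur in at least min_support input sequences.
    """
    support = Counter()
    for seq in call_sequences:
        seen = set()
        for n in range(1, min(max_len, len(seq)) + 1):
            for idx in combinations(range(len(seq)), n):
                seen.add(tuple(seq[i] for i in idx))
        support.update(seen)  # a pattern counts at most once per input sequence
    return {p: s for p, s in support.items() if s >= min_support}


def keep_maximal(frequent):
    """Post-processing stand-in: drop patterns contained in a longer frequent one."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    }


# Hypothetical API call sequences, one per client method found by a code search.
call_sequences = [
    ["File.new", "FileReader.new", "BufferedReader.new",
     "BufferedReader.readLine", "BufferedReader.close"],
    ["FileReader.new", "BufferedReader.new", "BufferedReader.readLine",
     "Logger.info", "BufferedReader.close"],
    ["FileReader.new", "BufferedReader.new", "BufferedReader.close"],
]

patterns = keep_maximal(mine_frequent_usages(call_sequences, min_support=3))
print(patterns)
```

With these sample inputs, the only pattern that survives is the three-call sequence shared by all three client methods, which is the kind of short list a developer could inspect instead of reading every returned source file. The enumeration grows combinatorially with `max_len`, so a real miner would need a more efficient algorithm.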