Published in: Empirical Software Engineering 6/2018

5 March 2018

APIReal: an API recognition and linking approach for online developer forums

Written by: Deheng Ye, Lingfeng Bao, Zhenchang Xing, Shang-Wei Lin

Abstract

When discussing programming issues on social platforms (e.g., Stack Overflow, Twitter), developers often mention APIs in natural language text. Extracting these API mentions is a prerequisite for effectively indexing and searching API-related information in software engineering social content. The extraction task involves two steps: 1) distinguishing API mentions from other English words (i.e., API recognition), and 2) disambiguating a recognized API mention to its unique fully qualified name (i.e., API linking). Software engineering social content lacks consistent API mentions and a consistent sentence writing format. As a result, API recognition and linking have to deal with the inherent ambiguity of API mentions in informal text. For example, a common word can carry either its API sense or its ordinary sense (e.g., append, apply and merge); the simple name of an API can map to several APIs of the same library or of different libraries; and different written forms of an API should all be linked to the same API. In this paper, we propose a semi-supervised machine learning approach that exploits name synonyms and the rich semantic context of API mentions for API recognition in informal text. Based on the results of our API recognition approach, we further propose an API linking approach that leverages a set of domain-specific heuristics, including mention-mention similarity, scope filtering, and mention-entry similarity, to determine which API in the knowledge base a recognized API mention actually refers to. To evaluate our API recognition approach, we use 1205 API mentions of three libraries (Pandas, NumPy, and Matplotlib) from Stack Overflow text. We also evaluate our API linking approach with 120 recognized API mentions of these three libraries.
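
As a rough illustration of the two steps named in the abstract, the Python sketch below recognizes mentions and then links them to fully qualified names. It is not the authors' implementation: the paper uses a semi-supervised learner for recognition and a richer set of heuristics for linking. Here the knowledge base is a hypothetical toy dictionary, `recognize` is a crude call-syntax rule, and `link` uses simplified stand-ins for scope filtering and mention-entry similarity. All function names and the scoring rule are assumptions made for this example.

```python
import re

# Hypothetical toy knowledge base (an assumption for this sketch):
# simple API name -> candidate fully qualified names.
KNOWLEDGE_BASE = {
    "merge": ["pandas.DataFrame.merge", "pandas.merge"],
    "apply": ["pandas.DataFrame.apply", "pandas.Series.apply"],
    "plot": ["pandas.DataFrame.plot", "matplotlib.pyplot.plot"],
}

# Matches an optionally dotted name followed by empty parentheses, e.g. df.merge()
CALL_PATTERN = re.compile(r"((?:\w+\.)*\w+)\(\)")


def recognize(text):
    """Crude stand-in for API recognition.

    Keeps only known simple names written in call syntax, so the ordinary
    English word "merge" is ignored while "df.merge()" is kept.
    """
    mentions = []
    for match in CALL_PATTERN.finditer(text):
        written_form = match.group(1)
        if written_form.split(".")[-1] in KNOWLEDGE_BASE:
            mentions.append(written_form)
    return mentions


def link(mention, text):
    """Crude stand-in for API linking; returns a qualified name or None."""
    simple_name = mention.split(".")[-1]
    candidates = KNOWLEDGE_BASE.get(simple_name, [])
    if not candidates:
        return None

    context = set(re.findall(r"\w+", text.lower()))

    # Simplified scope filtering: keep candidates whose library is named
    # in the post; fall back to all candidates if none is named.
    in_scope = [c for c in candidates if c.split(".")[0] in context] or candidates

    # Simplified mention-entry similarity: count how many parts of the
    # qualified name also occur in the post.
    def overlap(candidate):
        return sum(part.lower() in context for part in candidate.split("."))

    return max(in_scope, key=overlap)


post = "I merge two DataFrame objects in pandas with df.merge() but rows go missing."
print(recognize(post))         # ['df.merge']
print(link("df.merge", post))  # pandas.DataFrame.merge
```

The call-syntax rule misses mentions written without parentheses and accepts only empty argument lists, which is exactly the kind of brittleness that motivates a learned recognizer over hand-written patterns.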

Metadata
Title
APIReal: an API recognition and linking approach for online developer forums
Written by
Deheng Ye
Lingfeng Bao
Zhenchang Xing
Shang-Wei Lin
Publication date
5 March 2018
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 6/2018
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-018-9608-7
