2019 | Original Paper | Book Chapter

NERSE: Named Entity Recognition in Software Engineering as a Service

Written by: M. Veera Prathap Reddy, P. V. R. D. Prasad, Manjunath Chikkamath, Sarathchandra Mandadi

Published in: Service Research and Innovation

Publisher: Springer International Publishing

Abstract

Named Entity Recognition (NER) is a computational linguistics task that seeks to classify each word in a document into one of several categories. NER is an important component of many domain-specific expert systems. Software engineering is one such domain, yet very little work has been done on identifying entities specific to it. In this paper, we present NERSE, a tool that enables users to identify software-specific entities. It is built on machine learning models trained on software-specific entity categories using Conditional Random Fields (CRF) and Bidirectional Long Short-Term Memory - Conditional Random Fields (BiLSTM-CRF). NERSE identifies 22 categories of entities specific to the software engineering domain, with 0.85% and 0.95% for the CRF and BiLSTM-CRF models respectively. Source code for both named entity recognition models is available at https://github.com/prathapreddymv/NERSE.
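To make the per-word classification idea concrete, here is a minimal sketch of how a linear-chain CRF picks the best label sequence for a sentence using Viterbi decoding. It is not the NERSE implementation: the label names (`LANG`, `FRAMEWORK`), the emission and transition scores, and the helper names are all assumptions chosen for illustration, and the abstract does not list the 22 actual categories. A trained CRF would learn such scores from features rather than read them from a hand-written table.

```python
from typing import Dict, List, Tuple

# Hypothetical label set; the 22 NERSE categories are not given in the abstract.
LABELS = ["O", "LANG", "FRAMEWORK"]

# Toy emission scores (assumed values): how well a label fits a token.
EMISSION: Dict[Tuple[str, str], float] = {
    ("python", "LANG"): 2.0,
    ("django", "FRAMEWORK"): 2.0,
}

# Toy transition scores (assumed values): penalty for repeating an entity label.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("LANG", "LANG"): -1.0,
    ("FRAMEWORK", "FRAMEWORK"): -1.0,
}


def emission(token: str, label: str) -> float:
    """Score a (token, label) pair; unknown tokens slightly prefer 'O'."""
    default = 0.5 if label == "O" else 0.0
    return EMISSION.get((token.lower(), label), default)


def viterbi(tokens: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for the given tokens."""
    if not tokens:
        return []
    # best[i][y] = best score of any label sequence ending in label y at position i
    best = [{y: emission(tokens[0], y) for y in LABELS}]
    back: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            # Pick the previous label that maximises score so far plus transition.
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION.get((p, y), 0.0))
            scores[y] = best[-1][prev] + TRANSITION.get((prev, y), 0.0) + emission(tok, y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(viterbi(["I", "use", "Django", "with", "Python"]))
```

Each token receives exactly one label, and the transition scores let the decision for one word depend on its neighbour's label, which is what distinguishes a CRF from classifying every word independently.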

Metadata
Title
NERSE: Named Entity Recognition in Software Engineering as a Service
Written by
M. Veera Prathap Reddy
P. V. R. D. Prasad
Manjunath Chikkamath
Sarathchandra Mandadi
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-32242-7_6