
2018 | OriginalPaper | Book chapter

Is a Common Phrase an Entity Mention or Not? Dual Representations for Domain-Specific Named Entity Recognition

Authors: Jiangtao Zhang, Juanzi Li, Xiao-Li Li, Yixin Cao, Lei Hou, Shuai Wang

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

Named Entity Recognition (NER) for specific domains is critical for building and managing domain-specific knowledge bases, but conventional NER methods cannot be applied to specific domains effectively. We found that one of the reasons is the problem of common-phrase-like entity mentions, which are prevalent in many domains: many common phrases that occur frequently in general corpora may or may not be treated as named entities in a specific domain. Determining whether a common phrase is an entity mention or not is therefore a challenge. To address this issue, we present a novel BLSTM-based NER model tailored for specific domains that learns dual representations for each word. It learns not only general domain knowledge, derived from an external large-scale general corpus via a word embedding model, but also specific domain knowledge, by training a stacked deep neural network (SDNN) that integrates the results of a low-cost pre-entity-linking process. Extensive experiments on a real-world dataset of movie comments demonstrate the superiority of our model over existing state-of-the-art methods.
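The abstract describes each word's input to the BLSTM as combining a general representation (a word embedding learned from a large general corpus) with a domain-specific representation informed by a low-cost pre-entity-linking step. The following is a minimal sketch of that input construction only, under clearly labeled assumptions: the toy table `GENERAL_EMB`, the name set `KB_NAMES`, and the helpers `pre_entity_link`, `domain_vector`, and `dual_representations` are hypothetical; the greedy longest-match dictionary lookup is a stand-in chosen for illustration, not the authors' linking procedure; a fixed one-hot vector replaces the trained SDNN; and combining the two parts by concatenation is likewise an assumption. It is not the authors' implementation.

```python
from typing import Dict, List, Set

DIM = 3  # toy general-embedding size (assumption, for illustration only)

# Hypothetical general-domain embeddings (stand-ins for vectors pre-trained
# on a large general corpus).
GENERAL_EMB: Dict[str, List[float]] = {
    "the": [0.1, 0.0, 0.2],
    "lake": [0.4, 0.3, 0.0],
    "was": [0.0, 0.1, 0.1],
    "frozen": [0.2, 0.5, 0.1],
    "we": [0.0, 0.2, 0.3],
    "watched": [0.3, 0.1, 0.4],
}

# Hypothetical domain knowledge base: lower-cased entity names (movie titles).
KB_NAMES: Set[str] = {"frozen", "the godfather"}


def pre_entity_link(tokens: List[str], kb_names: Set[str], max_n: int = 4) -> List[int]:
    """Greedy longest-match lookup of token n-grams in a KB name list.

    Returns 1 for every token covered by a matched name, else 0. This cheap
    string match cannot tell a movie title from the same common phrase.
    """
    flags = [0] * len(tokens)
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest candidate span first.
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            if " ".join(tokens[i:i + n]).lower() in kb_names:
                for j in range(i, i + n):
                    flags[j] = 1
                step = n
                break
        i += step
    return flags


def domain_vector(flag: int) -> List[float]:
    """Hypothetical placeholder for the learned SDNN output: a one-hot of the flag."""
    return [0.0, 1.0] if flag else [1.0, 0.0]


def dual_representations(tokens: List[str]) -> List[List[float]]:
    """Concatenate a general embedding with a domain-specific vector per token."""
    flags = pre_entity_link(tokens, KB_NAMES)
    reps = []
    for tok, flag in zip(tokens, flags):
        general = GENERAL_EMB.get(tok.lower(), [0.0] * DIM)  # zero vector for OOV words
        reps.append(general + domain_vector(flag))
    return reps


if __name__ == "__main__":
    print(pre_entity_link(["We", "watched", "Frozen"], KB_NAMES))         # [0, 0, 1]
    print(pre_entity_link(["The", "lake", "was", "frozen"], KB_NAMES))    # [0, 0, 0, 1]
    print(pre_entity_link(["I", "liked", "The", "Godfather"], KB_NAMES))  # [0, 0, 1, 1]
```

Note that the lookup flags "frozen" in both example sentences, which is exactly the common-phrase ambiguity the abstract describes: the pre-linking signal alone cannot decide, so in the model as described, the sequence model reading the surrounding context has to make that call.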


Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91452-7_53