Published in: World Wide Web 3/2020

15-02-2020

A continuous learning method for recognizing named entities by integrating domain contextual relevance measurement and Web farming mode of Web intelligence

Authors: Shaofu Lin, Jiangfan Gao, Shun Zhang, Xiaobo He, Ying Sheng, Jianhui Chen

Abstract

Web farming can turn computational social science into a never-ending learning process, in which social phenomena are understood dynamically and scientifically from data that are continuously produced, updated and expired in the connected hyper world. Named entity recognition is a basic and core task of Web farming. However, existing named entity recognition methods depend mainly on complete, high-quality and well-labelled data sets and therefore cannot meet the requirements of real-world applications. This paper proposes a continuous learning method for recognizing named entities that introduces the Web farming mode of Web Intelligence into the recognition process. During the on-line stage, the domain contextual relevance of candidate entities is calculated from the domain discrimination degree and the domain dependence function in order to recognize the target entities. During the off-line stage, an active learning approach continuously improves the target corpus set by combining density-based clustering with semantic distance measurement. Experimental results show that the proposed method can effectively improve the accuracy of entity recognition and is more suitable for real-world applications.
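The off-line stage pairs density-based clustering with a semantic distance. The abstract does not give the concrete algorithm, so the following is only a minimal sketch under assumptions: plain DBSCAN stands in for the density-based clustering, cosine distance over embedding vectors stands in for the semantic distance, and the toy 2-D vectors, `eps` and `min_pts` values are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; an assumed stand-in for the semantic distance.
    Both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one cluster label per point, -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n)
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical embeddings of candidate entities (toy 2-D vectors).
points = [
    (1.0, 0.0), (0.9, 0.1), (0.95, 0.05),   # one dense group
    (0.0, 1.0), (0.1, 0.9), (0.05, 0.95),   # a second dense group
    (-1.0, -1.0),                           # an isolated candidate
]
labels = dbscan(points, eps=0.05, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

How the resulting clusters and noise points would drive the selection of samples for annotation is not stated in the abstract, so the sketch stops at the clustering step.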


Metadata
Title
A continuous learning method for recognizing named entities by integrating domain contextual relevance measurement and Web farming mode of Web intelligence
Authors
Shaofu Lin
Jiangfan Gao
Shun Zhang
Xiaobo He
Ying Sheng
Jianhui Chen
Publication date
15-02-2020
Publisher
Springer US
Published in
World Wide Web / Issue 3/2020
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI
https://doi.org/10.1007/s11280-019-00758-x
