
2020 | OriginalPaper | Chapter

Automatic Extraction of Locations from News Articles Using Domain Knowledge

Authors: Loitongbam Sanayai Meetei, Ringki Das, Thoudam Doren Singh, Sivaji Bandyopadhyay

Published in: Big Data, Machine Learning, and Applications

Publisher: Springer International Publishing


Abstract

As the volume of digital data grows, extracting useful information from text is becoming increasingly difficult, especially for resource-constrained languages. In this work, we report on the language-independent automatic extraction of locations from news articles using domain knowledge. The approach is tested on four languages: English and three resource-constrained languages, namely Assamese, Manipuri and Mizo, the lingua francas of the three neighboring North-Eastern Indian states of Assam, Manipur and Mizoram, respectively. Our architecture is based on the semantic similarity between words, computed with the popular word2vec word embedding model, coupled with domain knowledge of these regions. The model is able to detect locations at the most detailed level possible.
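The abstract only outlines the method, so the following is a minimal sketch under our own assumptions, not the authors' implementation. It shows one way to combine embedding similarity with domain knowledge. Each token is matched to a known location when the cosine similarity of their vectors clears a threshold, which lets a spelling variant such as "Gauhati" resolve to "Guwahati". A parent map of regions then selects the deepest, and therefore most detailed, matched location. The 3-dimensional vectors are made up and stand in for a trained word2vec model. `TOY_VECTORS`, `PARENT`, `match_location`, `most_detailed_location` and the 0.98 threshold are all hypothetical.

```python
import math

# Hypothetical toy vectors standing in for a word2vec model trained on news text.
TOY_VECTORS = {
    "Guwahati": [0.9, 0.1, 0.0],
    "Gauhati": [0.88, 0.12, 0.02],  # older spelling of Guwahati
    "Assam": [0.7, 0.3, 0.1],
    "Imphal": [0.1, 0.9, 0.0],
    "Manipur": [0.3, 0.7, 0.2],
    "Aizawl": [0.0, 0.1, 0.9],
    "Mizoram": [0.2, 0.3, 0.7],
    "police": [0.3, 0.3, 0.3],      # a non-location word with a vector
}

# Hypothetical domain knowledge: each known location mapped to its parent region.
PARENT = {
    "Guwahati": "Assam", "Imphal": "Manipur", "Aizawl": "Mizoram",
    "Assam": "India", "Manipur": "India", "Mizoram": "India",
    "India": None,
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def depth(loc):
    """Number of ancestors of a location in PARENT; deeper means more detailed."""
    d = 0
    while PARENT.get(loc) is not None:
        loc = PARENT[loc]
        d += 1
    return d


def match_location(token, threshold=0.98):
    """Return the known location most similar to token, or None below threshold."""
    vec = TOY_VECTORS.get(token)
    if vec is None:
        return None  # out-of-vocabulary token
    best, best_sim = None, threshold
    for loc in PARENT:
        if loc in TOY_VECTORS:
            sim = cosine(vec, TOY_VECTORS[loc])
            if sim >= best_sim:
                best, best_sim = loc, sim
    return best


def most_detailed_location(tokens, threshold=0.98):
    """Among all matched locations, return the deepest one (ties broken arbitrarily)."""
    matches = (match_location(t, threshold) for t in tokens)
    found = {m for m in matches if m is not None}
    return max(found, key=depth, default=None)
```

For the sentence `"Police in Gauhati , Assam said on Monday".split()`, both "Guwahati" and "Assam" are matched, and the sketch returns "Guwahati" because it sits one level below "Assam" in the hierarchy. A real system would use vectors from a trained model and a much larger gazetteer, and the threshold would need tuning per language.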


Metadata
DOI
https://doi.org/10.1007/978-3-030-62625-9_4
