DOI: 10.1145/3281354.3281363
research-article

Towards Generalizable Place Name Recognition Systems: Analysis and Enhancement of NER Systems on English News from India

Published: 06 November 2018

ABSTRACT

Place name recognition is one of the key tasks in Information Extraction. In this paper, we tackle this task in English news from India. We first analyze the results obtained with available tools and corpora, and then train our own models to obtain better results. Most previous work on entity recognition for English uses similar corpora for both training and testing. Yet we observe that performance drops significantly when the models are tested on different datasets. For this reason, we have trained various models using combinations of several corpora. Our results show that training on combinations of several corpora improves the relative performance of these models, but more research in this area is still needed to obtain place name recognizers that generalize to any given dataset.
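The cross-corpus comparison described in the abstract can be illustrated with a minimal sketch: train on one corpus or a combination, test on each corpus, and compare span-level F1 for location entities. Everything below is an assumption made for illustration. The two tiny corpora, the names `CORPORA`, `train`, `predict`, and `evaluate`, and the memorising "tagger" are hypothetical stand-ins, not the models, tools, or datasets used in the paper.

```python
def loc_spans(tags):
    """Return the set of (start, end) location spans in a BIO tag sequence."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != "I-LOC":
            spans.add((start, i))
            start = None
        if tag == "B-LOC":
            start = i
    return spans


def f1(gold_sents, pred_sents):
    """Exact-match span F1 over parallel lists of BIO tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = loc_spans(gold), loc_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train(sentences):
    """Toy stand-in for a tagger: memorise tokens seen inside location spans."""
    return {tok for tokens, tags in sentences
            for tok, tag in zip(tokens, tags) if tag.endswith("LOC")}


def predict(model, tokens):
    """Tag known tokens as locations, merging adjacent ones into one span."""
    tags, prev_known = [], False
    for tok in tokens:
        known = tok in model
        tags.append("I-LOC" if known and prev_known else "B-LOC" if known else "O")
        prev_known = known
    return tags


# Hypothetical miniature corpora; real experiments need full annotated datasets.
CORPORA = {
    "A": {
        "train": [(["Floods", "hit", "London"], ["O", "O", "B-LOC"]),
                  (["Talks", "in", "New", "York"], ["O", "O", "B-LOC", "I-LOC"])],
        "test": [(["London", "hosts", "summit"], ["B-LOC", "O", "O"])],
    },
    "B": {
        "train": [(["Rally", "in", "Lucknow"], ["O", "O", "B-LOC"]),
                  (["Protest", "in", "Navi", "Mumbai"], ["O", "O", "B-LOC", "I-LOC"])],
        "test": [(["Navi", "Mumbai", "sees", "rain"], ["B-LOC", "I-LOC", "O", "O"])],
    },
}


def evaluate(train_names, test_name):
    """Train on the union of the named corpora, score on one corpus's test split."""
    model = train([s for name in train_names for s in CORPORA[name]["train"]])
    test = CORPORA[test_name]["test"]
    gold = [tags for _, tags in test]
    pred = [predict(model, tokens) for tokens, _ in test]
    return f1(gold, pred)


for train_names in (["A"], ["B"], ["A", "B"]):
    for test_name in ("A", "B"):
        score = evaluate(train_names, test_name)
        print(f"train={'+'.join(train_names):<4} test={test_name}  F1={score:.2f}")
```

In this toy setting a model scores well only on corpora whose place names it has seen, and training on the union covers both test sets. The numbers say nothing about real systems; the sketch only shows the shape of a train/test matrix for comparing single-corpus and combined-corpus training.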

Published in

GIR'18: Proceedings of the 12th Workshop on Geographic Information Retrieval
November 2018
37 pages
ISBN: 9781450360340
DOI: 10.1145/3281354

Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

GIR'18 Paper Acceptance Rate: 8 of 12 submissions, 67%
Overall Acceptance Rate: 46 of 61 submissions, 75%
