ABSTRACT
Place name recognition is a key task in Information Extraction. In this paper, we address this task for English news from India. We first analyze the results obtained with available tools and corpora, and then train our own models to obtain better results. Most previous work on entity recognition for English uses similar corpora for both training and testing, yet we observe that performance drops significantly when the models are tested on different datasets. For this reason, we train various models on combinations of several corpora. Our results show that training on combined corpora improves the relative performance of these models, but more research in this area is still needed to obtain place name recognizers that generalize to any given dataset.
Index Terms
- Towards Generalizable Place Name Recognition Systems: Analysis and Enhancement of NER Systems on English News from India