Cognitive Computation

Published: 10 May 2020

Extracting Time Expressions and Named Entities with Constituent-Based Tagging Schemes

Authors: Xiaoshi Zhong, Erik Cambria, Amir Hussain

Published in: Cognitive Computation | Issue 4/2020

Abstract

Time expressions and named entities play important roles in data mining, information retrieval, and natural language processing. However, the conventional position-based tagging schemes (e.g., the BIO and BILOU schemes) that previous research has used to model time expressions and named entities suffer from inconsistent tag assignment. To overcome this problem, we designed a new type of tagging scheme that models time expressions and named entities based on their constituents. Specifically, to model time expressions, we defined a constituent-based tagging scheme, termed the TOMN scheme, with four tags, namely T, O, M, and N, which indicate the defined constituents of time expressions: time token, words outside time expressions, modifier, and numeral, respectively. To model named entities, we defined a constituent-based tagging scheme, termed the UGTO scheme, with four tags, namely U, G, T, and O, which indicate the defined constituents of named entities: uncommon word, general modifier, trigger word, and words outside named entities, respectively. Both schemes are used with conditional random fields and minimal features, chosen according to an in-depth analysis of the characteristics of time expressions and named entities. Experiments on diverse datasets demonstrate that our proposed methods perform on par with or better than representative state-of-the-art methods on both time expression extraction and named entity extraction.
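
The inconsistency described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicons `TIME_TOKENS` and `MODIFIERS`, the digit-based numeral test, and the three example sentences are all hypothetical and chosen only for illustration. The sketch collects the set of tags each word receives under a position-based scheme (BIO) and under a constituent-based scheme in the spirit of TOMN.

```python
from collections import defaultdict

# Hypothetical toy lexicons (assumptions for illustration, not the paper's lists).
TIME_TOKENS = {"september", "monday", "year", "week"}
MODIFIERS = {"last", "next", "early", "ago"}


def bio_tags(tokens, spans):
    """Position-based tags: B for the first word of a span, I for the rest."""
    tags = ["O"] * len(tokens)
    for start, end in spans:  # end is exclusive
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def tomn_tags(tokens, spans):
    """Constituent-based tags: the tag depends on the word type, not its position."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        for i in range(start, end):
            word = tokens[i].lower()
            if word in TIME_TOKENS:
                tags[i] = "T"
            elif word in MODIFIERS:
                tags[i] = "M"
            elif word.isdigit():  # simplification: only digit strings count as numerals
                tags[i] = "N"
    return tags


def tags_per_word(examples, scheme):
    """Collect every non-O tag that each word receives across the examples."""
    seen = defaultdict(set)
    for tokens, spans in examples:
        for token, tag in zip(tokens, scheme(tokens, spans)):
            if tag != "O":
                seen[token.lower()].add(tag)
    return dict(seen)


# Each example is (tokens, list of time-expression spans).
EXAMPLES = [
    ("He left in September 2006".split(), [(3, 5)]),
    ("Sales rose in 2006".split(), [(3, 4)]),
    ("They met last September".split(), [(2, 4)]),
]

bio = tags_per_word(EXAMPLES, bio_tags)
tomn = tags_per_word(EXAMPLES, tomn_tags)
print("BIO :", bio)
print("TOMN:", tomn)
```

In this toy data, "2006" and "September" each receive both B and I under BIO, depending on where they fall in the expression, whereas each word keeps a single tag under the constituent-based assignment.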

In a supervised-learning procedure, tag assignment occurs in two stages: (1) feature extraction in the training stage and (2) tag prediction in the testing stage. We focus on the training stage to analyze the impact of tag assignment.

OntoNotes5’s 18 entity types are CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC, MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, and WORK_OF_ART.

The removed entity types are CARDINAL, DATE, MONEY, ORDINAL, PERCENT, QUANTITY, and TIME.

https://github.com/ontonotes/conll-formatted-ontonotes-5.0

The p_whole of proper nouns does not reach 100%, mainly because each individual dataset is concerned with only certain types of named entities and partly because some NNP* tags are POS tagging errors; e.g., “SURPRISE DEFEAT” is tagged as “NNP NNP” but should be tagged as “JJ NN.”

The BIO scheme in this paper denotes the standard IOB2 scheme described in [67].

The BILOU scheme is also widely known as the BIOES or IOBES scheme.
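
As an illustration of how these position-based schemes relate, the following Python sketch converts IOB2 tags to BILOU and then renames them to BIOES. It is a minimal sketch that assumes well-formed IOB2 input with tags of the form `B-TYPE`, `I-TYPE`, or `O`; the function names are hypothetical and not taken from the paper or from any library.

```python
def bio_to_bilou(tags):
    """Convert well-formed IOB2 tags (e.g., 'B-PER', 'I-PER', 'O') to BILOU."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            bilou.append(("B-" if continues else "U-") + label)
        else:  # prefix == "I"
            bilou.append(("I-" if continues else "L-") + label)
    return bilou


# BIOES uses E (end) and S (single) where BILOU uses L (last) and U (unit).
BILOU_TO_BIOES = {"L": "E", "U": "S"}


def bilou_to_bioes(tags):
    """Rename BILOU prefixes to their BIOES equivalents."""
    return [BILOU_TO_BIOES.get(tag[0], tag[0]) + tag[1:] for tag in tags]


iob2 = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "I-ORG"]
bilou = bio_to_bilou(iob2)
bioes = bilou_to_bioes(bilou)
print(bilou)
print(bioes)
```

The conversion changes only the position prefix; the entity boundaries and types stay the same, so the schemes differ in how many position-dependent tags a single word can receive.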

https://en.wikipedia.org/wiki/Lists_of_cities_by_country and https://en.wikipedia.org/wiki/Lists_of_people_by_nationality.

Note that uncommon words of this kind are not available in the training phase because they are extracted from the unannotated test set.

Following [82], we did not use the Gigaword dataset in our experiments because its labels are not ground-truth labels but were generated automatically by other taggers.

1. Alex B, Haddow B, Grover C. Recognising nested named entities in biomedical text. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing; 2007. p. 65–72.

2. Alonso O, Strötgen J, Baeza-Yates R, Gertz M. Temporal information retrieval: challenges and opportunities. Proceedings of 1st International Temporal Web Analytics Workshop; 2011. p. 1–8.

3. Angeli G, Manning CD, Jurafsky D. Parsing time: learning to interpret time expressions. Proceedings of 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2012. p. 446–55.

4. Angeli G, Uszkoreit J. Language-independent discriminative parsing of temporal expressions. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics; 2013. p. 83–92.

5. Bethard S. ClearTK-TimeML: a minimalist approach to TempEval 2013. Proceedings of the 7th International Workshop on Semantic Evaluation. Minneapolis: Association for Computational Linguistics; 2013. p. 10–4.

6. Borthwick A, Sterling J, Agichtein E, Grishman R. NYU: description of the MENE named entity system as used in MUC-7. Proceedings of the 7th Message Understanding Conference; 1998.

7. Campos R, Dias G, Jorge AM, Jatowt A. Survey of temporal information retrieval and related applications. ACM Comput Surv 2014;47(2):15:1–41.

8. Chambers N, Wang S, Jurafsky D. Classifying temporal relations between events. Proceedings of the ACL on Interactive Poster and Demonstration Sessions. Ann Arbor: Association for Computational Linguistics; 2007. p. 173–6.

9. Chang AX, Manning CD. SUTime: a library for recognizing and normalizing time expressions. Proceedings of 8th International Conference on Language Resources and Evaluation; 2012. p. 3735–40.

10. Chang AX, Manning CD. SUTime: evaluation in TempEval-3. Proceedings of the Second Joint Conference on Lexical and Computational Semantics (SEM); 2013. p. 78–82.

11. Chinchor NA. MUC-7 named entity task definition. Proceedings of the 7th Message Understanding Conference; 1998.

12. Chinchor NA. Overview of MUC-7/MET-2. Proceedings of the 7th Message Understanding Conference; 1998.

13. Collins M, Singer Y. Unsupervised models for named entity classification. Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora. College Park: Association for Computational Linguistics; 1999.

14. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa PP. Natural language processing (almost) from scratch. J Mach Learn Res 2011;12:2493–537.

15. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: Association for Computational Linguistics; 2019. p. 4171–86.

16. Do QX, Lu W, Roth D. Joint inference for event timeline construction. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning; 2012. p. 677–87.

17. Doddington G, Mitchell A, Przybocki M, Ramshaw L, Strassel S, Weischedel R. The automatic content extraction (ACE) program tasks, data, and evaluation. Proceedings of the 2004 Conference on Language Resources and Evaluation; 2004. p. 1–4.

18. Ferro L, Gerber L, Mani I, Sundheim B, Wilson G. TIDES 2005 standard for the annotation of temporal expressions. MITRE; 2005.

19. Filannino M, Brown G, Nenadic G. ManTIME: temporal expression identification and normalization in the TempEval-3 challenge. Proceedings of the 7th International Workshop on Semantic Evaluation; 2013. p. 53–7.

20. Finkel JR, Grenager T, Manning C. Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics; 2005. p. 363–70.

21. Finkel JR, Manning C. Nested named entity recognition. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing; 2009. p. 141–50.

22. Giuliano C. Fine-grained classification of named entities exploiting latent semantic kernels. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder: Association for Computational Linguistics; 2009. p. 201–9.

23. Grishman R, Sundheim B. Message understanding conference - 6: a brief history. Proceedings of the 16th International Conference on Computational Linguistics; 1996.

24. Hacioglu K, Chen Y, Douglas B. Automatic time expression labeling for English and Chinese text. Proceedings of the 6th International Conference on Intelligent Text Processing and Computational Linguistics. Mexico City: Springer; 2005. p. 548–59.

25. Hochreiter S, Schmidhuber J. Long short-term memory. Neur Comput 1997;9:1735–80.

26. Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging; 2015.

27. Ji H, Grishman R. Knowledge base population: successful approaches and challenges. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics; 2011. p. 1148–58.

28. Kazama J, Torisawa K. Exploiting Wikipedia as external knowledge for named entity recognition. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Prague: Association for Computational Linguistics; 2007. p. 698–707.

29. Krallinger M, Leitner F, Rabal O, Vazquez M, Oyarzabal J, Valencia A. Overview of the chemical compound and drug name recognition (CHEMDNER) task. BioCreative Challenge Eval Workshop; 2015. p. 2–33.

30. Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Proceedings of the 18th International Conference on Machine Learning. Williams College: Morgan Kaufmann Publishers; 2001. p. 281–9.

31. Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics; 2016. p. 260–70.

32. Lee K, Artzi Y, Dodge J, Zettlemoyer L. Context-dependent semantic parsing for time expressions. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore: Association for Computational Linguistics; 2014. p. 1437–47.

33. Li J, Cardie C. Timeline generation: tracking individuals on twitter. Proceedings of the 23rd International Conference on World Wide Web; 2014. p. 643–52.

34. Liang P. Semi-supervised learning for natural language. Master’s Thesis; 2005.

35. Ling W, Dyer C, Black AW, Trancoso I, Fermandez R, Amir S, Marujo L, Luis T. Finding function in form: compositional character models for open vocabulary word representation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon: Association for Computational Linguistics; 2015. p. 1520–30.

36. Ling X, Singh S, Weld DS. Design challenges for entity linking. Trans Assoc Comput Linguist 2015;3:315–28.

37. Ling X, Weld DS. Fine-grained entity recognition. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence. Toronto: AAAI Press; 2012. p. 94–100.

38. Liu L, Shang J, Ren X, Xu FF, Gui H, Peng J, Han J. Empower sequence labeling with task-aware neural language model. Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press; 2018. p. 5253–60.

39. Liu X, Zhang S, Wei F, Zhou M. Recognizing named entities in tweets. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics; 2011. p. 359–67.

40. Llorens H, Derczynski L, Gaizauskas R, Saquete E. TIMEN: an open temporal expression normalisation resource. Proceedings of the 8th International Conference on Language Resources and Evaluation; 2012. p. 3044–51.

41. Llorens H, Saquete E, Navarro B. TIPSem (English and Spanish): evaluating CRFs and semantic roles in TempEval-2. Proceedings of the 5th International Workshop on Semantic Evaluation; 2010. p. 284–91.

42. Luo G, Huang X, Lin C-Y, Nie Z. Joint named entity recognition and disambiguation. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; 2015. p. 879–88.

43. Ma X, Hovy E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (volume 1: long papers). Berlin: Association for Computational Linguistics; 2016. p. 1064–74.

44. Ma Y, Cambria E, Gao S. Label embedding for zero-shot fine-grained named entity typing. Proceedings of the 26th International Conference on Computational Linguistics; 2016. p. 171–80.

45. Mani I, Verhagen M, Wellner B, Lee CM, Pustejovsky J. Machine learning of temporal relations. Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics; 2006. p. 753–60.

46. Mani I, Wilson G. Robust temporal processing of news. Proceedings of the 38th Annual Meeting on Association for Computational Linguistics; 2000. p. 69–76.

47. Maynard D, Tablan V, Ursu C, Cunningham H, Wilks Y. Named entity recognition from diverse text types. Proceedings of 2001 Recent Advances in Natural Language Processing Conference; 2001. p. 257–74.

48. Mazur P, Dale R. WikiWars: a new corpus for research on temporal expressions. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. MIT Stata Center: Association for Computational Linguistics; 2010. p. 913–22.

49. McCallum A, Li W. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. Proceedings of the 7th Conference on Computational Natural Language Learning. Edmonton: Association for Computational Linguistics; 2003. p. 188–91.

50. Miller S, Guinness J, Zamanian A. Name tagging with word clusters and discriminative training. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics; 2004.

51. Nadeau D, Sekine S. A survey of named entity recognition and classification. Lingvisticae Investigationes 2007;30(1):3–26.

52. Nakashole N, Tylenda T, Weikum G. Fine-grained semantic typing of emerging entities. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Sofia: Association for Computational Linguistics; 2013. p. 1488–97.

53. Owoputi O, O’Connor B, Dyer C, Gimpel K, Schneider N, Smith NA. Improved part-of-speech tagging for online conversational text with word clusters. Proceedings of NAACL-HLT 2013; 2013. p. 380–90.

54. Parker R, Graff D, Kong J, Chen K, Maeda K. English Gigaword, 5th edn; 2011.

55. Peters ME, Ammar W, Bhagavatula C, Power R. Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics; 2017. p. 1756–65.

56. Poibeau T, Kosseim L. Proper name extraction from non-journalistic texts. Lang Comput 2001;37:144–57.

57. Pradhan S, Moschitti A, Xue N, Ng HT, Bjorkelund A, Uryupina O, Zhang Y, Zhong Z. Towards robust linguistic analysis using OntoNotes. Proceedings of the 17th Conference on Computational Natural Language Learning. Sofia: Association for Computational Linguistics; 2013. p. 143–52.

58. Pradhan SS, Hovy E, Marcus M, Palmer M, Ramshaw L, Weischedel R. Ontonotes: a unified relational semantic representation. Proceedings of the 2007 IEEE International Conference on Semantic Computing; 2007. p. 517–26.

59. Pustejovsky J, Castano J, Ingria R, Sauri R, Gaizauskas R, Setzer A, Katz G, Radev D. TimeML: robust specification of event and temporal expressions in text. Direct Question Answer 2003;3:28–34.

60. Pustejovsky J, Hanks P, Sauri R, See A, Gaizauskas R, Setzer A, Sundheim B, Radev D, Day D, Ferro L, Lazo M. The TIMEBANK corpus. Corpus Linguist 2003;2003:647–56.

61. Pustejovsky J, Lee K, Bunt H, Romary L. ISO-TimeML: an international standard for semantic annotation. Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC’10); 2010. p. 394–7.

62. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training; 2018.

63. Ratinov L, Roth D. Design challenges and misconceptions in named entity recognition. Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Boulder: Association for Computational Linguistics; 2009. p. 147–55.

64. Ren X, He W, Qu M, Huang L, Ji H, Han J. AFET: automatic fine-grained entity typing by hierarchical partial-label embedding. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin: Association for Computational Linguistics; 2016. p. 1369–78.

65. Ritter A, Clark S, Mausam, Etzioni O. Named entity recognition in tweets: an experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing; 2011. p. 1524–34.

66. Sang EFTK, Meulder FD. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. Proceedings of the 7th Conference on Natural Language Learning; 2003. p. 142–7.

67. Sang EFTK, Veenstra J. Representing text chunks. Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics; 1999. p. 173–9.

68. Santos CND, Guimaraes V. Boosting named entity recognition with neural character embeddings. Proceedings of the 5th Named Entities Workshop. Beijing: Association for Computational Linguistics; 2015. p. 25–33.

69. Silva JFD, Kozareva Z, Lopes JGP. Cluster analysis and classification of named entities. Proceedings of the 4th International Conference on Language Resources and Evaluation. Lisbon: European Language Resources Association; 2004. p. 321–4.

70. Steedman M. Surface structure and interpretation. The MIT Press; 1996.

71. Strötgen J, Gertz M. HeidelTime: high quality rule-based extraction and normalization of temporal expressions. Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval’10). Stroudsburg: Association for Computational Linguistics; 2010. p. 321–4.

72. Strubell E, Verga P, Belanger D, McCallum A. Fast and accurate entity recognition with iterated dilated convolutions. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen: Association for Computational Linguistics; 2017. p. 2670–80.

73. Takeuchi K, Collier N. Bio-medical entity extraction using support vector machines. Artif Intell Med 2005;33(2):125–37.

74. UzZaman N, Allen JF. TRIPS and TRIOS system for TempEval-2: Extracting temporal information from text. Proceedings of the 5th International Workshop on Semantic Evaluation; 2010. p. 276–83.

75. UzZaman N, Llorens H, Derczynski L, Verhagen M, Allen J, Pustejovsky J. SemEval-2013 task 1: TempEval-3: Evaluating time expressions, events, and temporal relations. Proceedings of the 7th International Workshop on Semantic Evaluation; 2013. p. 1–9.

76. Verhagen M, Gaizauskas R, Schilder F, Hepple M, Katz G, Pustejovsky J. SemEval-2007 task 15: TempEval temporal relation identification. Proceedings of the 4th International Workshop on Semantic Evaluation; 2007. p. 75–80.

77. Verhagen M, Mani I, Sauri R, Knippen R, Jang SB, Littman J, Rumshisky A, Phillips J, Pustejovsky J. Automating temporal annotation with TARSQI. Proceedings of the ACL Interactive Poster and Demonstration Sessions. Ann Arbor: Association for Computational Linguistics; 2005. p. 81–4.

78. Verhagen M, Sauri R, Caselli T, Pustejovsky J. SemEval-2010 task 13: TempEval-2. Proceedings of the 5th International Workshop on Semantic Evaluation; 2010. p. 57–62.

79. Wang L-J, Li W-C, Chang C-H. Recognizing unregistered names for Mandarin word identification. Proceedings of the 14th Conference on Computational Linguistics; 1992. p. 1239–43.

80. Wong K-F, Xia Y, Li W, Yuan C. An overview of temporal information extraction. Int J Comput Process Oriental Lang 2005;18(2):137–52.

81. Zhong X, Cambria E. Time expression recognition using a constituent-based tagging scheme. Proceedings of the 2018 World Wide Web Conference. Lyon: Association for Computing Machinery; 2018. p. 983–92.

82. Zhong X, Sun A, Cambria E. Time expression analysis and recognition using syntactic token types and general heuristic rules. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver: Association for Computational Linguistics; 2017. p. 420–9.

Title: Extracting Time Expressions and Named Entities with Constituent-Based Tagging Schemes
Authors: Xiaoshi Zhong, Erik Cambria, Amir Hussain
Publication date: 10 May 2020
Publisher: Springer US
Published in: Cognitive Computation / Issue 4/2020
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-020-09714-8