2018 | OriginalPaper | Book chapter

Enriching Knowledge Bases with Counting Quantifiers

Authors: Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum

Published in: The Semantic Web – ISWC 2018

Publisher: Springer International Publishing

Abstract

Information extraction traditionally focuses on extracting relations between identifiable entities, such as \(\langle\)Monterey, locatedIn, California\(\rangle\). Yet texts often also contain counting information, stating that a subject stands in a specific relation with a number of objects without mentioning the objects themselves, for example, “California is divided into 58 counties”. Such counting quantifiers can help in a variety of tasks, such as query answering or knowledge base curation, but have been neglected in prior work.

This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.
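
The distant-supervision idea mentioned above can be illustrated with a minimal sketch. It is not the CINEX implementation: the function names, the toy triples, the whitespace-level tokenizer, and the exact-match labeling rule are all assumptions made for illustration. The sketch counts the facts a knowledge base stores per subject and relation, then uses that count as a seed to tag matching numbers in a sentence about the subject.

```python
import re
from collections import Counter


def fact_counts(triples):
    """Count how many objects the KB stores for each (subject, relation) pair."""
    return Counter((subj, rel) for subj, rel, _obj in triples)


def label_sentence(sentence, seed_count):
    """Tag each token 'COUNT' if it is a number equal to the seed count, else 'O'.

    Exact matching is a simplifying assumption of this sketch.
    """
    tokens = re.findall(r"\d+|\w+|[^\w\s]", sentence)
    return [
        (tok, "COUNT" if tok.isdecimal() and int(tok) == seed_count else "O")
        for tok in tokens
    ]


# Hypothetical toy knowledge base: two hasChild facts for the subject "Ann".
triples = [("Ann", "hasChild", "Bob"), ("Ann", "hasChild", "Cy")]
counts = fact_counts(triples)

# The seed count (2) marks the matching number; the year is left untagged.
labels = label_sentence("Ann has 2 children born in 1990.", counts[("Ann", "hasChild")])
```

If the knowledge base is incomplete, the seed count falls below the true count, so an exact-match rule like this one would miss the correct mention. This is the non-maximal seed problem listed as challenge (i) in the abstract.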

https://github.com/paramitamirza/CINEX.

http://phrontistery.info/numbers.html.

Both in their version as of March 20, 2017.

Properties having the constraint https://www.wikidata.org/wiki/Q19474404.

Title: Enriching Knowledge Bases with Counting Quantifiers
Authors: Paramita Mirza, Simon Razniewski, Fariz Darari, Gerhard Weikum
Publisher: Springer International Publishing
Book: The Semantic Web – ISWC 2018
Print ISBN: 978-3-030-00670-9
Electronic ISBN: 978-3-030-00671-6
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-030-00671-6_11
