Published in: KI - Künstliche Intelligenz 1/2018

06-10-2017 | Technical Contribution

A Quality Evaluation of Combined Search on a Knowledge Base and Text

Authors: Hannah Bast, Björn Buchhold, Elmar Haussmann



Abstract

We provide a quality evaluation of KB+Text search, a deep integration of knowledge base search and standard full-text search. A knowledge base (KB) is a set of subject–predicate–object triples with a common naming scheme. The standard query language is SPARQL, where queries are essentially lists of triples with variables. KB+Text search extends this by a special occurs-with predicate, which can be used to express the co-occurrence of words in the text with mentions of entities from the knowledge base. Both pure KB search and standard full-text search are included as special cases. We evaluate the result quality of KB+Text search on three different query sets. The corpus is the full version of the English Wikipedia (2.4 billion word occurrences) combined with the YAGO knowledge base (26 million triples). We provide a web application to reproduce our evaluation, which is accessible via http://ad.informatik.uni-freiburg.de/publications.
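
To make the idea of the occurs-with predicate concrete, the following is a minimal Python sketch, not the authors' system or query syntax. It assumes a toy knowledge base of triples and a handful of text contexts, each modeled as a set of words plus the set of entities mentioned in it. The data, the predicate name `is-a`, the entity names, and the function names `kb_subjects`, `occurs_with`, and `kb_text_search` are all illustrative assumptions.

```python
from dataclasses import dataclass

# Toy knowledge base: subject-predicate-object triples (hypothetical data).
KB = {
    ("Neil_Armstrong", "is-a", "Astronaut"),
    ("Buzz_Aldrin", "is-a", "Astronaut"),
    ("Yuri_Gagarin", "is-a", "Astronaut"),
    ("Albert_Einstein", "is-a", "Physicist"),
}


@dataclass(frozen=True)
class Context:
    """A text context: its words and the KB entities mentioned in it."""
    words: frozenset
    entities: frozenset


# Toy text corpus (hypothetical), already split into contexts.
TEXT = [
    Context(frozenset({"armstrong", "walked", "on", "the", "moon"}),
            frozenset({"Neil_Armstrong"})),
    Context(frozenset({"aldrin", "walked", "on", "the", "moon", "too"}),
            frozenset({"Buzz_Aldrin"})),
    Context(frozenset({"gagarin", "orbited", "the", "earth"}),
            frozenset({"Yuri_Gagarin"})),
    Context(frozenset({"einstein", "wrote", "about", "the", "moon"}),
            frozenset({"Albert_Einstein"})),
]


def kb_subjects(predicate, obj):
    """Pure KB part: all subjects ?x matching the triple (?x predicate obj)."""
    return {s for (s, p, o) in KB if p == predicate and o == obj}


def occurs_with(words):
    """Text part: entities mentioned in a context containing all given words."""
    required = set(words)
    hits = set()
    for ctx in TEXT:
        if required <= ctx.words:
            hits |= ctx.entities
    return hits


def kb_text_search(kb_pattern=None, words=None):
    """Combine both parts; either part may be omitted (the special cases)."""
    result = None
    if kb_pattern is not None:
        predicate, obj = kb_pattern
        result = kb_subjects(predicate, obj)
    if words is not None:
        text_hits = occurs_with(words)
        result = text_hits if result is None else result & text_hits
    return result if result is not None else set()


if __name__ == "__main__":
    # Roughly: "?x is-a Astronaut . ?x occurs-with walked moon"
    print(sorted(kb_text_search(("is-a", "Astronaut"), ["walked", "moon"])))
    # Pure KB search and pure text search as special cases.
    print(sorted(kb_text_search(kb_pattern=("is-a", "Astronaut"))))
    print(sorted(kb_text_search(words=["moon"])))
```

Omitting the words gives a pure KB query, and omitting the triple pattern gives an entity-returning variant of full-text search, which mirrors the claim that both are included as special cases. A real system would work on an index rather than scanning all contexts, and would also rank its results.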

KI - Künstliche Intelligenz

The scientific journal "KI – Künstliche Intelligenz" is the official journal of the division for artificial intelligence within the "Gesellschaft für Informatik e.V." (GI), the German Informatics Society, with contributions from throughout the field of artificial intelligence.

Footnotes
3
The choice of this outdated version has no significant impact on the insights from our evaluation: the corresponding Wikipedia data from 2017 is only about 50% larger, but otherwise has the same characteristics and would not lead to fundamentally different results.
 
4
There is a more recent version, called YAGO2, but most of the additions from YAGO to YAGO2 (spatial and temporal information) are not interesting for our search.
 
6
For the TREC benchmark, even the number of false negatives decreases. This is because, when segmenting into contexts, the document parser pre-processes Wikipedia lists by appending each list item to the preceding sentence. These are the only contexts that cross sentence boundaries, and they are a rare exception. For the Wikipedia list benchmark, we verified that this technique does not include results from the lists from which we created the ground truth.
 
7
This means that the words occur in the context, but with a meaning different from what was intended by the query.
 
8
The sentence parses are required to compute contexts.
 
Metadata
Title
A Quality Evaluation of Combined Search on a Knowledge Base and Text
Authors
Hannah Bast
Björn Buchhold
Elmar Haussmann
Publication date
06-10-2017
Publisher
Springer Berlin Heidelberg
Published in
KI - Künstliche Intelligenz / Issue 1/2018
Print ISSN: 0933-1875
Electronic ISSN: 1610-1987
DOI
https://doi.org/10.1007/s13218-017-0513-9
