
2016 | OriginalPaper | Chapter

Knowledge Extraction for Information Retrieval

Authors : Francesco Corcoglioniti , Mauro Dragoni, Marco Rospocher, Alessio Palmero Aprosio

Published in: The Semantic Web. Latest Advances and New Domains

Publisher: Springer International Publishing


Abstract

Document retrieval is the task of returning relevant textual resources for a given user query. In this paper, we investigate whether the semantic analysis of the query and the documents, obtained by exploiting state-of-the-art Natural Language Processing techniques (e.g., Entity Linking, Frame Detection) and Semantic Web resources (e.g., YAGO, DBpedia), can improve the performance of the traditional term-based similarity approach. Our experiments, conducted on a recently released document collection, show that Mean Average Precision (MAP) increases by 3.5 percentage points when textual and semantic analysis are combined, suggesting that semantic content can effectively improve the performance of Information Retrieval systems.
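The headline result is expressed in Mean Average Precision. As a reminder of what that metric measures, here is a minimal Python sketch of the standard MAP definition (as given, e.g., in [22]): for each query, precision is taken at the rank of every relevant document, averaged over all relevant documents (unretrieved ones count as zero), and the per-query values are then averaged. The function names and the toy document ids are illustrative assumptions, not the paper's evaluation code.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against the relevant doc ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: arithmetic mean of per-query average precision."""
    aps = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Two hypothetical queries: (ranked result list, set of relevant documents).
    runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
        (["d5", "d6", "d7"], {"d6", "d9"}),
    ]
    print(round(mean_average_precision(runs), 4))  # → 0.5417
```

MAP lies in [0, 1], so a gain stated in percentage points corresponds to the absolute difference between two MAP values multiplied by 100.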


Footnotes
6
In this section we report the main NLP/KE tasks for extracting semantic terms. Some of them typically build on additional NLP analyses, such as Tokenization, Part-of-Speech tagging, Dependency Parsing, and Constituency Parsing.
 
7
Given the lack of normalization in sim(d, q), our scheme can be roughly classified as ltn.ntn in the SMART notation; see http://bit.ly/weighting_schemes [22].
 
9
Subset of the FrameBase ontology used in PIKES; mapping-based properties with xsd:date, xsd:dateTime, xsd:gYear, and xsd:gYearMonth objects, YAGO types, and the type hierarchy from DBpedia 2015-04. All data are available on the KE4IR website.
 
16
To give an idea, the impact of each semantic layer on the whole processing time for the document collection of Sect. 5 is: uri (3.5 %), type (16.3 %), time (2.9 %), frame (77.3 %). Note also that substantial improvements in KE4IR indexing throughput can be achieved with further engineering and optimization, which is out of scope here.
 
17
For comparison, on the KE4IR website we make available for download an instance of SOLR (a popular search engine based on Lucene) that indexes the same document collection used in our evaluation, and we report its performance on the test queries.
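Footnote 7 classifies the weighting scheme as ltn.ntn in SMART notation. In the ddd.qqq convention of [22], ltn weights a document term by logarithmic term frequency (1 + log tf) times idf with no normalization, and ntn weights a query term by its raw term frequency times idf, again unnormalized; the similarity is the dot product of the two weight vectors. The Python sketch below illustrates that generic scheme only, assuming base-10 logarithms and idf = log(N/df); it is not KE4IR's implementation, and the function names and toy collection statistics are hypothetical. Terms are treated as opaque strings.

```python
import math
from collections import Counter
from typing import Dict, List


def idf(term: str, doc_freq: Dict[str, int], num_docs: int) -> float:
    """idf = log10(N / df); terms with unknown document frequency get weight 0."""
    df = doc_freq.get(term, 0)
    return math.log10(num_docs / df) if df else 0.0


def ltn_ntn_score(doc_terms: List[str], query_terms: List[str],
                  doc_freq: Dict[str, int], num_docs: int) -> float:
    """Dot product of ltn document weights and ntn query weights (no normalization)."""
    tf_d = Counter(doc_terms)
    tf_q = Counter(query_terms)
    score = 0.0
    for term, q_tf in tf_q.items():
        d_tf = tf_d.get(term, 0)
        if d_tf == 0:
            continue  # term absent from the document: contributes nothing
        term_idf = idf(term, doc_freq, num_docs)
        w_d = (1 + math.log10(d_tf)) * term_idf  # ltn: log tf x idf, no normalization
        w_q = q_tf * term_idf                     # ntn: raw tf x idf, no normalization
        score += w_d * w_q
    return score


if __name__ == "__main__":
    # Hypothetical statistics: 1000 documents in the collection.
    doc_freq = {"semantic": 10, "retrieval": 100}
    doc = ["semantic"] * 10 + ["retrieval"]
    print(ltn_ntn_score(doc, ["semantic", "retrieval"], doc_freq, 1000))  # → 9.0
```

Because neither side is normalized, repeating a document's content (e.g., scoring `doc * 2`) raises its score, which is the behavior the footnote alludes to when it mentions the lack of normalization.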
 
Literature
1.
Gangemi, A., Draicchio, F., Presutti, V., Nuzzolese, A.G., Recupero, D.R.: A machine reader for the semantic web. In: Demos of ISWC, pp. 149–152 (2013)
2.
Rospocher, M., van Erp, M., Vossen, P., Fokkens, A., Aldabe, I., Rigau, G., Soroa, A., Ploeger, T., Bogaard, T.: Building event-centric knowledge graphs from news. J. Web Semant. (to appear)
3.
Corcoglioniti, F., Rospocher, M., Palmero Aprosio, A.: A 2-phase frame-based knowledge extraction framework. In: Proceedings of ACM Symposium on Applied Computing (SAC 2016) (2016, to appear)
4.
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), 167–195 (2015)
5.
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)
6.
Waitelonis, J., Exeler, C., Sack, H.: Linked data enabled generalized vector space model to improve document retrieval. In: Proceedings of NLP & DBpedia 2015 Workshop in Conjunction with 14th International Semantic Web Conference (ISWC 2015). CEUR Workshop Proceedings (2015)
7.
Croft, W.B.: User-specified domain knowledge for document retrieval. In: Bernardi, L.R., Rabitti, F. (eds.) SIGIR, pp. 201–206. ACM (1986)
8.
Gonzalo, J., Verdejo, F., Chugur, I., Cigarrán, J.: Indexing with WordNet synsets can improve text retrieval. CoRR (1998)
9.
Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
10.
Dridi, O.: Ontology-based information retrieval: overview and new proposition. In: RCIS, pp. 421–426 (2008)
11.
Tomassen, S.L.: Research on ontology-driven information retrieval. In: Meersman, R., Tari, Z., Herrero, P. (eds.) OTM 2006 Workshops. LNCS, vol. 4278, pp. 1460–1468. Springer, Heidelberg (2006)
12.
Castells, P., Fernández, M., Vallet, D.: An adaptation of the vector-space model for ontology-based information retrieval. IEEE Trans. Knowl. Data Eng. 19(2), 261–272 (2007)
13.
Vallet, D., Fernández, M., Castells, P.: An ontology-based information retrieval model. In: Gómez-Pérez, A., Euzenat, J. (eds.) ESWC 2005. LNCS, vol. 3532, pp. 455–470. Springer, Heidelberg (2005)
14.
Jimeno-Yepes, A., Llavori, R.B., Rebholz-Schuhmann, D.: Ontology refinement for improved information retrieval. Inf. Process. Manage. 46(4), 426–435 (2010)
15.
Fernández, M., Cantador, I., Lopez, V., Vallet, D., Castells, P., Motta, E.: Semantically enhanced information retrieval: an ontology-based approach. J. Web Sem. 9(4), 434–452 (2011)
16.
Spink, A., Jansen, B., Blakely, C., Koshman, S.: A study of results overlap and uniqueness among major web search engines. Inf. Process. Manage. 42(5), 1379–1391 (2006)
17.
Stojanovic, N.: An approach for defining relevance in the ontology-based information retrieval. In: Web Intelligence, pp. 359–365 (2005)
18.
Baziz, M., Boughanem, M., Pasi, G., Prade, H.: An information retrieval driven by ontology: from query to document expansion. In: RIAO (2007)
19.
Rouces, J., de Melo, G., Hose, K.: FrameBase: representing n-ary relations using semantic frames. In: Gandon, F., Sabou, M., Sack, H., d’Amato, C., Cudré-Mauroux, P., Zimmermann, A. (eds.) ESWC 2015. LNCS, vol. 9088, pp. 505–521. Springer, Heidelberg (2015)
20.
Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia. Artif. Intell. 194, 28–61 (2013)
21.
da Costa Pereira, C., Dragoni, M., Pasi, G.: Multidimensional relevance: prioritized aggregation in a personalized information retrieval setting. Inf. Process. Manage. 48(2), 340–357 (2012)
22.
Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to Information Retrieval, vol. 1. Cambridge University Press, Cambridge (2008)
23.
Corcoglioniti, F., Rospocher, M., Mostarda, M., Amadori, M.: Processing billions of RDF triples on a single machine using streaming and sorting. In: ACM SAC, pp. 368–375 (2015)
24.
Voorhees, E., Harman, D.: Overview of the sixth Text REtrieval Conference (TREC-6). In: TREC, pp. 1–24 (1997)
25.
Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)
26.
Sanderson, M., Zobel, J.: Information retrieval system evaluation: effort, sensitivity, and reliability. In: SIGIR, pp. 162–169. ACM (2005)
27.
Noreen, E.W.: Computer-Intensive Methods for Testing Hypotheses: An Introduction. Wiley, New York (1989)
28.
Abdelali, A., Cowie, J., Soliman, H.: Improving query precision using semantic expansion. Inf. Process. Manage. 43(3), 705–716 (2007)
Metadata
Title
Knowledge Extraction for Information Retrieval
Authors
Francesco Corcoglioniti
Mauro Dragoni
Marco Rospocher
Alessio Palmero Aprosio
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-34129-3_20