
2016 | OriginalPaper | Chapter

Entity-Based Keyword Search in Web Documents

Authors: Enrico Sartori, Yannis Velegrakis, Francesco Guerra

Published in: Transactions on Computational Collective Intelligence XXI

Publisher: Springer Berlin Heidelberg


Abstract

In document search, documents are typically treated as flat lists of keywords. To deal with syntactic interoperability, i.e., the use of different keywords to refer to the same real-world entity, entity linkage has been used to replace keywords in the text with a unique identifier of the entity to which they refer. Yet a flat list of entities fails to capture the actual relationships among those entities, information that matters for more effective document search. In this work we propose to go one step beyond entity linkage in text and model documents as sets of structures that describe the relationships among the entities mentioned in the text. We show that this representation significantly improves the effectiveness of document search. We describe the details of the implementation of this idea and present an extensive set of experimental results that support our claim.
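To make the contrast in the abstract concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that each statement can be reduced to a (subject, verb, object) triple and that entity linkage is a simple alias lookup. The alias table, the `E:`/`KW:` identifier prefixes, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Statement(NamedTuple):
    """One relationship between two entities (assumed triple form)."""
    subject: str  # entity identifier
    verb: str
    obj: str      # entity identifier


# Hypothetical alias table standing in for a real entity-linkage service.
ALIASES = {
    "nyc": "E:new_york_city",
    "big apple": "E:new_york_city",
    "new york city": "E:new_york_city",
    "obama": "E:barack_obama",
    "barack obama": "E:barack_obama",
}


def link(mention: str) -> str:
    """Map a mention to an entity identifier; unknown mentions stay keywords."""
    key = mention.strip().lower()
    return ALIASES.get(key, "KW:" + key)


def to_document(raw: Iterable[Tuple[str, str, str]]) -> Set[Statement]:
    """Turn raw (subject, verb, object) mentions into a set of statements."""
    return {Statement(link(s), v.lower(), link(o)) for s, v, o in raw}


def flat_match(document: Set[Statement], mentions: Iterable[str]) -> bool:
    """Flat view: every queried entity occurs somewhere in the document."""
    wanted = {link(m) for m in mentions}
    present = {e for st in document for e in (st.subject, st.obj)}
    return wanted <= present


def structured_match(document: Set[Statement], subject: str, verb: str, obj: str) -> bool:
    """Structured view: the queried relationship itself must be stated."""
    return Statement(link(subject), verb.lower(), link(obj)) in document


doc = to_document([
    ("Obama", "visited", "NYC"),
    ("Merkel", "praised", "Obama"),
])

# Both entities occur in the document, so the flat view reports a match ...
print(flat_match(doc, ["Merkel", "Big Apple"]))
# ... but no statement relates them, so the structured view does not.
print(structured_match(doc, "Merkel", "visited", "Big Apple"))
# Different surface forms of the same entities still match after linkage.
print(structured_match(doc, "Barack Obama", "visited", "Big Apple"))
```

In this toy setting the flat entity list accepts a query whose entities merely co-occur, while the statement-based view accepts it only when the relationship is actually asserted. Exact triple membership is used for brevity; a real system would need ranking and partial matching.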


Footnotes
1
The term “noun phrase” is used here in its linguistic sense.
 
2
Note that a “raw document” is the document as provided by the user, while a “document” is a set of statements containing identifiers and verbs.
 
Metadata
Title
Entity-Based Keyword Search in Web Documents
Authors
Enrico Sartori
Yannis Velegrakis
Francesco Guerra
Copyright Year
2016
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-49521-6_2
