Skip to main content
Erschienen in: Knowledge and Information Systems 2/2015

01.05.2015 | Regular Paper

Finding top-\(k\, r\)-cliques for keyword search from graphs in polynomial delay

verfasst von: Mehdi Kargar, Aijun An

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Keyword search over structured data offers an alternative method to explore and query databases for users that are not familiar with the structure of the data and/or a query language. Structured data are usually modeled as graphs. In this context, an answer is a substructure of the graph that contains all or some of the query keywords. In most of the previous works, a minimal tree that covers all the query keywords are found as the answer. Some recent works offer to find subgraphs rather than minimal trees and show that subgraphs might be more informative for the users. However, current methods suffer from the following problems. Although some of the content nodes (i.e., nodes that contain input keywords) are close to each other in an answer, others might be far from each other. While searching for the best answer, current methods explore the whole graph rather than only the content nodes. This might increase the run time and leads to poor performance. To address these problems, we propose to find top-\(k\, r\)-cliques as the answers to the graph keyword search problem. An \(r\)-clique is a set of content nodes that cover all the input keywords, and the distance between each pair of nodes is less than or equal to \(r\). We propose a new weight function that is the sum of distances between each pair of content nodes. We prove that minimizing the new weight function is NP-hard and propose an approximation algorithm that produces \(r\)-cliques with 2-approximation ratio in polynomial delay. We further improve the run time of the approximation algorithm with the cost of increasing the approximation ratio. Extensive performance studies using three large real datasets confirm the efficiency and accuracy of finding \(r\)-cliques in graphs.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
A content node is a node in the input graph that contains at least one of the input keywords.
 
2
The shortest distance between two nodes in a graph is the sum of the weights of a path between the two nodes such that the sum of the weights of its constituent edges is minimized.
 
3
The weight of the edges depend on the application’s dataset.
 
4
A complete survey on keyword search in databases and graphs can be found in [25].
 
5
For directed graphs, the shortest distance between two nodes in an \(r\)-clique should be no larger than \(r\) in both directions.
 
6
3-satisfiability (3-SAT) is the problem of determining whether there exists an interpretation that satisfies a given boolean formula in a conjunctive normal form where each clause is limited to at most three literals. See [24] for more details.
 
7
There are other ways to prove this theorem, such as by reducing the problem to the multiple-choice cover problem. See [2] for the definition of the multiple-choice cover problem.
 
8
It should be noted that the same approach is used in [2] for proving the NP-hardness of multiple-choice cover problem.
 
9
The reason to choose this value will become clear later in the proof.
 
10
Note that since the distance between \(u_i\) and \(\overline{u}_i\) is set to \(2\times w+\epsilon \) (where \(\epsilon >0\)) which is larger than \(r\) (set to \(2\times w\)), \(u_i\) and \(\overline{u}_i\) cannot be part of the same answer.
 
11
Since the distance between other variables is set to \(\frac{w}{\small \big (\begin{array}{c}n+m\\ 2\end{array}\big )}\), and we want to choose \(n+m\) keywords, if a feasible solution exists, its weight is less than \(w\).
 
12
Our approach to dividing a search space is similar to the idea used in [23].
 
13
As we formally describe later, our approximation algorithms find an approximation of an \(r\)-clique. Therefore, in the rest of the paper, an approximation of an \(r\)-clique is called approx-\(r\)-clique.
 
14
An inverted index is commonly used in information retrieval [3] and graph keyword search [23].
 
15
This is also valid for approx-\(r\)-cliques.
 
16
Note that the systems’s architecture is previously discussed in our demo paper published in [13]. It is briefly described here due to the completeness of this paper.
 
17
In our implementation, we used quick sort.
 
21
We choose 8 users because it is mentioned in [21] that “For really low-overhead projects, it’s often optimal to test as little as 2 users per study. For some other projects, 8 users—or sometimes even more—might be better.”
 
Literatur
1.
Zurück zum Zitat Anagnostopoulos A, Becchetti L, Castillo C, Gionis A, Leonardi S (2012) Online team formation in social networks. In: Proceedings of the WWW’12, pp 839–848 Anagnostopoulos A, Becchetti L, Castillo C, Gionis A, Leonardi S (2012) Online team formation in social networks. In: Proceedings of the WWW’12, pp 839–848
3.
Zurück zum Zitat Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley, Reading, MA Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. Addison Wesley, Reading, MA
4.
Zurück zum Zitat Bhalotia G, Nakhe C, Hulgeri A, Chakrabarti S, Sudarshan S (2002) Keyword searching and browsing in databases using banks. In: Proceedings of ICDE’02, pp 431–440 Bhalotia G, Nakhe C, Hulgeri A, Chakrabarti S, Sudarshan S (2002) Keyword searching and browsing in databases using banks. In: Proceedings of ICDE’02, pp 431–440
5.
Zurück zum Zitat Dalvi B, Kshirsagar M, Sudarshan S (2008) Keyword search on external memory data graphs. In: Proceedings of VLDB’08, pp 1189–1204 Dalvi B, Kshirsagar M, Sudarshan S (2008) Keyword search on external memory data graphs. In: Proceedings of VLDB’08, pp 1189–1204
6.
Zurück zum Zitat Datta S, Majumder A, Naidu K (2012) Capacitated team formation problem on social networks. In: Proceedings of KDD’12, pp 1005–1013 Datta S, Majumder A, Naidu K (2012) Capacitated team formation problem on social networks. In: Proceedings of KDD’12, pp 1005–1013
7.
Zurück zum Zitat Ding B, Yu J, Wang S, Qin L, Zhang X, Lin X (2007) Finding top-k min-cost connected trees in databases. In: Proceedings of ICDE’07, pp 836–845 Ding B, Yu J, Wang S, Qin L, Zhang X, Lin X (2007) Finding top-k min-cost connected trees in databases. In: Proceedings of ICDE’07, pp 836–845
8.
Zurück zum Zitat Fan W, Li J, Ma S, Tang N, Wu Y, Wu Y (2010) Graph pattern matching: from intractable to polynomial time. In: Proceedings of VLDB’10, pp 264–275 Fan W, Li J, Ma S, Tang N, Wu Y, Wu Y (2010) Graph pattern matching: from intractable to polynomial time. In: Proceedings of VLDB’10, pp 264–275
9.
Zurück zum Zitat Golenberg K, Kimelfeld B, Sagiv Y (2008) Keyword proximity search in complex data graphs. In: Proceedings of SIGMOD’08, pp 927–940 Golenberg K, Kimelfeld B, Sagiv Y (2008) Keyword proximity search in complex data graphs. In: Proceedings of SIGMOD’08, pp 927–940
10.
Zurück zum Zitat He H, Wang H, Yang J, Yu P (2007) Blinks: ranked keyword searches on graphs. In: Proceedings of SIGMOD’07, pp 305–316 He H, Wang H, Yang J, Yu P (2007) Blinks: ranked keyword searches on graphs. In: Proceedings of SIGMOD’07, pp 305–316
11.
Zurück zum Zitat Kacholia V, Pandit S, Chakrabarti S, Sudarshan S, Desai R, Karambelkar H (2005) Bidirectional expansion for keyword search on graph databases. In: Proceedings of VLDB’05, pp 505–516 Kacholia V, Pandit S, Chakrabarti S, Sudarshan S, Desai R, Karambelkar H (2005) Bidirectional expansion for keyword search on graph databases. In: Proceedings of VLDB’05, pp 505–516
12.
Zurück zum Zitat Kargar M, An A (2011) Keyword search in graphs: Finding \(r\)-cliques. In: Proceedings of VLDB’11, pp 681–692 Kargar M, An A (2011) Keyword search in graphs: Finding \(r\)-cliques. In: Proceedings of VLDB’11, pp 681–692
13.
Zurück zum Zitat Kargar M, An A (2012) Efficient top-k keyword search in graphs with polynomial delay. In: Proceedings of ICDE’12, pp 1269–1272 Kargar M, An A (2012) Efficient top-k keyword search in graphs with polynomial delay. In: Proceedings of ICDE’12, pp 1269–1272
14.
Zurück zum Zitat Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW (eds) Complexity of computer computations. Plenum, NY, pp 85–103 Karp RM (1972) Reducibility among combinatorial problems. In: Miller RE, Thatcher JW (eds) Complexity of computer computations. Plenum, NY, pp 85–103
15.
Zurück zum Zitat Karypis G, Kumar V (1995) Analysis of multilevel graph partitioning: supercomputing’95. In: Proceedings of the 1995 ACM/IEEE conference on supercomputing Karypis G, Kumar V (1995) Analysis of multilevel graph partitioning: supercomputing’95. In: Proceedings of the 1995 ACM/IEEE conference on supercomputing
16.
Zurück zum Zitat Koren Y, North SC, Volinsky C (2006) Measuring and extracting proximity in networks. In: Proceedings of KDD’06, pp 245–255 Koren Y, North SC, Volinsky C (2006) Measuring and extracting proximity in networks. In: Proceedings of KDD’06, pp 245–255
18.
Zurück zum Zitat Lappas T, Liu K, Terzi E (2009) Finding a team of experts in social networks. In: Proceedings of KDD’09, pp 467–475 Lappas T, Liu K, Terzi E (2009) Finding a team of experts in social networks. In: Proceedings of KDD’09, pp 467–475
19.
Zurück zum Zitat Lawler E (1972) A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem. Manag Sci 18(7):401–405CrossRefMATHMathSciNet Lawler E (1972) A procedure for computing the k best solutions to discrete optimization problems and its application to the shortest path problem. Manag Sci 18(7):401–405CrossRefMATHMathSciNet
20.
Zurück zum Zitat Li G, Ooi BC, Feng J, Wang J, Zhou L (2008) Ease: Efficient and adaptive keyword search on unstructured, semi-structured and structured data. In: Proceedings of SIGMOD’08, pp 903–914 Li G, Ooi BC, Feng J, Wang J, Zhou L (2008) Ease: Efficient and adaptive keyword search on unstructured, semi-structured and structured data. In: Proceedings of SIGMOD’08, pp 903–914
22.
Zurück zum Zitat Park J, Lee S (2011) Keyword search in relational databases. Knowl Inf Syst 26:175–193CrossRef Park J, Lee S (2011) Keyword search in relational databases. Knowl Inf Syst 26:175–193CrossRef
23.
Zurück zum Zitat Qin L, Yu J, Chang L, Tao Y (2009) Querying communities in relational databases. In: Proceedings of ICDE’09, pp 724–735 Qin L, Yu J, Chang L, Tao Y (2009) Querying communities in relational databases. In: Proceedings of ICDE’09, pp 724–735
24.
Zurück zum Zitat Vazirani V (2001) Approximation algorithms. Springer, Berlin Vazirani V (2001) Approximation algorithms. Springer, Berlin
25.
Zurück zum Zitat Yu J, Qin L, Chang L (eds) (2010) Keyword search in databases. Morgan and Claypool Publisher, NY Yu J, Qin L, Chang L (eds) (2010) Keyword search in databases. Morgan and Claypool Publisher, NY
Metadaten
Titel
Finding top--cliques for keyword search from graphs in polynomial delay
verfasst von
Mehdi Kargar
Aijun An
Publikationsdatum
01.05.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0736-0

Weitere Artikel der Ausgabe 2/2015

Knowledge and Information Systems 2/2015 Zur Ausgabe

Premium Partner