Skip to main content
Erschienen in: The VLDB Journal 3/2015

01.06.2015 | Regular Paper

Efficient \(k\)-closest pair queries in general metric spaces

verfasst von: Yunjun Gao, Lu Chen, Xinhan Li, Bin Yao, Gang Chen

Erschienen in: The VLDB Journal | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Given two object sets \(P\) and \(Q\), a k-closest pair \((k\hbox {CP})\) query finds \(k\) closest object pairs from \(P\times Q\). This operation is common in many real-life applications such as GIS, data mining, and recommender systems. Although it has received much attention in the Euclidean space, there is little prior work on the metric space. In this paper, we study the problem of kCP query processing in general metric spaces, namely Metric kCP \((\hbox {M}k\hbox {CP})\) search, and propose several efficient algorithms using dynamic disk-based metric indexes (e.g., M-tree), which can be applied to arbitrary type of data as long as a certain metric distance is defined and satisfies the triangle inequality. Our approaches follow depth-first and/or best-first traversal paradigm(s), employ effective pruning rules based on metric space properties and the counting information preserved in the metric index, take advantage of aggressive pruning and compensation to further boost query efficiency, and derive a node-based cost model for \(\hbox {M}k\hbox {CP}\) retrieval. In addition, we extend our techniques to tackle two interesting variants of \(\hbox {M}k\hbox {CP}\) queries. Extensive experiments with both real and synthetic data sets demonstrate the performance of our proposed algorithms, the effectiveness of our developed pruning rules, and the accuracy of our presented cost model.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Achtert, E., Kriegel, H.P., Kroger, P., Renz, M., Zufle, A.: Reverse \(k\)-nearest neighbor search in dynamic and general metric databases. In: EDBT, pp. 886–897 (2009) Achtert, E., Kriegel, H.P., Kroger, P., Renz, M., Zufle, A.: Reverse \(k\)-nearest neighbor search in dynamic and general metric databases. In: EDBT, pp. 886–897 (2009)
2.
Zurück zum Zitat Alvarez, M., Pan, A., Raposo, J., Bellas, F., Cacheda, F.: Using clustering and edit distance techniques for automatic web data extraction. In: WISE, pp. 212–224 (2007) Alvarez, M., Pan, A., Raposo, J., Bellas, F., Cacheda, F.: Using clustering and edit distance techniques for automatic web data extraction. In: WISE, pp. 212–224 (2007)
3.
Zurück zum Zitat Angiulli, F., Pizzuti, C.: An approximate algorithm for top-\(k\) closest pairs join query in large high dimensional data. Data Knowl. Eng. 53(3), 263–281 (2005)CrossRef Angiulli, F., Pizzuti, C.: An approximate algorithm for top-\(k\) closest pairs join query in large high dimensional data. Data Knowl. Eng. 53(3), 263–281 (2005)CrossRef
4.
Zurück zum Zitat Arumugam, S., Jermaine, C.: Closest-point-of-approach join for moving object histories. In: ICDE, pp. 86–95 (2006) Arumugam, S., Jermaine, C.: Closest-point-of-approach join for moving object histories. In: ICDE, pp. 86–95 (2006)
5.
Zurück zum Zitat Bohm, C.: A cost model for query processing in high dimensional data spaces. ACM Trans. Database Syst. 25(2), 129–178 (2000) Bohm, C.: A cost model for query processing in high dimensional data spaces. ACM Trans. Database Syst. 25(2), 129–178 (2000)
6.
Zurück zum Zitat Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: SIGMOD, pp. 93–104 (2000) Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: SIGMOD, pp. 93–104 (2000)
7.
Zurück zum Zitat Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24(14), 2357–2366 (2003)CrossRefMATH Bustos, B., Navarro, G., Chavez, E.: Pivot selection techniques for proximity searching in metric spaces. Pattern Recognit. Lett. 24(14), 2357–2366 (2003)CrossRefMATH
8.
Zurück zum Zitat Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)CrossRef Chavez, E., Navarro, G., Baeza-Yates, R., Marroquin, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001)CrossRef
9.
Zurück zum Zitat Cheema, M.A., Lin, X., Wang, H., Wang, J., Zhang, W.: A unified approach for computing top-\(k\) pairs in multi-dimensional space. In: ICDE, pp. 1031–1042 (2011) Cheema, M.A., Lin, X., Wang, H., Wang, J., Zhang, W.: A unified approach for computing top-\(k\) pairs in multi-dimensional space. In: ICDE, pp. 1031–1042 (2011)
10.
Zurück zum Zitat Chen, C., Sun, W., Zheng, B., Mao, D., Liu, W.: An incremental approach to closest pair queries in spatial networks using best-first search. In: DEXA, pp. 136–143 (2011) Chen, C., Sun, W., Zheng, B., Mao, D., Liu, W.: An incremental approach to closest pair queries in spatial networks using best-first search. In: DEXA, pp. 136–143 (2011)
11.
Zurück zum Zitat Chen, L., Lian, X.: Efficient processing of metric skyline queries. IEEE Trans. Knowl. Data Eng. 21(3), 351–365 (2009)CrossRef Chen, L., Lian, X.: Efficient processing of metric skyline queries. IEEE Trans. Knowl. Data Eng. 21(3), 351–365 (2009)CrossRef
12.
Zurück zum Zitat Ciaccia, P., Nanni, A., Patella, M.: A query-sensitive cost model for similarity queries with M-tree. In: ADC, pp. 65–76 (1999) Ciaccia, P., Nanni, A., Patella, M.: A query-sensitive cost model for similarity queries with M-tree. In: ADC, pp. 65–76 (1999)
13.
Zurück zum Zitat Ciaccia, P., Patella, M.: PAC nearest neighbor queries: approximate and controlled search in high dimensional and metric spaces. In: ICDE, pp. 244–255 (2000) Ciaccia, P., Patella, M.: PAC nearest neighbor queries: approximate and controlled search in high dimensional and metric spaces. In: ICDE, pp. 244–255 (2000)
14.
Zurück zum Zitat Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: VLDB, pp. 426–435 (1997) Ciaccia, P., Patella, M., Zezula, P.: M-tree: an efficient access method for similarity search in metric spaces. In: VLDB, pp. 426–435 (1997)
15.
Zurück zum Zitat Ciaccia, P., Patella, M., Zezula, P.: A cost model for similarity queries in metric spaces. In: PODS, pp. 59–68 (1998) Ciaccia, P., Patella, M., Zezula, P.: A cost model for similarity queries in metric spaces. In: PODS, pp. 59–68 (1998)
16.
Zurück zum Zitat Corral, A., Almendros-Jimnez, J.: A performance comparison of distance-based query algorithms using R-trees in spatial databases. Inf. Sci. 177(11), 2207–2237 (2007)CrossRef Corral, A., Almendros-Jimnez, J.: A performance comparison of distance-based query algorithms using R-trees in spatial databases. Inf. Sci. 177(11), 2207–2237 (2007)CrossRef
17.
Zurück zum Zitat Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Closest pair queries in spatial databases. In: SIGMOD, pp. 189–200 (2000) Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Closest pair queries in spatial databases. In: SIGMOD, pp. 189–200 (2000)
18.
Zurück zum Zitat Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Algorithms for processing \(k\)-closest-pair queries in spatial databases. Data Knowl. Eng. 49(1), 67–104 (2004)CrossRef Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Algorithms for processing \(k\)-closest-pair queries in spatial databases. Data Knowl. Eng. 49(1), 67–104 (2004)CrossRef
19.
Zurück zum Zitat Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Cost models for distance joins queries using R-trees. Data Knowl. Eng. 57(1), 1–36 (2006)CrossRef Corral, A., Manolopoulos, Y., Theodoridis, Y., Vassilakopoulos, M.: Cost models for distance joins queries using R-trees. Data Knowl. Eng. 57(1), 1–36 (2006)CrossRef
20.
Zurück zum Zitat Corral, A., Vassilakopoulos, M.: On approximate algorithms for distance-based queries using R-trees. Comput. J. 48(2), 220–238 (2005)CrossRef Corral, A., Vassilakopoulos, M.: On approximate algorithms for distance-based queries using R-trees. Comput. J. 48(2), 220–238 (2005)CrossRef
21.
Zurück zum Zitat Eppstein, D.: Fast hierarchical clustering and other applications of dynamic closest pairs. J. Exp. Algorithm. 5, article 1 (2000) Eppstein, D.: Fast hierarchical clustering and other applications of dynamic closest pairs. J. Exp. Algorithm. 5, article 1 (2000)
22.
Zurück zum Zitat Fredriksson, K., Braithwaite, B.: Quicker similarity joins in metric spaces. In: SISAP, pp. 127–140 (2013) Fredriksson, K., Braithwaite, B.: Quicker similarity joins in metric spaces. In: SISAP, pp. 127–140 (2013)
23.
Zurück zum Zitat Fuhry, D., Jin, R., D.Zhang: Efficient skyline computation in metric space. In: EDBT, pp. 1042–1051 (2009) Fuhry, D., Jin, R., D.Zhang: Efficient skyline computation in metric space. In: EDBT, pp. 1042–1051 (2009)
24.
Zurück zum Zitat Gutierrez, G., Saez, P.: The \(k\) closest pairs in spatial databases. GeoInformatica 17(4), 543–565 (2013)CrossRef Gutierrez, G., Saez, P.: The \(k\) closest pairs in spatial databases. GeoInformatica 17(4), 543–565 (2013)CrossRef
25.
Zurück zum Zitat Hjaltason, G.R., Samet, H.: Incremental distance join algorithms for spatial databases. In: SIGMOD, pp. 237–248 (1998) Hjaltason, G.R., Samet, H.: Incremental distance join algorithms for spatial databases. In: SIGMOD, pp. 237–248 (1998)
26.
Zurück zum Zitat Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Trans. Database Syst. 28(4), 517–580 (2003)CrossRef Hjaltason, G.R., Samet, H.: Index-driven similarity search in metric spaces. ACM Trans. Database Syst. 28(4), 517–580 (2003)CrossRef
27.
Zurück zum Zitat Jacox, E.H., Samet, H.: Metric space similarity joins. ACM Trans. Database Syst. 33(2), article 7 (2008) Jacox, E.H., Samet, H.: Metric space similarity joins. ACM Trans. Database Syst. 33(2), article 7 (2008)
28.
Zurück zum Zitat Kim, Y.J., Patel, J.M.: Performance comparison of the R*-tree and the quadtree for \(k\)nn and distance join queries. IEEE Trans. Knowl. Data Eng. 22(7), 1014–1027 (2010)CrossRef Kim, Y.J., Patel, J.M.: Performance comparison of the R*-tree and the quadtree for \(k\)nn and distance join queries. IEEE Trans. Knowl. Data Eng. 22(7), 1014–1027 (2010)CrossRef
29.
Zurück zum Zitat Kurasawa, H., Takasu, A., Adachi, J.: Finding the \(k\)-closest pairs in metric spaces. In: NTSS, pp. 8–13 (2011) Kurasawa, H., Takasu, A., Adachi, J.: Finding the \(k\)-closest pairs in metric spaces. In: NTSS, pp. 8–13 (2011)
30.
Zurück zum Zitat Nanopoulos, A., Theodoridis, Y., Manolopoulos, Y.: C2P: Clustering based on closest pairs. In: VLDB, pp. 331–340 (2001) Nanopoulos, A., Theodoridis, Y., Manolopoulos, Y.: C2P: Clustering based on closest pairs. In: VLDB, pp. 331–340 (2001)
31.
Zurück zum Zitat Papadopoulos, A.N., Nanopoulos, A., Manolopoulos, Y.: Processing distance join queries with constraints. Comput. J. 49(3), 281–296 (2006)CrossRef Papadopoulos, A.N., Nanopoulos, A., Manolopoulos, Y.: Processing distance join queries with constraints. Comput. J. 49(3), 281–296 (2006)CrossRef
32.
Zurück zum Zitat Paredes, R., Reyes, N.: Solving similarity joins and range queries in metric spaces with the list of twin clusters. J. Discrete Algorithms 7(1), 18–35 (2009)CrossRefMATHMathSciNet Paredes, R., Reyes, N.: Solving similarity joins and range queries in metric spaces with the list of twin clusters. J. Discrete Algorithms 7(1), 18–35 (2009)CrossRefMATHMathSciNet
33.
Zurück zum Zitat Pearson, S.S., Silva, Y.N.: Index-based R-S similarity joins. In: SISAP, pp. 106–112 (2014) Pearson, S.S., Silva, Y.N.: Index-based R-S similarity joins. In: SISAP, pp. 106–112 (2014)
34.
Zurück zum Zitat Ristad, E.S., Yianilos, P.N.: Learning string-edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 20(5), 522–532 (1998)CrossRef Ristad, E.S., Yianilos, P.N.: Learning string-edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 20(5), 522–532 (1998)CrossRef
35.
Zurück zum Zitat Roumelis, G., Vassilakopoulos, M., Corral, A., Manolopoulos, Y.: A new plane-sweep algorithm for the \(k\)-closest-pairs query. In: SOFSEM, pp. 478–490 (2014) Roumelis, G., Vassilakopoulos, M., Corral, A., Manolopoulos, Y.: A new plane-sweep algorithm for the \(k\)-closest-pairs query. In: SOFSEM, pp. 478–490 (2014)
36.
Zurück zum Zitat Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2006)MATH Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2006)MATH
37.
Zurück zum Zitat Sarma, A.D., He, Y., Chaudhuri, S.: Clusterjoin: a similarity joins framework using map-reduce. PVLDB 7(12), 1059–1070 (2014) Sarma, A.D., He, Y., Chaudhuri, S.: Clusterjoin: a similarity joins framework using map-reduce. PVLDB 7(12), 1059–1070 (2014)
38.
Zurück zum Zitat Shan, J., Zhang, D., Salzberg, B.: On spatial-range closest-pair query. In: SSTD, pp. 252–269 (2003) Shan, J., Zhang, D., Salzberg, B.: On spatial-range closest-pair query. In: SSTD, pp. 252–269 (2003)
39.
Zurück zum Zitat Shin, H., Moon, B., Lee, S.: Adaptive multi-stage distance join processing. In: SIGMOD, pp. 343–354 (2000) Shin, H., Moon, B., Lee, S.: Adaptive multi-stage distance join processing. In: SIGMOD, pp. 343–354 (2000)
40.
Zurück zum Zitat Shin, H., Moon, B., Lee, S.: Adaptive and incremental processing for distance join queries. IEEE Trans. Knowl. Data Eng. 15(6), 1561–1578 (2003)CrossRef Shin, H., Moon, B., Lee, S.: Adaptive and incremental processing for distance join queries. IEEE Trans. Knowl. Data Eng. 15(6), 1561–1578 (2003)CrossRef
41.
Zurück zum Zitat Silva, Y.N., Pearson, S.: Exploiting database similarity joins for metric spaces. In: VLDB, pp. 1922–1925 (2012) Silva, Y.N., Pearson, S.: Exploiting database similarity joins for metric spaces. In: VLDB, pp. 1922–1925 (2012)
42.
Zurück zum Zitat Silva, Y.N., Pearson, S., Cheney, J.A.: Database similarity join for metric spaces. In: SISAP, pp. 266–279 (2013) Silva, Y.N., Pearson, S., Cheney, J.A.: Database similarity join for metric spaces. In: SISAP, pp. 266–279 (2013)
43.
Zurück zum Zitat Skopal, T., Lokoc, J.: Answering metric skyline queries by PM-tree. In: DATESO, pp. 22–37 (2010) Skopal, T., Lokoc, J.: Answering metric skyline queries by PM-tree. In: DATESO, pp. 22–37 (2010)
44.
Zurück zum Zitat Skopal, T., Pokorny, J., Snasel, V.: PM-tree: pivoting metric tree for similarity search in multimedia databases. In: ADBIS, pp. 803–815 (2004) Skopal, T., Pokorny, J., Snasel, V.: PM-tree: pivoting metric tree for similarity search in multimedia databases. In: ADBIS, pp. 803–815 (2004)
45.
Zurück zum Zitat Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. 35(3), article 20 (2010) Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst. 35(3), article 20 (2010)
46.
Zurück zum Zitat Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252 (2006)CrossRef Tao, Y., Yiu, M.L., Mamoulis, N.: Reverse nearest neighbor search in metric spaces. IEEE Trans. Knowl. Data Eng. 18(9), 1239–1252 (2006)CrossRef
47.
Zurück zum Zitat Tao, Y., Zhang, J., Papadias, D., Mamoulis, N.: An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. TKDE 16(10), 1169–1184 (2004) Tao, Y., Zhang, J., Papadias, D., Mamoulis, N.: An efficient cost model for optimization of nearest neighbor search in low and medium dimensional spaces. TKDE 16(10), 1169–1184 (2004)
48.
Zurück zum Zitat U, L.H., Mamoulis, N., Yiu, M.L.: Computation and monitoring of exclusive closest pairs. IEEE Trans. Knowl. Data Eng. 20(12), 1641–1654 (2008)CrossRef U, L.H., Mamoulis, N., Yiu, M.L.: Computation and monitoring of exclusive closest pairs. IEEE Trans. Knowl. Data Eng. 20(12), 1641–1654 (2008)CrossRef
49.
Zurück zum Zitat Vlachou, A., Doulkeridis, C., Kotidis, Y.: Metric-based similarity search in unstructured peer-to-peer systems. Trans. Large Scale Data Knowl. Cent. Syst. 7100, 28–48 (2012) Vlachou, A., Doulkeridis, C., Kotidis, Y.: Metric-based similarity search in unstructured peer-to-peer systems. Trans. Large Scale Data Knowl. Cent. Syst. 7100, 28–48 (2012)
50.
Zurück zum Zitat Wang, Y., Metwally, A., Parthasarathy, S.: Scalable all-pairs similarity search in metric spaces. In: KDD, pp. 829–837 (2013) Wang, Y., Metwally, A., Parthasarathy, S.: Scalable all-pairs similarity search in metric spaces. In: KDD, pp. 829–837 (2013)
51.
Zurück zum Zitat Xiao, C., Wang, W., Lin, X., Shang, H.: Top-\(k\) set similarity joins. In: ICDE, pp. 916–927 (2009) Xiao, C., Wang, W., Lin, X., Shang, H.: Top-\(k\) set similarity joins. In: ICDE, pp. 916–927 (2009)
52.
Zurück zum Zitat Yang, C., Lin, K.I.: An index structure for improving closest pairs and related join queries in spatial databases. In: IDEAS, pp. 140–149 (2002) Yang, C., Lin, K.I.: An index structure for improving closest pairs and related join queries in spatial databases. In: IDEAS, pp. 140–149 (2002)
53.
Zurück zum Zitat Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with M-trees. VLDB J. 7(4), 275–293 (1998)CrossRef Zezula, P., Savino, P., Amato, G., Rabitti, F.: Approximate similarity retrieval with M-trees. VLDB J. 7(4), 275–293 (1998)CrossRef
54.
Zurück zum Zitat Zhou, P., Zhang, D., Salzberg, B., Cooperman, G., Kollios, G.: Close pair queries in moving object databases. In: GIS, pp. 2–11 (2005) Zhou, P., Zhang, D., Salzberg, B., Cooperman, G., Kollios, G.: Close pair queries in moving object databases. In: GIS, pp. 2–11 (2005)
Metadaten
Titel
Efficient -closest pair queries in general metric spaces
verfasst von
Yunjun Gao
Lu Chen
Xinhan Li
Bin Yao
Gang Chen
Publikationsdatum
01.06.2015
Verlag
Springer Berlin Heidelberg
Erschienen in
The VLDB Journal / Ausgabe 3/2015
Print ISSN: 1066-8888
Elektronische ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-015-0383-4

Weitere Artikel der Ausgabe 3/2015

The VLDB Journal 3/2015 Zur Ausgabe