Skip to main content
Erschienen in: GeoInformatica 1/2010

01.01.2010

High-dimensional kNN joins with incremental updates

verfasst von: Cui Yu, Rui Zhang, Yaochun Huang, Hui Xiong

Erschienen in: GeoInformatica | Ausgabe 1/2010

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The k Nearest Neighbor (kNN) join operation associates each data object in one data set with its k nearest neighbors from the same or a different data set. The kNN join on high-dimensional data (high-dimensional kNN join) is a very expensive operation. Existing high-dimensional kNN join algorithms were designed for static data sets and therefore cannot handle updates efficiently. In this article, we propose a novel kNN join method, named kNNJoin +, which supports efficient incremental computation of kNN join results with updates on high-dimensional data. As a by-product, our method also provides answers for the reverse kNN queries with very little overhead. We have performed an extensive experimental study. The results show the effectiveness of kNNJoin+ for processing high-dimensional kNN joins in dynamic workloads.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
In this paper, an update means an insertion, a deletion or a change to an existing data object.
 
2
Here kNN query only needs to find a new k-th NN.
 
Literatur
2.
Zurück zum Zitat Achtert E, Böhm C, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In: SIGMOD’06: proceedings of the 2006 ACM SIGMOD international conference on management of data, pp 515–526 Achtert E, Böhm C, Kröger P, Kunath P, Pryakhin A, Renz M (2006) Efficient reverse k-nearest neighbor search in arbitrary metric spaces. In: SIGMOD’06: proceedings of the 2006 ACM SIGMOD international conference on management of data, pp 515–526
3.
Zurück zum Zitat Berchtold S, Keim DA (1998) High-dimensional index structures database support for next decade’s applications (tutorial). In: SIGMOD ’98: proceedings of the 1998 ACM SIGMOD international conference on management of data, p 501 Berchtold S, Keim DA (1998) High-dimensional index structures database support for next decade’s applications (tutorial). In: SIGMOD ’98: proceedings of the 1998 ACM SIGMOD international conference on management of data, p 501
4.
Zurück zum Zitat Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful? In: Proceeding of the 7th international conference on database theory (ICDT), pp 217–235 Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is nearest neighbor meaningful? In: Proceeding of the 7th international conference on database theory (ICDT), pp 217–235
5.
Zurück zum Zitat Böhm C, Berchtold S, Keim DA (2001) Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv 33(3):322–373CrossRef Böhm C, Berchtold S, Keim DA (2001) Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput Surv 33(3):322–373CrossRef
6.
Zurück zum Zitat Böhm C, Krebs F (2004) The k-nearest neighbor join: turbo charging the kdd process. Knowl Inf Syst (KAIS) 6(6):728–749CrossRef Böhm C, Krebs F (2004) The k-nearest neighbor join: turbo charging the kdd process. Knowl Inf Syst (KAIS) 6(6):728–749CrossRef
7.
Zurück zum Zitat Böhm C, Kriegel H-P (2000) Dynamically optimizing high-dimensional index structures. In: Proceedings of the 7th international conference on extending database technology (EDBT), pp 36–50 Böhm C, Kriegel H-P (2000) Dynamically optimizing high-dimensional index structures. In: Proceedings of the 7th international conference on extending database technology (EDBT), pp 36–50
8.
Zurück zum Zitat Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: VLDB ’97: proceedings of the 23rd international conference on very large data bases, pp 426–435 Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: VLDB ’97: proceedings of the 23rd international conference on very large data bases, pp 426–435
9.
Zurück zum Zitat Dasarathy BV (1991) Nearest neighbor (nn) norms - nn pattern classification techniques. IEEE Computer Society, Silver Spring Dasarathy BV (1991) Nearest neighbor (nn) norms - nn pattern classification techniques. IEEE Computer Society, Silver Spring
10.
Zurück zum Zitat Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: proceedings of the 1984 ACM SIGMOD international conference on management of data, pp 47–57 Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: proceedings of the 1984 ACM SIGMOD international conference on management of data, pp 47–57
11.
Zurück zum Zitat Hartigan J, Wong M (1979) A K-means clustering algorithm. Appl Stat 28:100–108CrossRef Hartigan J, Wong M (1979) A K-means clustering algorithm. Appl Stat 28:100–108CrossRef
12.
Zurück zum Zitat Huang X, Jensen CS, Saltenis S (2006) Multiple k nearest neighbor query processing in spatial network databases. In: ADBIS ’06: proceedings of 10th East European conference of advances in databases and information systems, pp 266–281 Huang X, Jensen CS, Saltenis S (2006) Multiple k nearest neighbor query processing in spatial network databases. In: ADBIS ’06: proceedings of 10th East European conference of advances in databases and information systems, pp 266–281
13.
Zurück zum Zitat Jagadish HV, Ooi BC, Tan K-L, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst (TODS) 30(2):364–397CrossRef Jagadish HV, Ooi BC, Tan K-L, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans Database Syst (TODS) 30(2):364–397CrossRef
14.
Zurück zum Zitat Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 201–212 Korn F, Muthukrishnan S (2000) Influence sets based on reverse nearest neighbor queries. In: SIGMOD ’00: proceedings of the 2000 ACM SIGMOD international conference on management of data, pp 201–212
15.
Zurück zum Zitat Lin K-I, Jagadish HV, Faloutsos C (1994) The TV-tree: an index structure for high-dimensional data. VLDB J 3:517–542CrossRef Lin K-I, Jagadish HV, Faloutsos C (1994) The TV-tree: an index structure for high-dimensional data. VLDB J 3:517–542CrossRef
16.
Zurück zum Zitat Rafiei D, Mendelzon A (2000) Querying time series data based on similarity. IEEE Trans Knowl Data Eng 12(5):675–693CrossRef Rafiei D, Mendelzon A (2000) Querying time series data based on similarity. IEEE Trans Knowl Data Eng 12(5):675–693CrossRef
17.
Zurück zum Zitat Tao Y, Papadias D, Lian X (2004) Reverse knn search in arbitrary dimensionality. In: VLDB ’04: Proceedings of the 30th international conference on very large data bases, pp 744–755 Tao Y, Papadias D, Lian X (2004) Reverse knn search in arbitrary dimensionality. In: VLDB ’04: Proceedings of the 30th international conference on very large data bases, pp 744–755
18.
Zurück zum Zitat Tao Y, Yiu, ML, Mamoulis N (2006) Reverse nearest neighbor search in metric spaces. IEEE Trans Knowl Data Eng 18(9):1239–1252CrossRef Tao Y, Yiu, ML, Mamoulis N (2006) Reverse nearest neighbor search in metric spaces. IEEE Trans Knowl Data Eng 18(9):1239–1252CrossRef
19.
Zurück zum Zitat Weber R, Schek H, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: proceedings of the 24rd international conference on very large data bases, pp 194–205 Weber R, Schek H, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: proceedings of the 24rd international conference on very large data bases, pp 194–205
20.
Zurück zum Zitat Wong R, Tao Y, Fu A, Xiao X, Pryakhin A, Renz M (2007) On efficient spatial matching. In: VLDB ’07: proceedings of the 33rd international conference on very large data bases, pp 579–590 Wong R, Tao Y, Fu A, Xiao X, Pryakhin A, Renz M (2007) On efficient spatial matching. In: VLDB ’07: proceedings of the 33rd international conference on very large data bases, pp 579–590
21.
Zurück zum Zitat Xia C, Lu H, Ooi BC, Hu J (2004) Gorder: an efficient method for knn join processing. In: VLDB ’04: proceedings of the 30th international conference on very large data bases, pp 756–767 Xia C, Lu H, Ooi BC, Hu J (2004) Gorder: an efficient method for knn join processing. In: VLDB ’04: proceedings of the 30th international conference on very large data bases, pp 756–767
22.
Zurück zum Zitat Yang C, Lin K (2001) An index structure for efficient reverse nearest neighbor queries In: Proceedings of the 17th international conference on data engineering (ICDE), pp 485–492 Yang C, Lin K (2001) An index structure for efficient reverse nearest neighbor queries In: Proceedings of the 17th international conference on data engineering (ICDE), pp 485–492
23.
Zurück zum Zitat Yu C, Cui B, Wang S, Su J (2007) Efficient index-based knn join processing for high-dimensional data. Inf Softw Technol 49(4):332–344CrossRef Yu C, Cui B, Wang S, Su J (2007) Efficient index-based knn join processing for high-dimensional data. Inf Softw Technol 49(4):332–344CrossRef
24.
Zurück zum Zitat Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to knn processing. In: VLDB ’01: proceedings of the 27th international conference on very large data bases, pp 166–174 Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to knn processing. In: VLDB ’01: proceedings of the 27th international conference on very large data bases, pp 166–174
25.
Zurück zum Zitat Zhang R, Koudas N, Ooi BC, Srivastava D (2005) Multiple aggregations over data streams. In: SIGMOD ’05: proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 299–310 Zhang R, Koudas N, Ooi BC, Srivastava D (2005) Multiple aggregations over data streams. In: SIGMOD ’05: proceedings of the 2005 ACM SIGMOD international conference on management of data, pp 299–310
Metadaten
Titel
High-dimensional kNN joins with incremental updates
verfasst von
Cui Yu
Rui Zhang
Yaochun Huang
Hui Xiong
Publikationsdatum
01.01.2010
Verlag
Springer US
Erschienen in
GeoInformatica / Ausgabe 1/2010
Print ISSN: 1384-6175
Elektronische ISSN: 1573-7624
DOI
https://doi.org/10.1007/s10707-009-0076-5

Weitere Artikel der Ausgabe 1/2010

GeoInformatica 1/2010 Zur Ausgabe