Skip to main content
Erschienen in: International Journal of Multimedia Information Retrieval 4/2013

01.11.2013 | Trends and Surveys

Very large scale nearest neighbor search: ideas, strategies and challenges

verfasst von: Erik Gast, Ard Oerlemans, Michael S. Lew

Erschienen in: International Journal of Multimedia Information Retrieval | Ausgabe 4/2013

Einloggen

Aktivieren Sie unsere intelligente Suche um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Web-scale databases and big data collections are computationally challenging to analyze and search. Similarity or more precisely nearest neighbor searches are thus crucial in the analysis, indexing and utilization of these massive multimedia databases. In this work, we begin by reviewing the top approaches from the research literature in the past decade. Furthermore, we evaluate the scalability and computational complexity as the feature complexity and database size vary. For the experiments, we used two different data sets with different dimensionalities. The results reveal interesting insights regarding the index structures and their behavior when the data set size is increased. We also summarized the ideas, strategies and challenges for the future.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Huiskes MJ, Thomee B, Lew MS (2010) New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative. In: MIR ’10: Proceedings of the 2010 ACM international conference on multimedia information retrieval. ACM Press, New York, pp 527–536 Huiskes MJ, Thomee B, Lew MS (2010) New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative. In: MIR ’10: Proceedings of the 2010 ACM international conference on multimedia information retrieval. ACM Press, New York, pp 527–536
2.
Zurück zum Zitat Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: Proceedings of the 1984 ACM SIGMOD international conference on management of data, Boston, pp 47–57 Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: Proceedings of the 1984 ACM SIGMOD international conference on management of data, Boston, pp 47–57
3.
Zurück zum Zitat Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Rec 19(2):322–331CrossRef Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Rec 19(2):322–331CrossRef
4.
Zurück zum Zitat Berchtold S, Keim DA, Kriegel H.-P. (1996) The X-tree: an index structure for high-dimensional data. In: VLDB ’96: Proceedings of the 22th international conference on very large data bases, San Francisco, pp 28–39 Berchtold S, Keim DA, Kriegel H.-P. (1996) The X-tree: an index structure for high-dimensional data. In: VLDB ’96: Proceedings of the 22th international conference on very large data bases, San Francisco, pp 28–39
5.
Zurück zum Zitat White DA, Jain R (1996) Similarity Indexing with the SS-tree. In: ICDE ’96: Proceedings of the twelfth international conference on data engineering, Washington, pp 516–523 White DA, Jain R (1996) Similarity Indexing with the SS-tree. In: ICDE ’96: Proceedings of the twelfth international conference on data engineering, Washington, pp 516–523
6.
Zurück zum Zitat Katayama N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, Tucson, pp 369–380 Katayama N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, Tucson, pp 369–380
7.
Zurück zum Zitat Ramaswamy S, Rose K (2011) Adaptive cluster distance bounding for high-dimensional indexing. IEEE Trans Knowl Data Eng 23(6):815–830 Ramaswamy S, Rose K (2011) Adaptive cluster distance bounding for high-dimensional indexing. IEEE Trans Knowl Data Eng 23(6):815–830
8.
Zurück zum Zitat Henrich A, Six H-W, Widmayer P (1986) The LSD tree: spatial access to multidimensional and non-point objects. In: VLDB ’89: Proceedings of the 15th international conference on very large data bases, Amsterdam, pp 45–53 Henrich A, Six H-W, Widmayer P (1986) The LSD tree: spatial access to multidimensional and non-point objects. In: VLDB ’89: Proceedings of the 15th international conference on very large data bases, Amsterdam, pp 45–53
9.
Zurück zum Zitat Henrich A (1998) The LSDh-Tree: an access structure for feature vectors. In: ICDE ’98: Proceedings of the fourteenth international conference on data engineering, Washington, pp 362–369 Henrich A (1998) The LSDh-Tree: an access structure for feature vectors. In: ICDE ’98: Proceedings of the fourteenth international conference on data engineering, Washington, pp 362–369
10.
Zurück zum Zitat Chakrabarti K, Mehrotra S (1999) The hybrid tree: an index structure for high dimensional feature spaces. In: ICDE ’99: Proceedings of the 15th international conference on data engineering, Washington, pp 440–447 Chakrabarti K, Mehrotra S (1999) The hybrid tree: an index structure for high dimensional feature spaces. In: ICDE ’99: Proceedings of the 15th international conference on data engineering, Washington, pp 440–447
11.
Zurück zum Zitat Dang TK, Küng J, Wagner R (2001) The SH-tree: a super hybrid index structure for multidimensional data. In: DEXA ’01: Proceedings of the 12th international conference on database and expert systems applications, London, pp 340–349 Dang TK, Küng J, Wagner R (2001) The SH-tree: a super hybrid index structure for multidimensional data. In: DEXA ’01: Proceedings of the 12th international conference on database and expert systems applications, London, pp 340–349
12.
Zurück zum Zitat Berchtold S, Böhm C, Kriegal H (1998) The pyramid-technique: towards breaking the curse of dimensionality. In: SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data, Seattle, pp 142–153 Berchtold S, Böhm C, Kriegal H (1998) The pyramid-technique: towards breaking the curse of dimensionality. In: SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data, Seattle, pp 142–153
13.
Zurück zum Zitat Kamel I, Faloutsos C (1994) Hilbert R-tree: an improved R-tree using fractals. In: VLDB ’94: Proceedings of the 20th international conference on very large data bases, San Francisco, pp 500–509 Kamel I, Faloutsos C (1994) Hilbert R-tree: an improved R-tree using fractals. In: VLDB ’94: Proceedings of the 20th international conference on very large data bases, San Francisco, pp 500–509
14.
Zurück zum Zitat Fonseca MJ, Jorge JA (2003) Indexing high-dimensional data for content-based retrieval in large databases. In: DASFAA ’03: Proceedings of the eighth international conference on database systems for advanced applications, Washington Fonseca MJ, Jorge JA (2003) Indexing high-dimensional data for content-based retrieval in large databases. In: DASFAA ’03: Proceedings of the eighth international conference on database systems for advanced applications, Washington
15.
Zurück zum Zitat Cu J, An Z, Guo Y, Zhou S (2010) Efficient nearest neighbor query based on extended B+-tree in high-dimensional space. Pattern Recogn Lett Cu J, An Z, Guo Y, Zhou S (2010) Efficient nearest neighbor query based on extended B+-tree in high-dimensional space. Pattern Recogn Lett
16.
Zurück zum Zitat Jolliffe IT (1986) Principal component analysis. Springer, New York Jolliffe IT (1986) Principal component analysis. Springer, New York
17.
Zurück zum Zitat Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to KNN processing. In: VLDB ’01: Proceedings of the 27th international conference on very large data bases, San Francisco, pp 421–430 Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to KNN processing. In: VLDB ’01: Proceedings of the 27th international conference on very large data bases, San Francisco, pp 421–430
18.
Zurück zum Zitat Weber R, Schek H-J, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: Proceedings of the 24rd international conference on very large data bases, San Francisco, pp 194–205 Weber R, Schek H-J, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: Proceedings of the 24rd international conference on very large data bases, San Francisco, pp 194–205
19.
Zurück zum Zitat Cui J, Huang Z, Wang B, Liu Y (2013) Near-optimal partial linear scan for nearest neighbor search in high-dimensional space. Lect Notes Comput Sci 7825:101–115CrossRef Cui J, Huang Z, Wang B, Liu Y (2013) Near-optimal partial linear scan for nearest neighbor search in high-dimensional space. Lect Notes Comput Sci 7825:101–115CrossRef
20.
Zurück zum Zitat Ferhatosmanoglu H, Tuncel E, Agrawal D, El Abbadi A (2006) High dimensional nearest neighbor searching. Inf Syst J 31(6):512–540CrossRef Ferhatosmanoglu H, Tuncel E, Agrawal D, El Abbadi A (2006) High dimensional nearest neighbor searching. Inf Syst J 31(6):512–540CrossRef
21.
Zurück zum Zitat Bellman R (1961) Adaptive control processes—a guided tour. Princeton University Press, PrincetonMATH Bellman R (1961) Adaptive control processes—a guided tour. Princeton University Press, PrincetonMATH
22.
Zurück zum Zitat Muja M, Lowe D (2012) Fast matching of binary features. In: Conference on computer and robot vision (CRV) Muja M, Lowe D (2012) Fast matching of binary features. In: Conference on computer and robot vision (CRV)
23.
Zurück zum Zitat Macqueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297 Macqueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297
Metadaten
Titel
Very large scale nearest neighbor search: ideas, strategies and challenges
verfasst von
Erik Gast
Ard Oerlemans
Michael S. Lew
Publikationsdatum
01.11.2013
Verlag
Springer London
Erschienen in
International Journal of Multimedia Information Retrieval / Ausgabe 4/2013
Print ISSN: 2192-6611
Elektronische ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-013-0046-4

Weitere Artikel der Ausgabe 4/2013

International Journal of Multimedia Information Retrieval 4/2013 Zur Ausgabe

Regular Paper

Bundle min-Hashing