Skip to main content
Top
Published in: International Journal of Multimedia Information Retrieval 4/2013

01-11-2013 | Trends and Surveys

Very large scale nearest neighbor search: ideas, strategies and challenges

Authors: Erik Gast, Ard Oerlemans, Michael S. Lew

Published in: International Journal of Multimedia Information Retrieval | Issue 4/2013

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Web-scale databases and big data collections are computationally challenging to analyze and search. Similarity or more precisely nearest neighbor searches are thus crucial in the analysis, indexing and utilization of these massive multimedia databases. In this work, we begin by reviewing the top approaches from the research literature in the past decade. Furthermore, we evaluate the scalability and computational complexity as the feature complexity and database size vary. For the experiments, we used two different data sets with different dimensionalities. The results reveal interesting insights regarding the index structures and their behavior when the data set size is increased. We also summarized the ideas, strategies and challenges for the future.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Huiskes MJ, Thomee B, Lew MS (2010) New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative. In: MIR ’10: Proceedings of the 2010 ACM international conference on multimedia information retrieval. ACM Press, New York, pp 527–536 Huiskes MJ, Thomee B, Lew MS (2010) New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative. In: MIR ’10: Proceedings of the 2010 ACM international conference on multimedia information retrieval. ACM Press, New York, pp 527–536
2.
go back to reference Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: Proceedings of the 1984 ACM SIGMOD international conference on management of data, Boston, pp 47–57 Guttman A (1984) R-trees: a dynamic index structure for spatial searching. In: SIGMOD ’84: Proceedings of the 1984 ACM SIGMOD international conference on management of data, Boston, pp 47–57
3.
go back to reference Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Rec 19(2):322–331CrossRef Beckmann N, Kriegel H-P, Schneider R, Seeger B (1990) The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Rec 19(2):322–331CrossRef
4.
go back to reference Berchtold S, Keim DA, Kriegel H.-P. (1996) The X-tree: an index structure for high-dimensional data. In: VLDB ’96: Proceedings of the 22th international conference on very large data bases, San Francisco, pp 28–39 Berchtold S, Keim DA, Kriegel H.-P. (1996) The X-tree: an index structure for high-dimensional data. In: VLDB ’96: Proceedings of the 22th international conference on very large data bases, San Francisco, pp 28–39
5.
go back to reference White DA, Jain R (1996) Similarity Indexing with the SS-tree. In: ICDE ’96: Proceedings of the twelfth international conference on data engineering, Washington, pp 516–523 White DA, Jain R (1996) Similarity Indexing with the SS-tree. In: ICDE ’96: Proceedings of the twelfth international conference on data engineering, Washington, pp 516–523
6.
go back to reference Katayama N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, Tucson, pp 369–380 Katayama N, Satoh S (1997) The SR-tree: an index structure for high-dimensional nearest neighbor queries. In: SIGMOD ’97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, Tucson, pp 369–380
7.
go back to reference Ramaswamy S, Rose K (2011) Adaptive cluster distance bounding for high-dimensional indexing. IEEE Trans Knowl Data Eng 23(6):815–830 Ramaswamy S, Rose K (2011) Adaptive cluster distance bounding for high-dimensional indexing. IEEE Trans Knowl Data Eng 23(6):815–830
8.
go back to reference Henrich A, Six H-W, Widmayer P (1986) The LSD tree: spatial access to multidimensional and non-point objects. In: VLDB ’89: Proceedings of the 15th international conference on very large data bases, Amsterdam, pp 45–53 Henrich A, Six H-W, Widmayer P (1986) The LSD tree: spatial access to multidimensional and non-point objects. In: VLDB ’89: Proceedings of the 15th international conference on very large data bases, Amsterdam, pp 45–53
9.
go back to reference Henrich A (1998) The LSDh-Tree: an access structure for feature vectors. In: ICDE ’98: Proceedings of the fourteenth international conference on data engineering, Washington, pp 362–369 Henrich A (1998) The LSDh-Tree: an access structure for feature vectors. In: ICDE ’98: Proceedings of the fourteenth international conference on data engineering, Washington, pp 362–369
10.
go back to reference Chakrabarti K, Mehrotra S (1999) The hybrid tree: an index structure for high dimensional feature spaces. In: ICDE ’99: Proceedings of the 15th international conference on data engineering, Washington, pp 440–447 Chakrabarti K, Mehrotra S (1999) The hybrid tree: an index structure for high dimensional feature spaces. In: ICDE ’99: Proceedings of the 15th international conference on data engineering, Washington, pp 440–447
11.
go back to reference Dang TK, Küng J, Wagner R (2001) The SH-tree: a super hybrid index structure for multidimensional data. In: DEXA ’01: Proceedings of the 12th international conference on database and expert systems applications, London, pp 340–349 Dang TK, Küng J, Wagner R (2001) The SH-tree: a super hybrid index structure for multidimensional data. In: DEXA ’01: Proceedings of the 12th international conference on database and expert systems applications, London, pp 340–349
12.
go back to reference Berchtold S, Böhm C, Kriegal H (1998) The pyramid-technique: towards breaking the curse of dimensionality. In: SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data, Seattle, pp 142–153 Berchtold S, Böhm C, Kriegal H (1998) The pyramid-technique: towards breaking the curse of dimensionality. In: SIGMOD ’98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data, Seattle, pp 142–153
13.
go back to reference Kamel I, Faloutsos C (1994) Hilbert R-tree: an improved R-tree using fractals. In: VLDB ’94: Proceedings of the 20th international conference on very large data bases, San Francisco, pp 500–509 Kamel I, Faloutsos C (1994) Hilbert R-tree: an improved R-tree using fractals. In: VLDB ’94: Proceedings of the 20th international conference on very large data bases, San Francisco, pp 500–509
14.
go back to reference Fonseca MJ, Jorge JA (2003) Indexing high-dimensional data for content-based retrieval in large databases. In: DASFAA ’03: Proceedings of the eighth international conference on database systems for advanced applications, Washington Fonseca MJ, Jorge JA (2003) Indexing high-dimensional data for content-based retrieval in large databases. In: DASFAA ’03: Proceedings of the eighth international conference on database systems for advanced applications, Washington
15.
go back to reference Cu J, An Z, Guo Y, Zhou S (2010) Efficient nearest neighbor query based on extended B+-tree in high-dimensional space. Pattern Recogn Lett Cu J, An Z, Guo Y, Zhou S (2010) Efficient nearest neighbor query based on extended B+-tree in high-dimensional space. Pattern Recogn Lett
16.
go back to reference Jolliffe IT (1986) Principal component analysis. Springer, New York Jolliffe IT (1986) Principal component analysis. Springer, New York
17.
go back to reference Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to KNN processing. In: VLDB ’01: Proceedings of the 27th international conference on very large data bases, San Francisco, pp 421–430 Yu C, Ooi BC, Tan K-L, Jagadish HV (2001) Indexing the distance: an efficient method to KNN processing. In: VLDB ’01: Proceedings of the 27th international conference on very large data bases, San Francisco, pp 421–430
18.
go back to reference Weber R, Schek H-J, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: Proceedings of the 24rd international conference on very large data bases, San Francisco, pp 194–205 Weber R, Schek H-J, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98: Proceedings of the 24rd international conference on very large data bases, San Francisco, pp 194–205
19.
go back to reference Cui J, Huang Z, Wang B, Liu Y (2013) Near-optimal partial linear scan for nearest neighbor search in high-dimensional space. Lect Notes Comput Sci 7825:101–115CrossRef Cui J, Huang Z, Wang B, Liu Y (2013) Near-optimal partial linear scan for nearest neighbor search in high-dimensional space. Lect Notes Comput Sci 7825:101–115CrossRef
20.
go back to reference Ferhatosmanoglu H, Tuncel E, Agrawal D, El Abbadi A (2006) High dimensional nearest neighbor searching. Inf Syst J 31(6):512–540CrossRef Ferhatosmanoglu H, Tuncel E, Agrawal D, El Abbadi A (2006) High dimensional nearest neighbor searching. Inf Syst J 31(6):512–540CrossRef
21.
go back to reference Bellman R (1961) Adaptive control processes—a guided tour. Princeton University Press, PrincetonMATH Bellman R (1961) Adaptive control processes—a guided tour. Princeton University Press, PrincetonMATH
22.
go back to reference Muja M, Lowe D (2012) Fast matching of binary features. In: Conference on computer and robot vision (CRV) Muja M, Lowe D (2012) Fast matching of binary features. In: Conference on computer and robot vision (CRV)
23.
go back to reference Macqueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297 Macqueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297
Metadata
Title
Very large scale nearest neighbor search: ideas, strategies and challenges
Authors
Erik Gast
Ard Oerlemans
Michael S. Lew
Publication date
01-11-2013
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 4/2013
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-013-0046-4

Other articles of this Issue 4/2013

International Journal of Multimedia Information Retrieval 4/2013 Go to the issue

Regular Paper

Bundle min-Hashing

Premium Partner