Skip to main content
Erschienen in: Multimedia Systems 1/2017

24.12.2014 | Special Issue Paper

Effective optimizations of cluster-based nearest neighbor search in high-dimensional space

verfasst von: Xiaokang Feng, Jiangtao Cui, Yingfan Liu, Hui Li

Erschienen in: Multimedia Systems | Ausgabe 1/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Nearest neighbor (NN) search in high-dimensional space plays a fundamental role in large-scale image retrieval. It seeks efficient indexing and search techniques, both of which are simultaneously essential for similarity search and semantic analysis. However, in recent years, there has been a rare breakthrough. Achievement of current techniques for NN search is far from satisfactory, especially for exact NN search. A recently proposed method, HB, addresses the exact NN search efficiently in high-dimensional space. It benefits from cluster-based techniques which can generate more compact representation of the data set than other techniques by exploiting interdimensional correlations. However, HB suffers from huge cost for lower bound computations and provides no further pruning scheme for points in candidate clusters. In this paper, we extend the HB method to address exact NN search in correlated, high-dimensional vector data sets extracted from large-scale image database by introducing two new pruning/selection techniques and we call it HB+. The first approach aims at selecting more quickly the subset of hyperplanes/clusters that must be considered. The second technique prunes irrelevant points in the selected subset of clusters. Performed experiments show the improvement of HB+ with respect to HB in terms of efficiency (I/O cost and CPU response time) and also demonstrate the superiority over other exact NN indexes.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
In the sequel, we shall use “\(k\)-NN search” to refer to “exact \(k\)-NN search” for simplicity.
 
2
In the sequel, we shall use the term “lower bound” to refer to the lower bound distance between a query and a cluster for simplicity.
 
3
This concept refers to our suggestion of partially selecting separating hyperplanes. In fact, in experimentation we have tested the empirical results by varying the value of \(\alpha\), the proportion of selected separating hyperplanes, during the experiments.
 
Literatur
1.
Zurück zum Zitat Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. Patt. Anal. Mach. Intell. IEEE. Trans. 29(3), 394–410 (2007)CrossRef Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. Patt. Anal. Mach. Intell. IEEE. Trans. 29(3), 394–410 (2007)CrossRef
2.
Zurück zum Zitat Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of the 25th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco 518–529 1999 Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: Proceedings of the 25th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco 518–529 1999
3.
Zurück zum Zitat Athitsos, V., Potamias, M., Papapetrou, P., Kollios, G.: Nearest neighbor retrieval using distance-based hashing. In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, Washington, IEEE Computer Society 327–336 (2008) Athitsos, V., Potamias, M., Papapetrou, P., Kollios, G.: Nearest neighbor retrieval using distance-based hashing. In: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, Washington, IEEE Computer Society 327–336 (2008)
4.
Zurück zum Zitat Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Quality and efficiency in high dimensional nearest neighbor search. In: Proceedings of the 35th SIGMOD international conference on Management of data, New York, USA SIGMOD ’09, pp. 563–576, ACM (2009) Tao, Y., Yi, K., Sheng, C., Kalnis, P.: Quality and efficiency in high dimensional nearest neighbor search. In: Proceedings of the 35th SIGMOD international conference on Management of data, New York, USA SIGMOD ’09, pp. 563–576, ACM (2009)
5.
Zurück zum Zitat Gan, J., Feng, J., Fang, Q., Ng, W.: Locality sensitive hashing scheme based on dyanmic collision counting. In: Proceedings of the 38th SIGMOD international conference on Management of data. SIGMOD’12, pp. 541–552, ACM (2012) Gan, J., Feng, J., Fang, Q., Ng, W.: Locality sensitive hashing scheme based on dyanmic collision counting. In: Proceedings of the 38th SIGMOD international conference on Management of data. SIGMOD’12, pp. 541–552, ACM (2012)
6.
Zurück zum Zitat Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 117–128 (2011)CrossRef Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33, 117–128 (2011)CrossRef
7.
Zurück zum Zitat Weber, R., Schek, H-Jörg., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98 Proceedings of the 24rd International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc. San Francisco, USA pp. 194–205 (1998) Weber, R., Schek, H-Jörg., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB ’98 Proceedings of the 24rd International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc. San Francisco, USA pp. 194–205 (1998)
8.
Zurück zum Zitat Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD 84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data, New York, USA pp. 47–57, ACM (1984) Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD 84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data, New York, USA pp. 47–57, ACM (1984)
9.
Zurück zum Zitat Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Rec. 19(2), 322–331 (1990)CrossRef Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: an efficient and robust access method for points and rectangles. SIGMOD Rec. 19(2), 322–331 (1990)CrossRef
10.
Zurück zum Zitat David A., Ramesh, W.J.: Similarity indexing with the ss-tree. In: Proceedings of the 12th International Conference on Data Engineering. ICDE’96, pp. 516–523 (1996) David A., Ramesh, W.J.: Similarity indexing with the ss-tree. In: Proceedings of the 12th International Conference on Data Engineering. ICDE’96, pp. 516–523 (1996)
11.
Zurück zum Zitat Katayama, N., Satoh, S.: The sr-tree: an index structure for high-dimensional nearest neighbor queries. In: Proceedings of the 23rd ACM SIGMOD international conference on Management of data, SIGMOD’97, pp. 369–380 (1997) Katayama, N., Satoh, S.: The sr-tree: an index structure for high-dimensional nearest neighbor queries. In: Proceedings of the 23rd ACM SIGMOD international conference on Management of data, SIGMOD’97, pp. 369–380 (1997)
12.
Zurück zum Zitat Ramaswamy S., Rose, K.: Adaptive cluster distance bounding for high-dimensional indexing. IEEE Trans. Knowl. Data. Eng., vol. 23, no. 6, pp. 815–830 June (2011) Ramaswamy S., Rose, K.: Adaptive cluster distance bounding for high-dimensional indexing. IEEE Trans. Knowl. Data. Eng., vol. 23, no. 6, pp. 815–830 June (2011)
13.
Zurück zum Zitat Jagadish, H.V.: Ooi, B.C., Tan, K.-L., Yu, C., Zhang, R.: idistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database. Syst. 30(2), 364–397 (2005)CrossRef Jagadish, H.V.: Ooi, B.C., Tan, K.-L., Yu, C., Zhang, R.: idistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database. Syst. 30(2), 364–397 (2005)CrossRef
14.
Zurück zum Zitat Hwang, Y., Han, B., Ahn, H.-K.: A fast nearest neighbor search algorithm by nonlinear embedding, in Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE pp. 3053–3060 (2012) Hwang, Y., Han, B., Ahn, H.-K.: A fast nearest neighbor search algorithm by nonlinear embedding, in Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE pp. 3053–3060 (2012)
15.
Zurück zum Zitat Achlioptas, D.: Database-friendly random projections:johnson-lindenstrauss with binary coins. J. Comput. Syst. Sci. 66(4), 671–687 (2003)MathSciNetCrossRefMATH Achlioptas, D.: Database-friendly random projections:johnson-lindenstrauss with binary coins. J. Comput. Syst. Sci. 66(4), 671–687 (2003)MathSciNetCrossRefMATH
16.
Zurück zum Zitat Bingham E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data in SIGKDD pp. 245–250 (2001) Bingham E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data in SIGKDD pp. 245–250 (2001)
17.
Zurück zum Zitat Li, P., Hastie, T.J., Church K.W.: Very sparse random projections in SIGKDD, pp. 287–296 (2006) Li, P., Hastie, T.J., Church K.W.: Very sparse random projections in SIGKDD, pp. 287–296 (2006)
18.
Zurück zum Zitat Jeffrey Scott Vitter: Algorithms and data structures for external memory. Found. Trends. Theor. Comput. Sci. 2(4), 305–474 (2008)MathSciNetCrossRefMATH Jeffrey Scott Vitter: Algorithms and data structures for external memory. Found. Trends. Theor. Comput. Sci. 2(4), 305–474 (2008)MathSciNetCrossRefMATH
19.
Zurück zum Zitat Chakrabarti K., Mehrotra, S.: Local dimensionality reduction: A new approach to indexing high dimensional spaces in VLDB ’00 In: Proceedings of the 26th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, USA pp. 89–100 (2000) Chakrabarti K., Mehrotra, S.: Local dimensionality reduction: A new approach to indexing high dimensional spaces in VLDB ’00 In: Proceedings of the 26th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, USA pp. 89–100 (2000)
20.
Zurück zum Zitat Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)CrossRef Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)CrossRef
21.
Zurück zum Zitat Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: an index structure for high-dimensional data. In VLDB ’96: Proceedings of the 22th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, USA pp. 28–39 (1996) Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: an index structure for high-dimensional data. In VLDB ’96: Proceedings of the 22th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, USA pp. 28–39 (1996)
22.
Zurück zum Zitat Robinson, J.T. The k-d-b-tree: a search structure for large multidimensional dynamic indexes. In SIGMOD ’81: Proceedings of the 1981 ACM SIGMOD international conference on Management of data, ACM, New York, USA pp. 10–18 (1981) Robinson, J.T. The k-d-b-tree: a search structure for large multidimensional dynamic indexes. In SIGMOD ’81: Proceedings of the 1981 ACM SIGMOD international conference on Management of data, ACM, New York, USA pp. 10–18 (1981)
23.
Zurück zum Zitat Jon Louis Bentley: Multidimensional binary search trees used for associative searching. Commun. ACM. 18(9), 509–517 (1975)CrossRefMATH Jon Louis Bentley: Multidimensional binary search trees used for associative searching. Commun. ACM. 18(9), 509–517 (1975)CrossRefMATH
24.
Zurück zum Zitat Berchtold, S., Bohm, C., Jagadish, H.V., Kriegel, H.-P., Sander, J.: Independent quantization: an index compression technique for high-dimensional data spaces In ICDE ’00: Proceedings of the 16th International Conference on Data Engineering, IEEE Comput. Soc. Washington, USA 2000, p. 577 (2000) Berchtold, S., Bohm, C., Jagadish, H.V., Kriegel, H.-P., Sander, J.: Independent quantization: an index compression technique for high-dimensional data spaces In ICDE ’00: Proceedings of the 16th International Conference on Data Engineering, IEEE Comput. Soc. Washington, USA 2000, p. 577 (2000)
25.
Zurück zum Zitat Ferhatosmanoglu, H., Tuncel, E., Agrawal, D.: Vector approximation based indexing for non-uniform high dimensional data sets. In CIKM ’00: Proceedings of the ninth international conference on Information and knowledge management, ACM. New York, USA pp. 202–209 (2000) Ferhatosmanoglu, H., Tuncel, E., Agrawal, D.: Vector approximation based indexing for non-uniform high dimensional data sets. In CIKM ’00: Proceedings of the ninth international conference on Information and knowledge management, ACM. New York, USA pp. 202–209 (2000)
26.
Zurück zum Zitat Cui, J., Zhou, S., Sun, J.: Efficient high-dimensional indexing by sorting principal component. Pattern Recogn. Lett. 28(16), 2412–2418 (2007)CrossRef Cui, J., Zhou, S., Sun, J.: Efficient high-dimensional indexing by sorting principal component. Pattern Recogn. Lett. 28(16), 2412–2418 (2007)CrossRef
27.
Zurück zum Zitat Ravi, K.V., Divyakant Agrawal, K., Singh, A.: Dimensionality reduction for similarity searching in dynamic databases. SIGMOD Rec. 27(2), 166–176 (1998)CrossRef Ravi, K.V., Divyakant Agrawal, K., Singh, A.: Dimensionality reduction for similarity searching in dynamic databases. SIGMOD Rec. 27(2), 166–176 (1998)CrossRef
28.
Zurück zum Zitat Van der Maaten, L.J.P., Postma, E.O., Van Den Herik, H.J., Dimensionality reduction : a comparative review. October, vol. 10, no. February, pp. 35, (2009) Van der Maaten, L.J.P., Postma, E.O., Van Den Herik, H.J., Dimensionality reduction : a comparative review. October, vol. 10, no. February, pp. 35, (2009)
29.
Zurück zum Zitat Lian X., Chen, L.: A general cost model for dimensionality reduction in high dimensional spaces. In ICDE ’07: Proceedings of the 23th International Conference on Data Engineering, pp. 66–75 (2007) Lian X., Chen, L.: A general cost model for dimensionality reduction in high dimensional spaces. In ICDE ’07: Proceedings of the 23th International Conference on Data Engineering, pp. 66–75 (2007)
30.
Zurück zum Zitat Berchtold, S., Böhm, C., Kriegal, H.-P.: The pyramid-technique: towards breaking the curse of dimensionality. SIGMOD Rec. 27(2), 142–153 (1998)CrossRef Berchtold, S., Böhm, C., Kriegal, H.-P.: The pyramid-technique: towards breaking the curse of dimensionality. SIGMOD Rec. 27(2), 142–153 (1998)CrossRef
31.
Zurück zum Zitat Christos F., Searching multimedia databases by content, vol. 3, Springer, New York (1996) Christos F., Searching multimedia databases by content, vol. 3, Springer, New York (1996)
32.
Zurück zum Zitat Nick K., Beng C.O., Heng T.S., Tung A.K.H.: Ldc: Enabling search by partial distance in a hyper-dimensional space. In ICDE ’04: Proceedings of the 20th International Conference on Data Engineering, IEEE Computer Society, Washington, USA p. 6 (2004) Nick K., Beng C.O., Heng T.S., Tung A.K.H.: Ldc: Enabling search by partial distance in a hyper-dimensional space. In ICDE ’04: Proceedings of the 20th International Conference on Data Engineering, IEEE Computer Society, Washington, USA p. 6 (2004)
33.
Zurück zum Zitat Gray, R.M.: Vector quantization. ASSP Magazine. IEEE. vol. 1, no. 2, pp. 4–29 (1984) Gray, R.M.: Vector quantization. ASSP Magazine. IEEE. vol. 1, no. 2, pp. 4–29 (1984)
34.
Zurück zum Zitat Zhang, L., Han, Y., Yang, Y., Song, M., Yan, S., Tian Q.: Discovering discriminative graphlets for aerial image categories recognition. IEEE. transac. Image Processing: a publication of the IEEE Signal Processing Society, vol. 22, no. 12, pp. 5071–5084 (2013) Zhang, L., Han, Y., Yang, Y., Song, M., Yan, S., Tian Q.: Discovering discriminative graphlets for aerial image categories recognition. IEEE. transac. Image Processing: a publication of the IEEE Signal Processing Society, vol. 22, no. 12, pp. 5071–5084 (2013)
35.
Zurück zum Zitat Zhang, L., Gao, Y., Hong, C., Feng, Y., Zhu, J., Cai, D.: Feature correlation hypergraph: exploiting high-order potentials for multimodal recognition. Cybernetics. IEEE. Trans. 44(8), 1408–1419 (2013)CrossRef Zhang, L., Gao, Y., Hong, C., Feng, Y., Zhu, J., Cai, D.: Feature correlation hypergraph: exploiting high-order potentials for multimodal recognition. Cybernetics. IEEE. Trans. 44(8), 1408–1419 (2013)CrossRef
36.
Zurück zum Zitat Zhang, L., Yang, Y., Gao, Y., Yi, Y., Wang, C., Li, X.: A probabilistic associative model for segmenting weakly-supervised images. Image. Processing. IEEE. Trans.23(9), 4150–4159 (2014)MathSciNetCrossRef Zhang, L., Yang, Y., Gao, Y., Yi, Y., Wang, C., Li, X.: A probabilistic associative model for segmenting weakly-supervised images. Image. Processing. IEEE. Trans.23(9), 4150–4159 (2014)MathSciNetCrossRef
37.
Zurück zum Zitat Zhang, L., Gao, Y., Xia, Y., Dai, Q., Li, X.: A fine-grained image categorization system by cellet-encoded spatial pyramid modeling. Ind Electron. IEEE. Trans. (2014) Zhang, L., Gao, Y., Xia, Y., Dai, Q., Li, X.: A fine-grained image categorization system by cellet-encoded spatial pyramid modeling. Ind Electron. IEEE. Trans. (2014)
Metadaten
Titel
Effective optimizations of cluster-based nearest neighbor search in high-dimensional space
verfasst von
Xiaokang Feng
Jiangtao Cui
Yingfan Liu
Hui Li
Publikationsdatum
24.12.2014
Verlag
Springer Berlin Heidelberg
Erschienen in
Multimedia Systems / Ausgabe 1/2017
Print ISSN: 0942-4962
Elektronische ISSN: 1432-1882
DOI
https://doi.org/10.1007/s00530-014-0444-3

Weitere Artikel der Ausgabe 1/2017

Multimedia Systems 1/2017 Zur Ausgabe

Neuer Inhalt