Skip to main content
Top
Published in: International Journal of Multimedia Information Retrieval 1/2018

18-11-2017 | Regular Paper

Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval

Authors: André Mourão, João Magalhães

Published in: International Journal of Multimedia Information Retrieval | Issue 1/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Effective partitioning multimedia indexes is key for efficient kNN search. But existing algorithms are based on document similarity, without partition size or redundancy constraints. Our goal is to create an index partitioning algorithm that addresses the specific properties of a distributed system: load balancing across nodes, redundancy in node failure and efficient node usage under concurrent querying. We propose the representation of data with overcomplete codebooks. Each document is quantized into a small set of codewords and indexed on per-codeword partitions. Quantization algorithms are designed to fit data as best as possible, leading to a bias toward codewords that fit the principal directions of data in the original space. In this paper, we propose the balanced KSVD (B-KSVD) algorithm: It distributes data uniformly across codewords, according to the distribution in the original space. The comprehensive experiments focused on measuring the effectiveness of partition size balancing and retrieval quality. Results show that B-KSVD better balances partition sizes (i.e., lower SD in partition size distribution), compared to k-means and KSVD baselines. B-KSVD achieves 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. k-means creates partitions with higher size variation and requires either larger codebooks or the inspection of larger portions of the index to achieve similar retrieval performance.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322CrossRefMATH Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322CrossRefMATH
2.
go back to reference Aly M, Munich M, Perona P (2011) Distributed Kd-trees for retrieval from very large image collections. In: Proceedings of BMVC Aly M, Munich M, Perona P (2011) Distributed Kd-trees for retrieval from very large image collections. In: Proceedings of BMVC
3.
go back to reference Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. ACM Commun 51(1):117–122CrossRef Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. ACM Commun 51(1):117–122CrossRef
4.
go back to reference Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of ACM-SIAM SODA, pp 1027–1035 Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of ACM-SIAM SODA, pp 1027–1035
6.
go back to reference Babenko A, Lempitsky V (2016) Efficient indexing of billion-scale datasets of deep descriptors. In: Proceedings of IEEE CVPR Babenko A, Lempitsky V (2016) Efficient indexing of billion-scale datasets of deep descriptors. In: Proceedings of IEEE CVPR
7.
go back to reference Batko M, Falchi F, Lucchese C, Novak D, Perego R, Rabitti F, Sedmidubsky J, Zezula P (2010) Building a web-scale image similarity search system. Multimed Tool Appl 47(3):599–629CrossRef Batko M, Falchi F, Lucchese C, Novak D, Perego R, Rabitti F, Sedmidubsky J, Zezula P (2010) Building a web-scale image similarity search system. Multimed Tool Appl 47(3):599–629CrossRef
8.
go back to reference Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, NorwellCrossRefMATH Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, NorwellCrossRefMATH
9.
go back to reference Borges P, Mourão A, Magalhães J (2015) High-dimensional indexing by sparse approximation. In: ACM ICMR’15, ACM Borges P, Mourão A, Magalhães J (2015) High-dimensional indexing by sparse approximation. In: ACM ICMR’15, ACM
10.
go back to reference Büttcher S, Clarke CL, Cormack GV (2010) Information retrieval: implementing and evaluating search engines. MIT Press, CambridgeMATH Büttcher S, Clarke CL, Cormack GV (2010) Information retrieval: implementing and evaluating search engines. MIT Press, CambridgeMATH
11.
go back to reference Cherian A, Sra S, Morellas V, Papanikolopoulos N (2014) Efficient nearest neighbors via robust sparse hashing. IEEE TIP 23(8):3646–3655MathSciNetMATH Cherian A, Sra S, Morellas V, Papanikolopoulos N (2014) Efficient nearest neighbors via robust sparse hashing. IEEE TIP 23(8):3646–3655MathSciNetMATH
12.
go back to reference Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of BMVC Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of BMVC
13.
go back to reference Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of SCG, ACM, pp 253–262 Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of SCG, ACM, pp 253–262
14.
go back to reference Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
15.
go back to reference Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp 226–231 Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp 226–231
20.
go back to reference Jégou H, Tavenard R, Douze M, Amsaleg L (2011) Searching in one billion vectors: re-rank with source coding. arXiv e-prints arXiv:1102.3828 Jégou H, Tavenard R, Douze M, Amsaleg L (2011) Searching in one billion vectors: re-rank with source coding. arXiv e-prints arXiv:​1102.​3828
23.
go back to reference Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of ACM STOC, STOC ’97, pp 654–663 Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of ACM STOC, STOC ’97, pp 654–663
26.
go back to reference Li Z, Ning H, Cao L, Zhan T, Gong Y, Huang TS (2011) Learning to search efficiently in high dimensions. In: Neural information processing systems Li Z, Ning H, Cao L, Zhan T, Gong Y, Huang TS (2011) Learning to search efficiently in high dimensions. In: Neural information processing systems
31.
go back to reference Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans PAMI 36:2227–2240CrossRef Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans PAMI 36:2227–2240CrossRef
32.
go back to reference Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609CrossRef Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609CrossRef
33.
go back to reference Pati Y, Rezaiifar R, Krishnaprasad P (1993) Orthogonal Matching Pursuit : recursive function approximation with application to wavelet decomposition. In: Asilomar Conference on Signals, Systems and Computer Pati Y, Rezaiifar R, Krishnaprasad P (1993) Orthogonal Matching Pursuit : recursive function approximation with application to wavelet decomposition. In: Asilomar Conference on Signals, Systems and Computer
34.
go back to reference Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: NIPS, pp 1509–1517 Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: NIPS, pp 1509–1517
37.
39.
go back to reference Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of VLDB, pp 194–205 Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of VLDB, pp 194–205
40.
go back to reference Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. NIPS 9(1):6 Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. NIPS 9(1):6
41.
go back to reference Yang Z, Kamata SI, Ahrary A (2009) NIR: content based image retrieval on cloud computing. Proc ICIS 3:556–559 Yang Z, Kamata SI, Ahrary A (2009) NIR: content based image retrieval on cloud computing. Proc ICIS 3:556–559
Metadata
Title
Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval
Authors
André Mourão
João Magalhães
Publication date
18-11-2017
Publisher
Springer London
Published in
International Journal of Multimedia Information Retrieval / Issue 1/2018
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI
https://doi.org/10.1007/s13735-017-0140-0

Other articles of this Issue 1/2018

International Journal of Multimedia Information Retrieval 1/2018 Go to the issue

Premium Partner