Top

International Journal of Multimedia Information Retrieval

Published in:

18-11-2017 | Regular Paper

Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval

Authors: André Mourão, João Magalhães

Published in: International Journal of Multimedia Information Retrieval | Issue 1/2018

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Effective partitioning multimedia indexes is key for efficient kNN search. But existing algorithms are based on document similarity, without partition size or redundancy constraints. Our goal is to create an index partitioning algorithm that addresses the specific properties of a distributed system: load balancing across nodes, redundancy in node failure and efficient node usage under concurrent querying. We propose the representation of data with overcomplete codebooks. Each document is quantized into a small set of codewords and indexed on per-codeword partitions. Quantization algorithms are designed to fit data as best as possible, leading to a bias toward codewords that fit the principal directions of data in the original space. In this paper, we propose the balanced KSVD (B-KSVD) algorithm: It distributes data uniformly across codewords, according to the distribution in the original space. The comprehensive experiments focused on measuring the effectiveness of partition size balancing and retrieval quality. Results show that B-KSVD better balances partition sizes (i.e., lower SD in partition size distribution), compared to k-means and KSVD baselines. B-KSVD achieves 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. k-means creates partitions with higher size variation and requires either larger codebooks or the inspection of larger portions of the index to achieve similar retrieval performance.

previous article Estimating the information gap between textual and visual representations

next article Detection and visualization of misleading content on Twitter

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

http://corpus-texmex.irisa.fr/.

Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322CrossRefMATH

Aly M, Munich M, Perona P (2011) Distributed Kd-trees for retrieval from very large image collections. In: Proceedings of BMVC

Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. ACM Commun 51(1):117–122CrossRef

Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of ACM-SIAM SODA, pp 1027–1035

Babenko A, Lempitsky V (2015) The inverted multi-index. IEEE Trans PAMI 37(6):1247–1260. https://doi.org/10.1109/TPAMI.2014.2361319 CrossRef

Babenko A, Lempitsky V (2016) Efficient indexing of billion-scale datasets of deep descriptors. In: Proceedings of IEEE CVPR

Batko M, Falchi F, Lucchese C, Novak D, Perego R, Rabitti F, Sedmidubsky J, Zezula P (2010) Building a web-scale image similarity search system. Multimed Tool Appl 47(3):599–629CrossRef

Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, NorwellCrossRefMATH

Borges P, Mourão A, Magalhães J (2015) High-dimensional indexing by sparse approximation. In: ACM ICMR’15, ACM

10.

Büttcher S, Clarke CL, Cormack GV (2010) Information retrieval: implementing and evaluating search engines. MIT Press, CambridgeMATH

11.

Cherian A, Sra S, Morellas V, Papanikolopoulos N (2014) Efficient nearest neighbors via robust sparse hashing. IEEE TIP 23(8):3646–3655MathSciNetMATH

12.

Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of BMVC

13.

Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of SCG, ACM, pp 253–262

14.

Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef

15.

Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp 226–231

16.

Grauman K, Fergus R (2013) Chap: Learning binary hash codes for large-scale image search. In: Machine learning for computer vision. Springer, Berlin, pp 49–87. https://doi.org/10.1007/978-3-642-28661-2_3

17.

Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507. https://doi.org/10.1126/science.1127647 MathSciNetCrossRefMATH

18.

Hoerl AE, Kennard RW (2004) Ridge regression. Wiley, New York. https://doi.org/10.1002/0471667196.ess2280 CrossRefMATH

19.

Jégou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans PAMI 33(1):117–128. https://doi.org/10.1109/TPAMI.2010.57 CrossRef

20.

Jégou H, Tavenard R, Douze M, Amsaleg L (2011) Searching in one billion vectors: re-rank with source coding. arXiv e-prints arXiv:1102.3828

21.

Ji R, Duan LY, Chen J, Xie L, Yao H, Gao W (2013) Learning to distribute vocabulary indexing for scalable visual search. IEEE Trans Multimed 15(1):153–166. https://doi.org/10.1109/TMM.2012.2225035 CrossRef

22.

Kalantidis Y, Avrithis Y (2014) Locally optimized product quantization for approximate nearest neighbor search. In: Proceedings of IEEE CVPR, pp 2329–2336. https://doi.org/10.1109/CVPR.2014.298

23.

Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of ACM STOC, STOC ’97, pp 654–663

24.

Kulkarni A, Callan J (2010) Document allocation policies for selective searching of distributed indexes. In: Proceedings of ACM CIKM, CIKM ’10, pp 449–458. https://doi.org/10.1145/1871437.1871497

25.

Lewicki MS, Sejnowski TJ (2000) Learning overcomplete representations. Neural Comput 12(2):337–365. https://doi.org/10.1162/089976600300015826 CrossRef

26.

Li Z, Ning H, Cao L, Zhan T, Gong Y, Huang TS (2011) Learning to search efficiently in high dimensions. In: Neural information processing systems

27.

Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137MathSciNetCrossRefMATH

28.

Magalhaes J, Rueger S (2007) High-dimensional visual vocabularies for image retrieval. In: ACM SIGIR’07, ACM, New York, NY, USA, pp 815–816. https://doi.org/10.1145/1277741.1277923

29.

Moise D, Shestakov D, Gudmundsson G, Amsaleg L (2013) Indexing and searching 100m images with map-reduce. In: ICMR’13, pp 17–24. https://doi.org/10.1145/2461466.2461470

30.

Mourão A, Magalhães Ja (2015) Scalable multimodal search with distributed indexing by sparse hashing. In: ACM ICMR’15, ACM, New York, NY, USA, pp 283–290. https://doi.org/10.1145/2671188.2749310

31.

Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans PAMI 36:2227–2240CrossRef

32.

Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609CrossRef

33.

Pati Y, Rezaiifar R, Krishnaprasad P (1993) Orthogonal Matching Pursuit : recursive function approximation with application to wavelet decomposition. In: Asilomar Conference on Signals, Systems and Computer

34.

Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: NIPS, pp 1509–1517

35.

Tavenard R, Jégou H, Amsaleg L (2011) Balancing clusters to reduce response time variability in large scale image search. In: International workshop on content-based multimedia indexing (CBMI 2011), Madrid, Spain. http://hal.inria.fr/inria-00576886,qUAERO

36.

Thaler DG, Ravishankar CV (1998) Using name-based mappings to increase hit rates. Trans Netw 6(1):1–14. https://doi.org/10.1109/90.663936 CrossRef

37.

Tibshirani R (1994) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288MathSciNetMATH

38.

Torralba A, Fergus R, Freeman W (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans PAMI 30(11):1958–1970. https://doi.org/10.1109/TPAMI.2008.128 CrossRef

39.

Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of VLDB, pp 194–205

40.

Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. NIPS 9(1):6

41.

Yang Z, Kamata SI, Ahrary A (2009) NIR: content based image retrieval on cloud computing. Proc ICIS 3:556–559

42.

Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67:301–320MathSciNetCrossRefMATH

Title: Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval
Authors: André Mourão
João Magalhães
Publication date: 18-11-2017
Publisher: Springer London
Published in: International Journal of Multimedia Information Retrieval / Issue 1/2018
Print ISSN: 2192-6611
Electronic ISSN: 2192-662X
DOI: https://doi.org/10.1007/s13735-017-0140-0

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 1/2018

Digital watermarking for deep neural networks

MSRC: multimodal spatial regression with semantic context for phrase grounding

Estimating the information gap between textual and visual representations

Editorial for the ICMR 2017 special issue

Multimodal analysis of user behavior and browsed content under different image search intents

Detection and visualization of misleading content on Twitter

Premium Partner