research-article

Scalable Multimodal Search with Distributed Indexing by Sparse Hashing

Authors:
André Mourão

Universidade Nova de Lisboa - Faculdade de Ciências e Tecnologia, Monte de Caparica, Portugal

Universidade Nova de Lisboa - Faculdade de Ciências e Tecnologia, Monte de Caparica, Portugal
View Profile

,
João Magalhães

Universidade Nova de Lisboa - Faculdade de Ciências e Tecnologia, Monte de Caparica, Portugal

Universidade Nova de Lisboa - Faculdade de Ciências e Tecnologia, Monte de Caparica, Portugal
View Profile

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia RetrievalJune 2015Pages 283–290https://doi.org/10.1145/2671188.2749310

Published:22 June 2015Publication History

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

Pages 283–290

ABSTRACT

Multimedia search systems must deal with an increasingly large and heterogeneous amount of data. Several challenges exist when deploying real-world search engines for such data. Existing literature does not properly tackle the many efficiency issues that such task requires. In this paper, we address several of the key efficiency aspects required to deploy a distributed search engine, capable of handling several millions of multimedia documents. The search engine builds on a framework designed to: first, ease the distribution of documents and queries across cluster-nodes, second, index media efficiently for fast similarity search and third aggregate ranked results from several heterogeneous sources. Moreover, the proposed framework is flexible enough to support several state-of-the-art indexing and aggregation techniques.

At the heart of the indexing architecture lies an inverse index structure optimized for sparse hashes, that speeds up the retrieval of similar descriptors. To leverage the distributed nature of the search framework, the proposed aggregation technique offers a low temporal complexity overhead and it is agnostic to the index type (a key aspect to support simultaneous modalities). A comprehensive evaluation with both general IR metrics and efficiency metrics, provides a unique assessment of the several efficiency bottlenecks faced by a search engine. In addition, we test the scalability of the search framework to multiple index sizes, i.e., up to 5 million documents per cluster-node.

References

M. Abedini, L. Cao, N. Codella, J. H. Connell, A. Garnavi, Rahiland Geva, M. Merler, Q.-B. Nguyen, J. U. Pankanti, Sharathchandraand R. Smith, and A. Sun, Xingzhiand Tzadok. Ibm research at imageclef 2013 medical tasks. In WN of CLEF'13, 2013.Google Scholar
M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Trans. on Signal Processing, 54(11):4311--4322, 2006. Google ScholarDigital Library
R. Aly, D. Hiemstra, and T. Demeester. Taily: Shard selection using the tail of score distributions. In SIGIR'13, pages 673--682, 2013. Google ScholarDigital Library
A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117--122, 2008. Google ScholarDigital Library
C. Badue, R. Baeza-Yates, B. Ribeiro-neto, and N. Ziviani. Distributed query processing using partitioned inverted files. In SPIRE'01, pages 10--20. IEEE CS Press, 2001.Google ScholarCross Ref
R. Baeza-Yates, V. Murdock, and C. Hauff. Efficiency trade-offs in two-tier web search systems. In SIGIR'09, pages 163--170, 2009. Google ScholarDigital Library
M. Batko, F. Falchi, C. Lucchese, D. Novak, R. Perego, F. Rabitti, J. Sedmidubsky, and P. Zezula. Building a web-scale image similarity search system. Multimedia Tools and Applications, 47(3):599--629, 2010. Google ScholarDigital Library
P. Borges, A. Mourão, and J. Magalhães. High-dimensional indexing by sparse approximation. In ICMR'15. ACM, 2015. Google ScholarDigital Library
S. Büttcher, C. L. Clarke, and G. V. Cormack. Information retrieval: Implementing and evaluating search engines. Mit Press, 2010. Google ScholarDigital Library
G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In Proc of. SIGIR '09, pages 758--759, 2009. Google ScholarDigital Library
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarDigital Library
A. Garcia Seco de Herrera, J. Kalpathy-Cramer, D. Demner Fushman, S. Antani, and H. Müller. Overview of the imageclef 2013 medical tasks. In WN on CLEF'13, 2013.Google Scholar
J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon. Spherical hashing. In CVPR'12, pages 2957--2964, 2012. Google ScholarDigital Library
H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. on PAMI, 33(1):117--128, 2011. Google ScholarDigital Library
H. Jégou, T. Furon, and J.-J. Fuchs. Anti-sparse coding for approximate nearest neighbor search. In Proc. on ICASSP'98, pages 2029--2032, 2012.Google ScholarCross Ref
H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in one billion vectors: re-rank with source coding. ArXiv e-prints, 2011.Google Scholar
R. Ji, L.-Y. Duan, J. Chen, L. Xie, H. Yao, and W. Gao. Learning to distribute vocabulary indexing for scalable visual search. IEEE Trans. on Multimedia, 15(1):153--166, 2013. Google ScholarDigital Library
H. Lee, A. Battle, R. Raina, and A. Ng. Efficient sparse coding algorithms. NIPS'06, pages 801--808, 2006.Google Scholar
M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Comput., 12(2):337--365, 2000. Google ScholarDigital Library
C. Macdonald and I. Ounis. Voting for candidates: adapting data fusion techniques for an expert search task. In CIKM'06, 2006. Google ScholarDigital Library
H. Mohamed and S. Marchand-Maillet. Distributed media indexing based on mpi and mapreduce. Multimedia Tools and Applications, 69(2):513--537, 2014. Google ScholarDigital Library
D. Moise, D. Shestakov, G. Gudmundsson, and L. Amsaleg. Indexing and searching 100m images with map-reduce. In ICMR'13, pages 17--24, 2013. Google ScholarDigital Library
M. Montague and J. A. Aslam. Relevance score normalization for metasearch. In CIKM'01, pages 427--433, 2001. Google ScholarDigital Library
A. Mourão, F. Martins, and J. Magalhães. Inverse square rank fusion for multimodal search. In CBMI'14, pages 1--6. IEEE, 2014.Google ScholarCross Ref
M. Muja and D. G. Lowe. Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. on PAMI, 36, 2014.Google Scholar
D. Novak and P. Zezula. M-chord: A scalable distributed similarity search structure. In InfoScale'06, 2006. Google ScholarDigital Library
Y. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal Matching Pursuit: recursive function approximation with application to wavelet decomposition. In ASILOMAR, pages 40--44, 1993.Google ScholarCross Ref
J. A. Shaw and E. A. Fox. Combination of multiple searches. In TREC'94, pages 243--252, 1994.Google Scholar
T. Skopal, J. Pokornỳ, and V. Snasel. Pm-tree: Pivoting metric tree for similarity search in multimedia databases. In ADBIS'04, 2004.Google Scholar
E. Tiakas, D. Rafailidis, A. Dimou, and P. Daras. Msidx: Multi-sort indexing for efficient content-based image search and retrieval. Trans. on Multimedia, 15(6):1415--1430, 2013. Google ScholarDigital Library
A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. Trans. on PAMI, 30(11):1958--1970, 2008. Google ScholarDigital Library
A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In CVPR'08, pages 1--8, 2008.Google ScholarCross Ref
Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. NIPS'08, 9(1):1--8, 2008.Google Scholar
A. Y. Yang, A. Ganesh, Z. Zhou, S. Sastry, and Y. Ma. A review of fast l1-minimization algorithms for robust face recognition. Arxiv CoRR, abs/1007.3753, 2010.Google Scholar
X. Zhu, Z. Huang, H. Cheng, J. Cui, and H. T. Shen. Sparse hashing for fast multimedia search. ACM Trans. Inf. Syst., 31(2):1--9, 2013. Google ScholarDigital Library

Index Terms

Scalable Multimodal Search with Distributed Indexing by Sparse Hashing
1. Information systems
  1. Information retrieval

Recommendations

Scalable Supervised Discrete Hashing for Large-Scale Search
WWW '18: Proceedings of the 2018 World Wide Web Conference

Supervised hashing methods have attracted much attention in these years. However, most existing supervised hashing algorithms have some of the following problems. First, most of them leverage the pairwise similarity matrix, whose size is quadratic to ...
Read More
Effective rank aggregation for metasearching

Nowadays, mashup services and especially metasearch engines play an increasingly important role on the Web. Most of users use them directly or indirectly to access and aggregate information from more than one data sources. Similarly to the rest of the ...
Read More
High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding
WWW '21: Proceedings of the Web Conference 2021

Recently, with the discoveries in neurobiology, high-dimensional sparse hashing has attracted increasing attention. In contrast with general hashing that generates low-dimensional hash codes, the high-dimensional sparse hashing maps inputs into a ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
June 2015
700 pages
ISBN:9781450332743
DOI:10.1145/2671188
General Chairs:
Alex Hauptmann
Carnegie Mellon University, USA
,
Chong-Wah Ngo
City University of Hong Kong, China
,
Xiangyang Xue
Fudan University, China
,
Program Chairs:
Yu-Gang Jiang
Fudan University, China
,
Cees Snoek
University of Amsterdam and Qualcomm Research Netherlands
,
Nuno Vasconcelos
University of California, San Diego, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 June 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
distributed search
rank fusion
scalable search
sparse hashing
Qualifiers
- research-article
Conference

Acceptance Rates
ICMR '15 Paper Acceptance Rate48of127submissions,38%Overall Acceptance Rate254of830submissions,31%
More
Upcoming Conference
ICMR '24

Sponsor:

sigmm

International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket , Thailand
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 165
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Scalable Multimodal Search with Distributed Indexing by Sparse Hashing

ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Scalable Supervised Discrete Hashing for Large-Scale Search

Effective rank aggregation for metasearching

High-Dimensional Sparse Cross-Modal Hashing with Fine-Grained Similarity Embedding