research-article

Indexing and searching 100M images with map-reduce

Authors:
Diana Moise

INRIA, Rennes, France

INRIA, Rennes, France
View Profile

,
Denis Shestakov

INRIA, Rennes, France

INRIA, Rennes, France
View Profile

,
Gylfi Gudmundsson

INRIA, Rennes, France

INRIA, Rennes, France
View Profile

,
Laurent Amsaleg

IRISA - CNRS, Rennes, France

IRISA - CNRS, Rennes, France
View Profile

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrievalApril 2013Pages 17–24https://doi.org/10.1145/2461466.2461470

Published:16 April 2013Publication History

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

Pages 17–24

ABSTRACT

Most researchers working on high-dimensional indexing agree on the following three trends: (i) the size of the multimedia collections to index are now reaching millions if not billions of items, (ii) the computers we use every day now come with multiple cores and (iii) hardware becomes more available, thanks to easier access to Grids and/or Clouds. This paper shows how the Map-Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using Hadoop, a popular Map-Reduce-based framework. Dramatic performance improvements are not however guaranteed a priori: such frameworks are rigid, they severely constrain the possible access patterns to data and scares resource RAM has to be shared. Furthermore, algorithms require major redesign, and may have to settle for sub-optimal behavior. The benefits, however, are many: simplicity for programmers, automatic distribution, fault tolerance, failure detection and automatic re-runs and, last but not least, scalability. We share our experience of adapting a clustering-based high-dimensional indexing algorithm to the Map-Reduce model, and of testing it at large scale with Hadoop as we index 30 billion SIFT descriptors. We foresee that lessons drawn from our work could minimize time, effort and energy invested by other researchers and practitioners working in similar directions.

References

M. Batko, F. Falchi, C. Lucchese, D. Novak, R. Perego, F. Rabitti, J. Sedmidubský, and P. Zezula. Building a web-scale image similarity search system. Multimedia Tools Appl., 47(3), 2010. Google ScholarDigital Library
F. Chierichetti, A. Panconesi, P. Raghavan, M. Sozio, A. Tiberi, and E. Upfal. Finding near neighbors through cluster pruning. In PODS, 2007. Google ScholarDigital Library
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, 2004. Google ScholarDigital Library
J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1), 2008. Google ScholarDigital Library
M. Douze, H. Jégou, H. Singh, L. Amsaleg, and C. Schmid. Evaluation of gist descriptors for web-scale image search. In CIVR, 2009. Google ScholarDigital Library
S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP, 2003. Google ScholarDigital Library
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999. Google ScholarDigital Library
G. Gudmundsson, B. T. Jónsson, and L. Amsaleg. A large-scale performance study of cluster-based high-dimensional indexing. In VLS-MCMR Workshop with ACM MM, 2010. Google ScholarDigital Library
J. S. Hare, S. Samangooei, D. P. Dupplaw, and P. H. Lewis. Imageterrier: an extensible platform for scalable high-performance image retrieval. In ICMR, 2012. Google ScholarDigital Library
H. Jégou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.Google ScholarDigital Library
H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. on PAMI, 2011. Google ScholarDigital Library
H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid. Aggregating local image descriptors into compact codes. IEEE Trans. on PAMI, 2011.Google Scholar
H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in one billion vectors: Re-rank with source coding. In ICASSP, 2011.Google ScholarCross Ref
Y. Jégou, S. Lantéri, J. Leduc, and all. Grid'5000: a large scale and highly reconfigurable experimental Grid testbed. Intl. Journal of HPC Applications, 20(4), 2006. Google ScholarDigital Library
A. Joly and O. Buisson. A posteriori multi-probe locality sensitive hashing. In MM, 2008. Google ScholarDigital Library
H. Lejsek, F. H. Amundsson, B. T. Jónsson, and L. Amsaleg. NV-Tree: An efficient disk-based index for approximate search in very large high-dimensional collections. IEEE Trans. on PAMI, 2009. Google ScholarDigital Library
H. Lejsek, B. T. Jónsson, and L. Amsaleg. NV-Tree: nearest neighbors at the billion scale. In ICMR, 2011. Google ScholarDigital Library
D. Lowe. Distinctive image features from scale invariant keypoints. IJCV, 60(2), 2004. Google ScholarDigital Library
Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe lsh: efficient indexing for high-dimensional similarity search. In VLDB, 2007. Google ScholarDigital Library
L. Paulevé, H. Jégou, and L. Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 2010. Google ScholarDigital Library
J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.Google ScholarCross Ref
K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In MSST, 2010. Google ScholarDigital Library
J. Sivic and A. Zisserman. Video google: A text retrieval approach to object matching in videos. In ICCV, 2003. Google ScholarDigital Library

Index Terms

Indexing and searching 100M images with map-reduce
1. Information systems

Recommendations

Scale-out beyond map-reduce
KDD '13: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

The amount and variety of data being collected in the enterprise is growing at a staggering pace. The default now is to capture and store any and all data, in anticipation of potential future strategic value, and vast amounts of data are being generated ...
Read More
Comparison of Map-Reduce and SQL on Large-Scale Data Processing
ISPA '10: Proceedings of the International Symposium on Parallel and Distributed Processing with Applications

Popularity for the term ‘Cloud-Computing’ has been increasing in recent years. There are many great companies such as Yahoo, Google etc. tried to provide related services to business community, even through public users. In addition to the SQL technique,...
Read More
Rainfall Prediction using Artificial Neural Network on Map-Reduce Framework
WCI '15: Proceedings of the Third International Symposium on Women in Computing and Informatics

Big data is a celebrated topic in Business as well as research community for several years. With the revolution of Big Data, it is becoming easy and less expensive to store tremendous amount of data for future analysis. Weather data gets accumulated very ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
April 2013
362 pages
ISBN:9781450320337
DOI:10.1145/2461466
General Chairs:
Ramesh Jain
University of California, Irvine, USA
,
Balakrisknan Prabhakaran
University of Texas at Dallas, USA
,
Program Chairs:
Marcel Worring
University of Amsterdam, The Netherlands
,
John Smith
IBM Research, New York, USA
,
Tat-Seng Chua
National University of Singapore
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 April 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
hadoop
high-dimensional indexing
map-reduce
Qualifiers
- research-article
Conference

Acceptance Rates
ICMR '13 Paper Acceptance Rate38of96submissions,40%Overall Acceptance Rate254of830submissions,31%
More
Upcoming Conference
ICMR '24

Sponsor:

sigmm

International Conference on Multimedia Retrieval

June 10 - 14, 2024

Phuket , Thailand
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 50
  Total Citations
  View Citations
- 742
  Total Downloads
- Downloads (Last 12 months)8
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Indexing and searching 100M images with map-reduce

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Scale-out beyond map-reduce

Comparison of Map-Reduce and SQL on Large-Scale Data Processing

Rainfall Prediction using Artificial Neural Network on Map-Reduce Framework

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Indexing and searching 100M images with map-reduce

ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Scale-out beyond map-reduce

Comparison of Map-Reduce and SQL on Large-Scale Data Processing

Rainfall Prediction using Artificial Neural Network on Map-Reduce Framework

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media