skip to main content
10.1145/2461466.2461470acmconferencesArticle/Chapter ViewAbstractPublication PagesicmrConference Proceedingsconference-collections
research-article

Indexing and searching 100M images with map-reduce

Published:16 April 2013Publication History

ABSTRACT

Most researchers working on high-dimensional indexing agree on the following three trends: (i) the size of the multimedia collections to index are now reaching millions if not billions of items, (ii) the computers we use every day now come with multiple cores and (iii) hardware becomes more available, thanks to easier access to Grids and/or Clouds. This paper shows how the Map-Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using Hadoop, a popular Map-Reduce-based framework. Dramatic performance improvements are not however guaranteed a priori: such frameworks are rigid, they severely constrain the possible access patterns to data and scares resource RAM has to be shared. Furthermore, algorithms require major redesign, and may have to settle for sub-optimal behavior. The benefits, however, are many: simplicity for programmers, automatic distribution, fault tolerance, failure detection and automatic re-runs and, last but not least, scalability. We share our experience of adapting a clustering-based high-dimensional indexing algorithm to the Map-Reduce model, and of testing it at large scale with Hadoop as we index 30 billion SIFT descriptors. We foresee that lessons drawn from our work could minimize time, effort and energy invested by other researchers and practitioners working in similar directions.

References

  1. M. Batko, F. Falchi, C. Lucchese, D. Novak, R. Perego, F. Rabitti, J. Sedmidubský, and P. Zezula. Building a web-scale image similarity search system. Multimedia Tools Appl., 47(3), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. F. Chierichetti, A. Panconesi, P. Raghavan, M. Sozio, A. Tiberi, and E. Upfal. Finding near neighbors through cluster pruning. In PODS, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In SCG, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Douze, H. Jégou, H. Singh, L. Amsaleg, and C. Schmid. Evaluation of gist descriptors for web-scale image search. In CIVR, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In VLDB, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Gudmundsson, B. T. Jónsson, and L. Amsaleg. A large-scale performance study of cluster-based high-dimensional indexing. In VLS-MCMR Workshop with ACM MM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. S. Hare, S. Samangooei, D. P. Dupplaw, and P. H. Lewis. Imageterrier: an extensible platform for scalable high-performance image retrieval. In ICMR, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Jégou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. on PAMI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid. Aggregating local image descriptors into compact codes. IEEE Trans. on PAMI, 2011.Google ScholarGoogle Scholar
  13. H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in one billion vectors: Re-rank with source coding. In ICASSP, 2011.Google ScholarGoogle ScholarCross RefCross Ref
  14. Y. Jégou, S. Lantéri, J. Leduc, and all. Grid'5000: a large scale and highly reconfigurable experimental Grid testbed. Intl. Journal of HPC Applications, 20(4), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Joly and O. Buisson. A posteriori multi-probe locality sensitive hashing. In MM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. Lejsek, F. H. Amundsson, B. T. Jónsson, and L. Amsaleg. NV-Tree: An efficient disk-based index for approximate search in very large high-dimensional collections. IEEE Trans. on PAMI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. Lejsek, B. T. Jónsson, and L. Amsaleg. NV-Tree: nearest neighbors at the billion scale. In ICMR, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Lowe. Distinctive image features from scale invariant keypoints. IJCV, 60(2), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li. Multi-probe lsh: efficient indexing for high-dimensional similarity search. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Paulevé, H. Jégou, and L. Amsaleg. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  22. K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The hadoop distributed file system. In MSST, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Sivic and A. Zisserman. Video google: A text retrieval approach to object matching in videos. In ICCV, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Indexing and searching 100M images with map-reduce

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in
              • Published in

                cover image ACM Conferences
                ICMR '13: Proceedings of the 3rd ACM conference on International conference on multimedia retrieval
                April 2013
                362 pages
                ISBN:9781450320337
                DOI:10.1145/2461466

                Copyright © 2013 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 16 April 2013

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • research-article

                Acceptance Rates

                ICMR '13 Paper Acceptance Rate38of96submissions,40%Overall Acceptance Rate254of830submissions,31%

                Upcoming Conference

                ICMR '24
                International Conference on Multimedia Retrieval
                June 10 - 14, 2024
                Phuket , Thailand

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader