skip to main content
10.1145/2671188.2749310acmconferencesArticle/Chapter ViewAbstractPublication PagesicmrConference Proceedingsconference-collections
research-article

Scalable Multimodal Search with Distributed Indexing by Sparse Hashing

Authors Info & Claims
Published:22 June 2015Publication History

ABSTRACT

Multimedia search systems must deal with an increasingly large and heterogeneous amount of data. Several challenges exist when deploying real-world search engines for such data. Existing literature does not properly tackle the many efficiency issues that such task requires. In this paper, we address several of the key efficiency aspects required to deploy a distributed search engine, capable of handling several millions of multimedia documents. The search engine builds on a framework designed to: first, ease the distribution of documents and queries across cluster-nodes, second, index media efficiently for fast similarity search and third aggregate ranked results from several heterogeneous sources. Moreover, the proposed framework is flexible enough to support several state-of-the-art indexing and aggregation techniques.

At the heart of the indexing architecture lies an inverse index structure optimized for sparse hashes, that speeds up the retrieval of similar descriptors. To leverage the distributed nature of the search framework, the proposed aggregation technique offers a low temporal complexity overhead and it is agnostic to the index type (a key aspect to support simultaneous modalities). A comprehensive evaluation with both general IR metrics and efficiency metrics, provides a unique assessment of the several efficiency bottlenecks faced by a search engine. In addition, we test the scalability of the search framework to multiple index sizes, i.e., up to 5 million documents per cluster-node.

References

  1. M. Abedini, L. Cao, N. Codella, J. H. Connell, A. Garnavi, Rahiland Geva, M. Merler, Q.-B. Nguyen, J. U. Pankanti, Sharathchandraand R. Smith, and A. Sun, Xingzhiand Tzadok. Ibm research at imageclef 2013 medical tasks. In WN of CLEF'13, 2013.Google ScholarGoogle Scholar
  2. M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Trans. on Signal Processing, 54(11):4311--4322, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Aly, D. Hiemstra, and T. Demeester. Taily: Shard selection using the tail of score distributions. In SIGIR'13, pages 673--682, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117--122, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Badue, R. Baeza-Yates, B. Ribeiro-neto, and N. Ziviani. Distributed query processing using partitioned inverted files. In SPIRE'01, pages 10--20. IEEE CS Press, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  6. R. Baeza-Yates, V. Murdock, and C. Hauff. Efficiency trade-offs in two-tier web search systems. In SIGIR'09, pages 163--170, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Batko, F. Falchi, C. Lucchese, D. Novak, R. Perego, F. Rabitti, J. Sedmidubsky, and P. Zezula. Building a web-scale image similarity search system. Multimedia Tools and Applications, 47(3):599--629, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Borges, A. Mourão, and J. Magalhães. High-dimensional indexing by sparse approximation. In ICMR'15. ACM, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Büttcher, C. L. Clarke, and G. V. Cormack. Information retrieval: Implementing and evaluating search engines. Mit Press, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In Proc of. SIGIR '09, pages 758--759, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Garcia Seco de Herrera, J. Kalpathy-Cramer, D. Demner Fushman, S. Antani, and H. Müller. Overview of the imageclef 2013 medical tasks. In WN on CLEF'13, 2013.Google ScholarGoogle Scholar
  13. J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon. Spherical hashing. In CVPR'12, pages 2957--2964, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. on PAMI, 33(1):117--128, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. Jégou, T. Furon, and J.-J. Fuchs. Anti-sparse coding for approximate nearest neighbor search. In Proc. on ICASSP'98, pages 2029--2032, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  16. H. Jégou, R. Tavenard, M. Douze, and L. Amsaleg. Searching in one billion vectors: re-rank with source coding. ArXiv e-prints, 2011.Google ScholarGoogle Scholar
  17. R. Ji, L.-Y. Duan, J. Chen, L. Xie, H. Yao, and W. Gao. Learning to distribute vocabulary indexing for scalable visual search. IEEE Trans. on Multimedia, 15(1):153--166, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Lee, A. Battle, R. Raina, and A. Ng. Efficient sparse coding algorithms. NIPS'06, pages 801--808, 2006.Google ScholarGoogle Scholar
  19. M. S. Lewicki and T. J. Sejnowski. Learning overcomplete representations. Neural Comput., 12(2):337--365, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Macdonald and I. Ounis. Voting for candidates: adapting data fusion techniques for an expert search task. In CIKM'06, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Mohamed and S. Marchand-Maillet. Distributed media indexing based on mpi and mapreduce. Multimedia Tools and Applications, 69(2):513--537, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. Moise, D. Shestakov, G. Gudmundsson, and L. Amsaleg. Indexing and searching 100m images with map-reduce. In ICMR'13, pages 17--24, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Montague and J. A. Aslam. Relevance score normalization for metasearch. In CIKM'01, pages 427--433, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Mourão, F. Martins, and J. Magalhães. Inverse square rank fusion for multimodal search. In CBMI'14, pages 1--6. IEEE, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  25. M. Muja and D. G. Lowe. Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. on PAMI, 36, 2014.Google ScholarGoogle Scholar
  26. D. Novak and P. Zezula. M-chord: A scalable distributed similarity search structure. In InfoScale'06, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Y. Pati, R. Rezaiifar, and P. Krishnaprasad. Orthogonal Matching Pursuit: recursive function approximation with application to wavelet decomposition. In ASILOMAR, pages 40--44, 1993.Google ScholarGoogle ScholarCross RefCross Ref
  28. J. A. Shaw and E. A. Fox. Combination of multiple searches. In TREC'94, pages 243--252, 1994.Google ScholarGoogle Scholar
  29. T. Skopal, J. Pokornỳ, and V. Snasel. Pm-tree: Pivoting metric tree for similarity search in multimedia databases. In ADBIS'04, 2004.Google ScholarGoogle Scholar
  30. E. Tiakas, D. Rafailidis, A. Dimou, and P. Daras. Msidx: Multi-sort indexing for efficient content-based image search and retrieval. Trans. on Multimedia, 15(6):1415--1430, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. Trans. on PAMI, 30(11):1958--1970, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In CVPR'08, pages 1--8, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  33. Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. NIPS'08, 9(1):1--8, 2008.Google ScholarGoogle Scholar
  34. A. Y. Yang, A. Ganesh, Z. Zhou, S. Sastry, and Y. Ma. A review of fast l1-minimization algorithms for robust face recognition. Arxiv CoRR, abs/1007.3753, 2010.Google ScholarGoogle Scholar
  35. X. Zhu, Z. Huang, H. Cheng, J. Cui, and H. T. Shen. Sparse hashing for fast multimedia search. ACM Trans. Inf. Syst., 31(2):1--9, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scalable Multimodal Search with Distributed Indexing by Sparse Hashing

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      ICMR '15: Proceedings of the 5th ACM on International Conference on Multimedia Retrieval
      June 2015
      700 pages
      ISBN:9781450332743
      DOI:10.1145/2671188

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 22 June 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      ICMR '15 Paper Acceptance Rate48of127submissions,38%Overall Acceptance Rate254of830submissions,31%

      Upcoming Conference

      ICMR '24
      International Conference on Multimedia Retrieval
      June 10 - 14, 2024
      Phuket , Thailand

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader