skip to main content
10.1145/2857546.2857607acmconferencesArticle/Chapter ViewAbstractPublication PagesicuimcConference Proceedingsconference-collections
research-article

An Index Scheme for Similarity Search on Cloud Computing using MapReduce over Docker Container

Authors Info & Claims
Published:04 January 2016Publication History

ABSTRACT

We consider the problem of similarity search over the large datasets in the distributed environment. The proposed framework employs the Vp-Tree algorithm that integrated on top of the MapReduce framework to achieve good performance as well as meet the scalability and fault tolerance requirements for the system while data scale up. Since VP-Tree algorithm was implemented initially for partition and searching data in the local disk access, we proposed a new approach to using it in the parallel environment. The key point of the Vp-Tree algorithm is that it distributed the similar data points into groups, thereby reducing number of data need to scan during the searching stage. Consequently, the response time of the entire system has been improved. Otherwise, we used an open source computer vision library OpenCV for detect the similarity among images in the dataset. We evaluate the performance of our proposed framework using a synthetic data to show the positive of our approach. The experiment shows that our proposed framework achieves 57% improvement in response time in comparison with running searching job in tradition Hadoop framework. We also compared our application running time on Docker container against VM-based environment. The result points out that deploy our system over Docker container provide higher performance than VM-based environment in term of response time.

References

  1. OpenCV Library Tutorials. http://docs.opencv.org/doc/tutorials/tutorials.html.Google ScholarGoogle Scholar
  2. H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (surf). Comput. Vis. Image Underst., 110(3):346--359, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Berchtold, D. A. Keim, and H.-P. Kriegel. The x-tree: An index structure for high-dimensional data. In Proceedings of the 22th International Conference on Very Large Data Bases, VLDB '96, pages 28--39, San Francisco, CA, USA, 1996. Morgan Kaufmann Publishers Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. K. Chakrabarti and S. Mehrotra. The hybrid tree: an index structure for high dimensional feature spaces. In Data Engineering, 1999. Proceedings., 15th International Conference on, pages 440--447, Mar 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. J. Crandall, L. Backstrom, D. Huttenlocher, and J. Kleinberg. Mapping the world's photos. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 761--770, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. Commun. ACM, 51(1):107--113, Jan. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB '99, pages 518--529, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. N. Kumar, L. Zhang, and S. Nayar. What is a good nearest neighbors algorithm for finding similar patches in images? In Proceedings of the 10th European Conference on Computer Vision: Part II, ECCV '08, pages 364--378, Berlin, Heidelberg, 2008. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Liao, J. Han, and J. Fang. Multi-dimensional index on hadoop distributed file system. In Proceedings of the 2010 IEEE Fifth International Conference on Networking, Architecture, and Storage, NAS '10, pages 240--249, Washington, DC, USA, 2010. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Merkel. Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239):2, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. D. Moise, D. Shestakov, G. Gudmundsson, and L. Amsaleg. Indexing and searching 100m images with map-reduce. In Proceedings of the 3rd ACM Conference on International Conference on Multimedia Retrieval, ICMR '13, pages 17--24, New York, NY, USA, 2013. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. VISAPP (1), 2, 2009.Google ScholarGoogle Scholar
  13. M. Slaney and M. Casey. Locality-sensitive hashing for finding nearest neighbors {lecture notes}. Signal Processing Magazine, IEEE, 25(2):128--131, March 2008.Google ScholarGoogle ScholarCross RefCross Ref
  14. P. N. Yianilos. Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '93, pages 311--321, Philadelphia, PA, USA, 1993. Society for Industrial and Applied Mathematics. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An Index Scheme for Similarity Search on Cloud Computing using MapReduce over Docker Container

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        IMCOM '16: Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication
        January 2016
        658 pages
        ISBN:9781450341424
        DOI:10.1145/2857546

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 January 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate213of621submissions,34%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader