DOI: 10.1145/3123266.3123429
research-article
Open Access

Cross-Domain Image Retrieval with Attention Modeling

Published: 23 October 2017

ABSTRACT

With the proliferation of e-commerce websites and the ubiquity of smartphones, cross-domain image retrieval, in which images taken by smartphones are used as queries to search for products on e-commerce websites, is emerging as a popular application. One challenge of this task is to locate the attention of both the query and database images. In particular, database images on e-commerce websites, e.g. of fashion products, are typically displayed with other accessories, and the images taken by users contain noisy backgrounds and large variations in orientation and lighting. Consequently, their attention is difficult to locate. In this paper, we exploit the rich tag information available on e-commerce websites to locate the attention of database images. For query images, we use each candidate image in the database as the context to locate the query attention. Novel deep convolutional neural network architectures, namely TagYNet and CtxYNet, are proposed to learn the attention weights and then extract effective representations of the images. Experimental results on public datasets confirm that our approaches significantly improve over existing methods in terms of retrieval accuracy and efficiency.
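The abstract does not give the internals of TagYNet or CtxYNet, so the following is only a generic sketch of context-conditioned soft attention, not the authors' architecture. It assumes a convolutional feature map flattened into a list of location vectors and a hypothetical context vector (standing in for a tag embedding on the database side, or a candidate image's representation on the query side). The function name `attention_pool` and the dot-product scoring are illustrative assumptions.

```python
import math


def attention_pool(features, context):
    """Soft-attention pooling over spatial locations (illustrative only).

    features: list of N location vectors, each a list of D floats
              (assumed: a flattened convolutional feature map).
    context:  list of D floats (assumed: a tag embedding or a candidate
              image's representation; not specified in the abstract).
    Returns (weights, pooled): weights sum to 1, pooled is the
    attention-weighted sum of the location vectors.
    """
    # Score each location by its dot product with the context vector.
    scores = [sum(f * c for f, c in zip(vec, context)) for vec in features]
    # Numerically stable softmax converts scores to attention weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The weighted sum over locations yields one D-dimensional representation.
    dim = len(features[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, features)) for d in range(dim)
    ]
    return weights, pooled


# Toy input: three locations with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context = [2.0, 0.0]  # hypothetical context favouring the first dimension
weights, pooled = attention_pool(features, context)
```

In this toy input the first location aligns best with the context, so it receives the largest weight and dominates the pooled representation. A learned model would replace the fixed dot product with trainable scoring parameters.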


Published in: MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017, 2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266

Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

MM '17 Paper Acceptance Rate: 189 of 684 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
