A cross-media distance metric learning framework based on multi-view correlation mining and matching

World Wide Web

Abstract

With the explosion of multimedia data, data of different modalities commonly coexist in web repositories. It is therefore increasingly important to explore the underlying cross-media correlation, rather than rely on single-modality distance measures, in order to improve the understanding of multimedia semantics. Cross-media distance metric learning focuses on measuring the correlation between multimedia data of different modalities. However, content heterogeneity and the semantic gap make cross-media distance very challenging to measure. In this paper, we propose a novel cross-media distance metric learning framework based on sparse feature selection and multi-view matching. First, we employ sparse feature selection to select a subset of relevant features and remove redundant ones from the high-dimensional image and audio features. Second, we maximize the canonical correlation coefficient during image-audio feature dimension reduction for cross-media correlation mining. Third, we construct a Multi-modal Semantic Graph to find the embedded manifold cross-media correlation. Finally, we fuse the canonical correlation and the manifold information into multi-view matching, which harmonizes the different correlations through an iterative process, and build a Cross-media Semantic Space for cross-media distance measurement. Experiments are conducted on an image-audio dataset for cross-media retrieval. The experimental results are encouraging and indicate that our approach is effective.
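To make the second step of the abstract concrete, the following is a minimal sketch of standard regularized canonical correlation analysis (CCA) between paired image and audio feature matrices, followed by a Euclidean distance in the shared subspace. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_cca` and `cross_media_distance`, the ridge parameter `reg`, and the use of a plain Euclidean distance are all hypothetical choices, and the sparse feature selection, Multi-modal Semantic Graph, and iterative multi-view matching steps are not covered here.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA on paired rows of X (e.g. image) and Y (e.g. audio).

    Hypothetical helper: returns the means, the projection matrices and
    the top-k canonical correlations. `reg` is an assumed ridge term that
    keeps the covariance matrices well conditioned.
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]

    # Within-view and cross-view covariance estimates.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    # Whiten each view with its Cholesky factor: T = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the canonical correlations.
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the whitened directions back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return {"mean_x": mean_x, "mean_y": mean_y,
            "Wx": Wx, "Wy": Wy, "corr": s[:k]}


def cross_media_distance(x, y, model):
    """Euclidean distance between one image vector x and one audio
    vector y after projecting both into the shared CCA subspace."""
    u = (x - model["mean_x"]) @ model["Wx"]
    v = (y - model["mean_y"]) @ model["Wy"]
    return float(np.linalg.norm(u - v))
```

With such a model, a cross-media query would rank audio clips by `cross_media_distance` to a query image; in the framework described above this correlation would additionally be fused with manifold information rather than used on its own.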

Acknowledgments

This research is supported by the National Natural Science Foundation of China (No. 61373109, No. 61003127, No. 61273303 and No. 61440016), the State Key Laboratory of Software Engineering (SKLSE2012-09-31), the Program for Outstanding Young Science and Technology Innovation Teams in Higher Education Institutions of Hubei Province, China (No. T201202), and the Natural Science Foundation of Hubei Province of China (2014CFB247).

Author information

Corresponding author

Correspondence to Hong Zhang.

Cite this article

Zhang, H., Gao, X., Wu, P. et al. A cross-media distance metric learning framework based on multi-view correlation mining and matching. World Wide Web 19, 181–197 (2016). https://doi.org/10.1007/s11280-015-0342-4

  • DOI: https://doi.org/10.1007/s11280-015-0342-4
