A cross-media distance metric learning framework based on multi-view correlation mining and matching

Zhang, Hong; Gao, Xingyu; Wu, Ping; Xu, Xin

doi:10.1007/s11280-015-0342-4


Published: 21 April 2015

Volume 19, pages 181–197, (2016)

World Wide Web

Hong Zhang¹,², Xingyu Gao³, Ping Wu¹ & Xin Xu¹


Abstract

With the explosion of multimedia data, data of different modalities commonly coexist in web repositories. It is therefore increasingly important to explore the intricate underlying cross-media correlations, rather than single-modality distance measures, in order to improve the understanding of multimedia semantics. Cross-media distance metric learning focuses on measuring the correlation between multimedia data of different modalities. However, content heterogeneity and the semantic gap make cross-media distance very challenging to measure. In this paper, we propose a novel cross-media distance metric learning framework based on sparse feature selection and multi-view matching. First, we employ sparse feature selection to select a subset of relevant features and remove redundant ones from high-dimensional image and audio features. Second, we maximize the canonical correlation coefficient during image-audio feature dimension reduction to mine cross-media correlation. Third, we construct a Multi-modal Semantic Graph to find the cross-media correlation embedded in the manifold. Finally, we fuse the canonical correlation and the manifold information into multi-view matching, which harmonizes the different correlations through an iterative process, and build a Cross-media Semantic Space for cross-media distance measurement. Experiments on cross-media retrieval are conducted on an image-audio dataset. The results are encouraging and indicate that the approach is effective.
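
The canonical-correlation step mentioned in the abstract can be illustrated with a minimal sketch. This is plain regularized linear CCA in NumPy, not the authors' framework: it omits the sparse feature selection, the Multi-modal Semantic Graph, and the iterative multi-view matching. The function names `fit_cca` and `cross_media_distance`, the regularization value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_cca(X, Y, n_components=2, reg=1e-3):
    """Regularized linear CCA for paired rows of X (n, dx) and Y (n, dy).

    Hypothetical helper: whitens each view with a Cholesky factor, then
    takes the SVD of the whitened cross-covariance.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized within-view covariances and the cross-covariance.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)  # Cxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Cyy)  # Cyy = Ly @ Ly.T
    # T = Lx^{-1} Cxy Ly^{-T}: cross-covariance of the whitened views.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :n_components])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    # s holds the (regularized) canonical correlations, largest first.
    return mx, my, Wx, Wy, s[:n_components]


def cross_media_distance(img, aud, model):
    """Euclidean distance between one image and one audio feature vector
    after projecting both into the shared CCA subspace."""
    mx, my, Wx, Wy, _ = model
    return np.linalg.norm((img - mx) @ Wx - (aud - my) @ Wy)


if __name__ == "__main__":
    # Synthetic stand-ins for image and audio features that share a
    # 2-dimensional latent factor; real features would replace these.
    rng = np.random.default_rng(0)
    z = rng.normal(size=(200, 2))
    X = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
    Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
    model = fit_cca(X, Y)
    print("canonical correlations:", model[4])
    print("distance, paired sample:", cross_media_distance(X[0], Y[0], model))
    print("distance, unpaired sample:", cross_media_distance(X[0], Y[1], model))
```

In this sketch a small distance in the projected subspace stands in for a strong cross-media correlation; the paper's Cross-media Semantic Space additionally incorporates manifold information, which is not modeled here.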


Acknowledgments

This research is supported by the National Natural Science Foundation of China (No. 61373109, No. 61003127, No. 61273303 and No. 61440016), the State Key Laboratory of Software Engineering (SKLSE2012-09-31), the Program for Outstanding Young Science and Technology Innovation Teams in Higher Education Institutions of Hubei Province, China (No. T201202), and the Natural Science Foundation of Hubei Province of China (2014CFB247).

Author information

Authors and Affiliations

College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, 430081, China
Hong Zhang, Ping Wu & Xin Xu
Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
Hong Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xingyu Gao


Corresponding author

Correspondence to Hong Zhang.


About this article

Cite this article

Zhang, H., Gao, X., Wu, P. et al. A cross-media distance metric learning framework based on multi-view correlation mining and matching. World Wide Web 19, 181–197 (2016). https://doi.org/10.1007/s11280-015-0342-4


Received: 27 November 2014
Revised: 26 February 2015
Accepted: 24 March 2015
Published: 21 April 2015
Issue Date: March 2016
DOI: https://doi.org/10.1007/s11280-015-0342-4
