Published in: Neural Computing and Applications, Issue 2/2016

01.02.2016 | Original Article

Joint learning of cross-modal classifier and factor analysis for multimedia data classification

Authors: Kanghong Duan, Hongxin Zhang, Jim Jing-Yan Wang

Abstract

In this paper, we study the problem of learning from multimodal data for document classification. Each document is composed of data from two modalities: an image and a text. We propose to represent both modalities by projecting them into a shared data space using the cross-modal factor analysis formulation, and to classify them in that shared space with a linear class label predictor, which we call the cross-modal classifier. The parameters of the cross-modal classifier and of the cross-modal factor analysis are learned jointly, so that each regularizes the learning of the other. We construct a unified objective function for this learning problem. Minimizing it reduces both the distance between the projections of the image and the text of the same document and the classification error of the projections, measured by the hinge loss. The objective function is optimized by an alternating optimization strategy in an iterative algorithm. Experiments on two multimodal document data sets show the advantage of the proposed algorithm over state-of-the-art multimedia data classification methods.
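The abstract does not state the exact objective or solver, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes binary labels in {-1, +1}, an objective made of the mean squared distance between the two projections plus the hinge losses of both projections and an L2 penalty on the classifier, and plain subgradient steps that alternate between the projections and the classifier. No orthogonality constraints are imposed on the projections. All names (`make_toy_data`, `objective`, `fit`, `predict`), the weights `C` and `lam`, the step size, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def make_toy_data(n=200, dx=5, dy=4, shift=1.5, seed=0):
    """Hypothetical two-modality data: the class label shifts both feature vectors."""
    rng = np.random.default_rng(seed)
    labels = rng.choice([-1.0, 1.0], size=n)
    X = shift * labels[:, None] + rng.standard_normal((n, dx))  # stand-in image features
    Y = shift * labels[:, None] + rng.standard_normal((n, dy))  # stand-in text features
    return X, Y, labels


def objective(X, Y, labels, U, V, w, C, lam):
    """Assumed objective: projection mismatch + hinge losses + L2 penalty on w."""
    P, Q = X @ U, Y @ V  # projections of both modalities into the shared space
    mismatch = np.mean(np.sum((P - Q) ** 2, axis=1))
    hinge = np.mean(np.maximum(0.0, 1.0 - labels * (P @ w)))
    hinge += np.mean(np.maximum(0.0, 1.0 - labels * (Q @ w)))
    return mismatch + C * hinge + lam * float(w @ w)


def fit(X, Y, labels, dim=2, C=1.0, lam=0.1, lr=0.01, n_iter=600, seed=0):
    """Alternate subgradient steps on the projections (U, V) and the classifier w."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[1], dim))
    V = 0.1 * rng.standard_normal((Y.shape[1], dim))
    w = 0.1 * rng.standard_normal(dim)
    history = []
    for _ in range(n_iter):
        # Step 1: keep w fixed and update the two projection matrices.
        P, Q = X @ U, Y @ V
        a = labels * (labels * (P @ w) < 1.0)  # nonzero only where the hinge is active
        b = labels * (labels * (Q @ w) < 1.0)
        grad_U = (2.0 * X.T @ (P - Q) - C * np.outer(X.T @ a, w)) / n
        grad_V = (-2.0 * Y.T @ (P - Q) - C * np.outer(Y.T @ b, w)) / n
        U -= lr * grad_U
        V -= lr * grad_V
        # Step 2: keep U and V fixed and update the linear classifier.
        P, Q = X @ U, Y @ V
        a = labels * (labels * (P @ w) < 1.0)
        b = labels * (labels * (Q @ w) < 1.0)
        grad_w = 2.0 * lam * w - C * (P.T @ a + Q.T @ b) / n
        w -= lr * grad_w
        history.append(objective(X, Y, labels, U, V, w, C, lam))
    return U, V, w, history


def predict(X, Y, U, V, w):
    """Average the classifier scores of both projections and take the sign."""
    score = 0.5 * ((X @ U) @ w + (Y @ V) @ w)
    return np.where(score >= 0.0, 1.0, -1.0)


X, Y, labels = make_toy_data()
U, V, w, history = fit(X, Y, labels)
accuracy = np.mean(predict(X, Y, U, V, w) == labels)
print(f"objective: {history[0]:.3f} -> {history[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

Because the classifier acts on the shared space, the hinge terms pull the projections toward directions that separate the classes, while the mismatch term keeps the image and text projections of the same document close. This coupling is one way to read the abstract's statement that the two sets of parameters regularize each other.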

Metadata
Title
Joint learning of cross-modal classifier and factor analysis for multimedia data classification
Authors
Kanghong Duan
Hongxin Zhang
Jim Jing-Yan Wang
Publication date
01.02.2016
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 2/2016
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-015-1866-3
