International Journal of Computer Vision

Published: 01.08.2014

Weakly-Supervised Cross-Domain Dictionary Learning for Visual Recognition

Authors: Fan Zhu, Ling Shao

Published in: International Journal of Computer Vision | Issue 1-2/2014

Abstract

We address the visual categorization problem and present a method that uses weakly labeled data from other visual domains as auxiliary source data to enhance the original learning system. The proposed method aims to expand the intra-class diversity of the original training data by drawing on the source data. To bring the original target-domain data and the auxiliary source-domain data into the same feature space, we introduce a weakly-supervised cross-domain dictionary learning method, which learns a reconstructive, discriminative and domain-adaptive dictionary pair and the corresponding classifier parameters without using any prior information. The method operates at a high level and can be applied to different cross-domain applications. To build the auxiliary-domain data, we manually collect images from Web pages and select human actions of specific categories from a different dataset. The proposed method is evaluated on human action recognition, image classification and event recognition, using the UCF YouTube dataset, the Caltech101/256 datasets and the Kodak dataset, respectively, and achieves outstanding results.
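As a rough illustration of what jointly learning a dictionary pair "and the corresponding classifier parameters" can look like, the sketch below stacks paired target and source features together with one-hot labels and learns one shared sparse code per pair. The target dictionary `Dt`, the source dictionary `Ds` and a linear classifier `W` are therefore fitted together, and both domains end up in a common sparse-code space. This is a minimal sketch under stated assumptions, not the authors' algorithm. The one-to-one pairing of target and source samples within a class, the MOD dictionary update (used here in place of a K-SVD-style update), the class-wise initialisation, the weights `lam` and `beta`, and all function names are hypothetical choices made for illustration. The paper's weak-supervision mechanism is not modelled.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: greedily approximate
    argmin ||y - D x||_2 subject to ||x||_0 <= k (columns of D unit-norm)."""
    x = np.zeros(D.shape[1])
    support, coef = [], np.zeros(0)
    residual = y.astype(float)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:  # no new atom improves the fit
            break
        support.append(idx)
        # Re-fit all selected coefficients jointly (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x


def learn_dictionary_pair(Yt, Ys, H, atoms_per_class=1, k=1,
                          lam=1.0, beta=1.0, n_iter=10):
    """Hypothetical sketch: learn target/source dictionaries Dt, Ds and a
    linear classifier W that share one sparse code per (target, source) pair.

    Yt: (dt, n) target features, Ys: (ds, n) paired source features,
    H: (c, n) one-hot labels. lam weights the source term, beta the label term.
    """
    # Stack both domains and the labels so a single code must explain all three.
    Z = np.vstack([Yt, np.sqrt(lam) * Ys, np.sqrt(beta) * H]).astype(float)
    # Class-wise initialisation: the first few stacked samples of every class.
    labels = np.argmax(H, axis=0)
    init = np.concatenate([np.flatnonzero(labels == c)[:atoms_per_class]
                           for c in range(H.shape[0])])
    D = Z[:, init] / np.linalg.norm(Z[:, init], axis=0)
    for _ in range(n_iter):
        X = np.column_stack([omp(D, z, k) for z in Z.T])  # sparse coding step
        # MOD dictionary update: least squares in D given X (small ridge term).
        D = Z @ X.T @ np.linalg.inv(X @ X.T + 1e-6 * np.eye(D.shape[1]))
        norms = np.linalg.norm(D, axis=0)
        D /= np.where(norms > 0, norms, 1.0)
    dt, ds = Yt.shape[0], Ys.shape[0]
    Dt, Ds, W = D[:dt], D[dt:dt + ds] / np.sqrt(lam), D[dt + ds:] / np.sqrt(beta)
    # Unit-normalise target atoms for test-time coding; rescale Ds and W by the
    # same factors so one code still reconstructs both domains and scores classes.
    t = np.linalg.norm(Dt, axis=0)
    t = np.where(t > 0, t, 1.0)
    return Dt / t, Ds / t, W / t


def predict(Dt, W, Y, k=1):
    """Code target samples with Dt only and classify the codes with W."""
    X = np.column_stack([omp(Dt, y, k) for y in Y.T])
    return np.argmax(W @ X, axis=0)


# Toy usage: 3 classes, 50-d target features, 30-d source features (synthetic).
rng = np.random.default_rng(0)
Qt, _ = np.linalg.qr(rng.normal(size=(50, 3)))  # orthonormal target prototypes
Qs, _ = np.linalg.qr(rng.normal(size=(30, 3)))  # orthonormal source prototypes
labels = np.repeat(np.arange(3), 10)
Yt = Qt[:, labels] + 0.02 * rng.normal(size=(50, 30))
Ys = Qs[:, labels] + 0.02 * rng.normal(size=(30, 30))
H = np.eye(3)[:, labels]
Dt, Ds, W = learn_dictionary_pair(Yt, Ys, H)
Yt_new = Qt[:, labels] + 0.02 * rng.normal(size=(50, 30))
print((predict(Dt, W, Yt_new) == labels).mean())
```

At test time only the target dictionary is needed: a new target sample is sparse-coded with `Dt` and the code is scored with `W`. The source dictionary `Ds` matters during training, where it forces codes of paired samples from the two domains to agree.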

Publisher: Springer US
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-014-0703-y
