Published in: International Journal of Computer Vision, Issue 2/2016

1 June 2016

Exploiting Privileged Information from Web Data for Action and Event Recognition

Authors: Li Niu, Wen Li, Dong Xu

Abstract

In conventional approaches to action and event recognition, sufficient labelled training videos are generally required to learn robust classifiers that generalize well to new test videos. However, collecting labelled training videos is often time-consuming and expensive. In this work, we propose new learning frameworks that train robust classifiers for action and event recognition using freely available web videos as training data. We aim to address three challenging issues: (1) the training web videos are generally associated with rich textual descriptions, which are not available for test videos; (2) the labels of training web videos are noisy and may be inaccurate; (3) the data distributions of training and test videos often differ considerably. To address the first two issues, we propose a new framework called multi-instance learning with privileged information (MIL-PI), together with three new MIL methods, in which we not only exploit the additional textual descriptions of training web videos as privileged information, but also explicitly cope with noise in their loose labels. When the training and test videos come from different data distributions, we further extend MIL-PI into a new framework called domain adaptive MIL-PI. We also propose three new domain adaptation methods, which can additionally reduce the data distribution mismatch between training and test videos. Comprehensive experiments on action and event recognition demonstrate the effectiveness of the proposed approaches.
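
The abstract does not specify how the distribution mismatch between training and test videos is measured. One common yardstick, from the kernel two-sample test of Gretton et al. (2012) in the reference list, is the maximum mean discrepancy (MMD). For a training sample X and a test sample Z with kernel k, the biased empirical estimate is MMD² = mean k(x, x') + mean k(z, z') - 2 mean k(x, z). The sketch below is a minimal illustration under stated assumptions: features are plain Python lists, the kernel is Gaussian with a hand-picked `gamma`, and the function names `gaussian_kernel`, `mean_kernel` and `mmd_squared` are hypothetical. It is not one of the domain adaptation methods proposed in the paper.

```python
import math
from typing import List, Sequence

# Hypothetical helper names; a sketch of the MMD statistic, not the paper's method.
Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs: List[Vector], zs: List[Vector], gamma: float) -> float:
    """Average kernel value over all pairs (x, z) with x in xs and z in zs."""
    total = sum(gaussian_kernel(x, z, gamma) for x in xs for z in zs)
    return total / (len(xs) * len(zs))


def mmd_squared(source: List[Vector], target: List[Vector], gamma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between two samples.

    source: feature vectors of training (e.g. web) videos
    target: feature vectors of test videos
    """
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


if __name__ == "__main__":
    # Toy 2-D "features": a training sample, a nearby test sample,
    # and a test sample whose distribution is shifted far away.
    train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
    test_similar = [[0.05, 0.1], [0.15, 0.15], [0.1, 0.05]]
    test_shifted = [[2.0, 2.0], [2.1, 2.2], [2.2, 2.1]]

    print(mmd_squared(train, test_similar))  # small value
    print(mmd_squared(train, test_shifted))  # much larger value
```

A larger value indicates a stronger mismatch between the two samples. The bandwidth `gamma` strongly affects the result; in practice it is often chosen with a heuristic such as the median pairwise distance, which this sketch does not implement.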

Footnotes
1
The work in Li et al. (2011) uses both visual and textual features during training; however, it also requires textual features at test time.
 
2
The bias term \(\hat{b}\) and the scalar terms \(\rho\) and \(\frac{1}{\|\mathbf{v}\|}\) do not change the trend of the functions.
 
References
Aggarwal, J. K., & Ryoo, M. S. (2011). Human activity analysis: A review. ACM Computing Surveys (CSUR), 43(3), 16.
Andrews, S., Tsochantaridis, I., & Hofmann, T. (2003). Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems (NIPS) (pp. 561–568).
Baktashmotlagh, M., Harandi, M. T., Lovell, B. C., & Salzmann, M. (2013). Unsupervised domain adaptation by domain invariant projection. In IEEE International Conference on Computer Vision (ICCV) (pp. 769–776).
Bergamo, A., & Torresani, L. (2010). Exploiting weakly-labeled web images to improve object classification: A domain adaptation approach. In Advances in Neural Information Processing Systems (NIPS) (pp. 181–189).
Bobick, A. F. (1997). Movement, activity and action: The role of knowledge in the perception of motion. Philosophical Transactions of the Royal Society B: Biological Sciences, 352(1358), 1257–1265.
Bootkrajang, J., & Kabán, A. (2014). Learning kernel logistic regression in the presence of class label noise. Pattern Recognition, 47(11), 3641–3655.
Bruzzone, L., & Marconcini, M. (2010). Domain adaptation problems: A DASVM classification technique and a circular validation strategy. T-PAMI, 32(5), 770–787.
Bunescu, R. C., & Mooney, R. J. (2007). Multiple instance learning for sparse positive bags. In International Conference on Machine Learning (ICML) (pp. 105–112).
Chang, S. F., Ellis, D., Jiang, W., Lee, K., Yanagawa, A., Loui, A. C., & Luo, J. (2007). Large-scale multimodal semantic concept detection for consumer video. In International Workshop on Multimedia Information Retrieval (pp. 255–264).
Chen, L., Duan, L., & Xu, D. (2013a). Event recognition in videos by learning from heterogeneous web sources. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2666–2673).
Chen, X., Shrivastava, A., & Gupta, A. (2013b). NEIL: Extracting visual knowledge from web data. In IEEE International Conference on Computer Vision (ICCV) (pp. 1409–1416).
Chen, Y., Bi, J., & Wang, J. Z. (2006). MILES: Multiple-instance learning via embedded instance selection. T-PAMI, 28(12), 1931–1947.
Chu, W. S., De la Torre, F., & Cohn, J. (2013). Selective transfer machine for personalized facial action unit detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3515–3522).
Duan, L., Li, W., Tsang, I. W., & Xu, D. (2011). Improving web image search by bag-based re-ranking. T-IP, 20(11), 3280–3290.
Duan, L., Tsang, I. W., & Xu, D. (2012a). Domain transfer multiple kernel learning. T-PAMI, 34(3), 465–479.
Duan, L., Xu, D., & Chang, S. F. (2012b). Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1338–1345).
Duan, L., Xu, D., & Tsang, I. W. (2012c). Domain adaptation from multiple sources: A domain-dependent regularization approach. T-NNLS, 23(3), 504–518.
Duan, L., Xu, D., Tsang, I. W., & Luo, J. (2012d). Visual event recognition in videos by learning from web data. T-PAMI, 34(9), 1667–1680.
Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by their attributes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1778–1785).
Farquhar, J. D. R., Hardoon, D. R., Meng, H., Shawe-Taylor, J., & Szedmak, S. (2005). Two view learning: SVM-2K, theory and practice. In Advances in Neural Information Processing Systems (NIPS).
Fergus, R., Fei-Fei, L., Perona, P., & Zisserman, A. (2005). Learning object categories from Google’s image search. In IEEE International Conference on Computer Vision (ICCV).
Fernando, B., Habrard, A., Sebban, M., & Tuytelaars, T. (2013). Unsupervised visual domain adaptation using subspace alignment. In IEEE International Conference on Computer Vision (ICCV).
Ferrari, V., & Zisserman, A. (2007). Learning visual attributes. In Advances in Neural Information Processing Systems (NIPS) (pp. 433–440).
Fouad, S., Tino, P., Raychaudhury, S., & Schneider, P. (2013). Incorporating privileged information through metric learning. T-NNLS, 24(7), 1086–1098.
Gehler, P. V., & Nowozin, S. (2008). Infinite kernel learning. Tech. rep., Max Planck Institute for Biological Cybernetics. In NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels.
Gong, B., Shi, Y., Sha, F., & Grauman, K. (2012). Geodesic flow kernel for unsupervised domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2066–2073).
Gopalan, R., Li, R., & Chellappa, R. (2011). Domain adaptation for object recognition: An unsupervised approach. In IEEE International Conference on Computer Vision (ICCV) (pp. 999–1006).
Gretton, A., Borgwardt, K. M., Rasch, M. J., Schölkopf, B., & Smola, A. (2012). A kernel two-sample test. JMLR, 13, 723–773.
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12), 2639–2664.
Hu, Y., Cao, L., Lv, F., Yan, S., Gong, Y., & Huang, T. S. (2009). Action detection in complex scenes with spatial and temporal ambiguities. In IEEE International Conference on Computer Vision (ICCV) (pp. 128–135).
Huang, J., Smola, A., Gretton, A., Borgwardt, K., & Schölkopf, B. (2007). Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems (NIPS) (pp. 601–608).
Hwang, S. J., & Grauman, K. (2012). Learning the relative importance of objects from tagged images for retrieval and cross-modal search. IJCV, 100(2), 134–153.
Jiang, Y. G., Ye, G., Chang, S. F., Ellis, D., & Loui, A. C. (2011). Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In International Conference on Multimedia Retrieval (ICMR) (p. 29).
Jiang, Y. G., Bhattacharya, S., Chang, S. F., & Shah, M. (2013). High-level event recognition in unconstrained videos. International Journal of Multimedia Information Retrieval, 2(2), 73–101.
Kloft, M., Brefeld, U., Sonnenburg, S., & Zien, A. (2011). \(\ell_p\)-norm multiple kernel learning. JMLR, 12, 953–997.
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., & Serre, T. (2011). HMDB: A large video database for human motion recognition. In IEEE International Conference on Computer Vision (ICCV) (pp. 2556–2563).
Kulis, B., Saenko, K., & Darrell, T. (2011). What you saw is not what you get: Domain adaptation using asymmetric kernel transforms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1785–1792).
Le, Q. V., Zou, W. Y., Yeung, S. Y., & Ng, A. Y. (2011). Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3361–3368).
Leung, T., Song, Y., & Zhang, J. (2011). Handling label noise in video classification via multiple instance learning. In IEEE International Conference on Computer Vision (ICCV) (pp. 2056–2063).
Li, Q., Wu, J., & Tu, Z. (2013). Harvesting mid-level visual concepts from large-scale Internet images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 851–858).
Li, W., Duan, L., Xu, D., & Tsang, I. W. (2011). Text-based image retrieval using progressive multi-instance learning. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2368–2375).
Li, W., Duan, L., Tsang, I. W., & Xu, D. (2012a). Batch mode adaptive multiple instance learning for computer vision tasks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2368–2375).
Li, W., Duan, L., Tsang, I. W., & Xu, D. (2012b). Co-labeling: A new multi-view learning approach for ambiguous problems. In IEEE International Conference on Data Mining (ICDM) (pp. 419–428).
Li, W., Duan, L., Xu, D., & Tsang, I. W. (2014a). Learning with augmented features for supervised and semi-supervised heterogeneous domain adaptation. T-PAMI, 36(6), 1134–1148.
Li, W., Niu, L., & Xu, D. (2014b). Exploiting privileged information from web data for image categorization. In European Conference on Computer Vision (ECCV) (pp. 437–452).
Li, Y.-F., Tsang, I. W., Kwok, J. T., & Zhou, Z.-H. (2009). Tighter and convex maximum margin clustering. In International Conference on Artificial Intelligence and Statistics (pp. 344–351).
Liang, L., Cai, F., & Cherkassky, V. (2009). Predictive learning with structured (grouped) data. Neural Networks, 22, 766–773.
Lin, Z., Jiang, Z., & Davis, L. S. (2009). Recognizing actions by shape-motion prototype trees. In IEEE International Conference on Computer Vision (ICCV) (pp. 444–451).
Loui, A., Luo, J., Chang, S. F., Ellis, D., Jiang, W., Kennedy, L., Lee, K., & Yanagawa, A. (2007). Kodak’s consumer video benchmark data set: Concept definition and annotation. In International Workshop on Multimedia Information Retrieval (pp. 245–254).
Morariu, V. I., & Davis, L. S. (2011). Multi-agent event recognition in structured scenarios. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3289–3296).
Natarajan, N., Dhillon, I. S., Ravikumar, P. K., & Tewari, A. (2013). Learning with noisy labels. In Advances in Neural Information Processing Systems (NIPS) (pp. 1196–1204).
Pan, S. J., Tsang, I. W., Kwok, J. T., & Yang, Q. (2011). Domain adaptation via transfer component analysis. T-NN, 22(2), 199–210.
Schroff, F., Criminisi, A., & Zisserman, A. (2011). Harvesting image databases from the web. T-PAMI, 33(4), 754–766.
Sharmanska, V., Quadrianto, N., & Lampert, C. H. (2013). Learning to rank using privileged information. In IEEE International Conference on Computer Vision (ICCV) (pp. 825–832).
Shi, Y., Huang, Y., Minnen, D., Bobick, A., & Essa, I. (2004). Propagation networks for recognition of partially ordered sequential action. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (vol. 2, pp. II-862–II-869).
Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1521–1528).
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. T-PAMI, 30(11), 1958–1970.
Torresani, L., Szummer, M., & Fitzgibbon, A. (2010). Efficient object category recognition using classemes. In European Conference on Computer Vision (ECCV) (pp. 776–789).
Tran, S. D., & Davis, L. S. (2008). Event modeling and recognition using Markov logic networks. In European Conference on Computer Vision (ECCV) (pp. 610–623).
Vapnik, V., & Vashist, A. (2009). A new learning paradigm: Learning using privileged information. Neural Networks, 22, 544–557.
Vijayanarasimhan, S., & Grauman, K. (2008). Keywords to visual categories: Multiple-instance learning for weakly supervised object categorization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1–8).
Wang, H., & Schmid, C. (2013). Action recognition with improved trajectories. In IEEE International Conference on Computer Vision (ICCV) (pp. 3551–3558).
Wang, H., Kläser, A., Schmid, C., & Liu, C. L. (2011a). Action recognition by dense trajectories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3169–3176).
Wang, L., Wang, Y., & Gao, W. (2011b). Mining layered grammar rules for action recognition. International Journal of Computer Vision, 93(2), 162–182.
Xu, D., & Chang, S. F. (2008). Video event recognition using kernel methods with multilevel temporal alignment. T-PAMI, 30(11), 1985–1997.
Yu, T. H., Kim, T. K., & Cipolla, R. (2010). Real-time action recognition by spatiotemporal semantic and structural forests. In The British Machine Vision Conference (BMVC) (pp. 52.1–52.12).
Zeng, Z., & Ji, Q. (2010). Knowledge based activity recognition with dynamic Bayesian network. In European Conference on Computer Vision (ECCV) (pp. 532–546).
Zhou, Z., & Zhang, M. (2006). Multi-instance multi-label learning with application to scene classification. In Advances in Neural Information Processing Systems (NIPS) (pp. 1609–1616).
Zhu, G., Yang, M., Yu, K., Xu, W., & Gong, Y. (2009). Detecting video events based on action recognition in complex scenes using spatio-temporal descriptor. In Proceedings of the 17th ACM International Conference on Multimedia (pp. 165–174). ACM.
Metadata
Title
Exploiting Privileged Information from Web Data for Action and Event Recognition
Authors
Li Niu
Wen Li
Dong Xu
Publication date
1 June 2016
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2016
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-015-0862-5
