Coloring Action Recognition in Still Images

Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Antonio M. Lopez, Michael Felsberg

Published in: International Journal of Computer Vision | Issue 3/2013

Published: 1 December 2013

Abstract

In this article we investigate the problem of human action recognition in static images. By action recognition we mean a class of problems that includes both action classification and action detection (i.e., simultaneous localization and classification). Bag-of-words image representations yield promising results for action classification, and deformable part models perform very well for object detection. The representations used for action recognition typically rely only on shape cues and ignore color information. Inspired by the recent success of color in image classification and object detection, we investigate the potential of color for action classification and detection in static images. We perform a comprehensive evaluation of color descriptors and fusion approaches for action recognition. Experiments were conducted on the three datasets most used for benchmarking action recognition in still images: Willow, PASCAL VOC 2010 and Stanford-40. Our experiments demonstrate that incorporating color information considerably improves recognition performance, and that a descriptor based on color names outperforms pure color descriptors. They also demonstrate that late fusion of color and shape information outperforms other approaches on action recognition. Finally, we show that the different color–shape fusion approaches provide complementary information and that combining them yields state-of-the-art performance for action classification.

As evidenced by the First Workshop on Action Recognition and Pose Estimation in Still Images held in conjunction with ECCV 2012: http://vision.stanford.edu/apsi2012/index.html.

Only in experiment 6.2.3 do we add additional information from the background.

Note that the terminology of early and late fusion varies. In some communities early fusion refers to combination before the classifier and late fusion to combination after the classifier (Lan et al. 2012).
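The classifier-level convention described in this note can be illustrated with a minimal sketch. It assumes a linear scorer and toy histograms; the function names, weights, and the averaging parameter `alpha` are hypothetical and are not taken from the article.

```python
def linear_score(weights, features):
    """Score of a hypothetical linear classifier: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def fuse_before_classifier(shape_hist, color_hist, weights):
    """Combine the two cues first (concatenation), then apply a single classifier."""
    return linear_score(weights, shape_hist + color_hist)


def fuse_after_classifier(shape_hist, color_hist, shape_weights, color_weights, alpha=0.5):
    """Score each cue with its own classifier, then take a weighted average of the scores."""
    shape_score = linear_score(shape_weights, shape_hist)
    color_score = linear_score(color_weights, color_hist)
    return alpha * shape_score + (1.0 - alpha) * color_score


# Toy histograms for one image (assumed values, for illustration only).
shape_hist = [0.5, 0.5]
color_hist = [1.0, 0.0, 0.0]

before = fuse_before_classifier(shape_hist, color_hist, [1.0, -1.0, 2.0, 0.0, 0.0])
after = fuse_after_classifier(shape_hist, color_hist, [1.0, -1.0], [2.0, 0.0, 0.0])
```

Under this convention the only difference between the two schemes is whether the cues meet in feature space or in score space.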

The combined vocabulary \(sc\) is built by concatenating the shape and color features and then constructing the vocabulary in the combined feature space.
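As a rough sketch of this construction, the snippet below concatenates per-patch shape and color descriptors and clusters the joint vectors. Plain k-means with a naive initialisation is an assumption made here for illustration; the note does not say which clustering method is used, and all names and toy values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Lloyd's k-means; the first k points serve as initial centers (toy initialisation)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def build_combined_vocabulary(shape_descs, color_descs, k):
    """Concatenate shape and color descriptors per patch, then cluster the joint vectors."""
    joint = [s + c for s, c in zip(shape_descs, color_descs)]
    return kmeans(joint, k)


# Four patches with 2-D shape and 1-D color descriptors (assumed toy data).
shape_descs = [[0.0, 0.0], [0.0, 0.2], [1.0, 1.0], [1.0, 0.8]]
color_descs = [[1.0], [1.0], [0.0], [0.0]]
vocabulary = build_combined_vocabulary(shape_descs, color_descs, k=2)
```

Each visual word in `vocabulary` has as many dimensions as the shape and color descriptors together, so a single word encodes both cues jointly.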

Due to the absence of a vocabulary stage, several of the fusion methods explained in Sect. 4 cannot be applied to part-based object detection.

The Willow dataset is available at: http://www.di.ens.fr/willow/research/stillactions/.

PASCAL 2010 is available at: http://www.pascal-network.org/challenges/VOC/voc2010/.

The Stanford-40 dataset is available at: http://vision.stanford.edu/Datasets/40actions.html.

We also experimented with replacing HOG by pure color descriptors, but the results were significantly inferior.

The confusion matrix is constructed by assigning each image to the class for which it gets the highest classification score.
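The assignment rule in this note can be sketched as follows. The layout (rows for the true class, columns for the assigned class) and the toy scores are assumptions for illustration; the note itself does not fix an orientation.

```python
def confusion_matrix(true_labels, score_rows, num_classes):
    """Count images per (true class, assigned class) pair.

    Each image is assigned to the class with the highest classification score.
    Rows index the true class and columns the assigned class (assumed layout).
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_class, scores in zip(true_labels, score_rows):
        assigned = max(range(num_classes), key=lambda c: scores[c])
        matrix[true_class][assigned] += 1
    return matrix


# Four images, three classes; one score per class for each image (toy values).
true_labels = [0, 0, 1, 2]
score_rows = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
cm = confusion_matrix(true_labels, score_rows, num_classes=3)
```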

Top ranked detections are the top \(N_j\) detections of a class, where \(N_j\) is equal to the number of positive examples for that class.
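A minimal sketch of this selection, assuming each detection is a record with a confidence score; the dictionary layout and the toy values are hypothetical.

```python
def top_ranked_detections(detections, num_positives):
    """Return the N_j highest-scoring detections of one class.

    num_positives plays the role of N_j, the number of positive examples for the class.
    """
    ranked = sorted(detections, key=lambda d: d["score"], reverse=True)
    return ranked[:num_positives]


# Five detections for one class that has three positive examples (toy values).
detections = [
    {"id": "a", "score": 0.20},
    {"id": "b", "score": 0.95},
    {"id": "c", "score": 0.60},
    {"id": "d", "score": 0.85},
    {"id": "e", "score": 0.10},
]
top = top_ranked_detections(detections, num_positives=3)
top_ids = [d["id"] for d in top]
```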

Benavente, R., Vanrell, M., & Baldrich, R. (2008). Parametric fuzzy sets for automatic color naming. Journal of the Optical Society of America A, 25(10), 2582–2593.

Berlin, B., & Kay, P. (1969). Basic color terms: Their universality and evolution. Berkeley, CA: University of California Press.

Bosch, A., Zisserman, A., & Munoz, X. (2006). Scene classification via pLSA. In Proceedings of the European conference on computer vision.

Bosch, A., Zisserman, A., & Munoz, X. (2008). Scene classification using a hybrid generative/discriminative approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(4), 712–727.

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Conference on computer vision and pattern recognition.

Delaitre, V., Laptev, I., & Sivic, J. (2010). Recognizing human actions in still images: a study of bag-of-features and part-based representations. In Proceedings of the British machine vision conference.

Delaitre, V., Sivic, J., & Laptev, I. (2011). Learning person-object interactions for action recognition in still images. In Advances in neural information processing systems.

Desai, C., & Ramanan, D. (2012). Detecting actions, poses, and objects with relational phraselets. In Proceedings of the European conference on computer vision.

Elfiky, N., Khan, F. S., van de Weijer, J., & Gonzalez, J. (2012). Discriminative compact pyramids for object and scene recognition. Pattern Recognition, 45(4), 1627–1636.

Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2009). The PASCAL visual object classes challenge 2009 (VOC2009) results.

Everingham, M., Gool, L. J. V., Williams, C. K. I., Winn, J. M., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

Felsberg, M., & Hedborg, J. (2007). Real-time view-based pose recognition and interpolation for tracking initialization. Journal of Real-Time Image Processing, 2(3), 103–115.

Felzenszwalb, P. F., Girshick, R. B., McAllester, D. A., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

Gaidon, A., Harchaoui, Z., & Schmid, C. (2011). Actom sequence models for efficient action detection. In Conference on computer vision and pattern recognition.

Gehler, P. V., & Nowozin, S. (2009). On feature combination for multiclass object classification. In Proceedings of IEEE international conference on computer vision.

Geusebroek, J. M., van den Boomgaard, R., Smeulders, A. W. M., & Geerts, H. (2001). Color invariance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(12), 1338–1350.

Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In European conference on computer vision.

Hu, Y., Cao, L., Lv, F., Yan, S., Gong, Y., & Huang, T. S. (2009). Action detection in complex scenes with spatial and temporal ambiguities. In Proceedings of IEEE international conference on computer vision.

Khan, F. S., van de Weijer, J., Bagdanov, A. D., & Vanrell, M. (2011). Portmanteau vocabularies for multi-cue image representations. In Advances in neural information processing systems.

Khan, F. S., Anwer, R. M., van de Weijer, J., Bagdanov, A. D., Vanrell, M., & Lopez, A. M. (2012a). Color attributes for object detection. In Conference on computer vision and pattern recognition.

Khan, F. S., van de Weijer, J., & Vanrell, M. (2012b). Modulating shape features by color attention for object recognition. International Journal of Computer Vision, 98(1), 49–64.

Lan, Z. Z., Bao, L., Yu, S. I., Liu, W., & Hauptmann, A. G. (2012). Double fusion for multimedia event detection. In Multimedia Modeling.

Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE conference on computer vision & pattern recognition.

Lenz, R., Bui, T. H., & Hernandez-Andres, J. (2005). Group theoretical structure of spectral spaces. Journal of Mathematical Imaging and Vision, 23(3), 297–313.

Li, L. J., Su, H., Xing, E. P., & Li, F. F. (2010). Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in neural information processing systems.

Lowe, D. G. (2004). Distinctive image features from scale-invariant points. International Journal of Computer Vision, 60(2), 91–110.

Maji, S., Bourdev, L. D., & Malik, J. (2011). Action recognition from a distributed representation of pose and appearance. In Computer vision and pattern recognition.

Mullen, K. T. (1985). The contrast sensitivity of human colour vision to red–green and blue–yellow chromatic gratings. The Journal of Physiology, 359, 381–400.

Pagani, A., Stricker, D., & Felsberg, M. (2009). Integral p-channels for fast and robust region matching. In Proceedings of the international conference on image processing.

Prest, A., Schmid, C., & Ferrari, V. (2012). Weakly supervised learning of interactions between humans and objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3), 601–614.

van de Sande, K. E. A., Gevers, T., & Snoek, C. G. M. (2010). Evaluating color descriptors for object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596.

Shapovalova, N., Gong, W., Pedersoli, M., Roca, F. X., & Gonzalez, J. (2011). On importance of interactions and context in human action recognition. In Iberian conference on pattern recognition and image analysis.

Sharma, G., Jurie, F., & Schmid, C. (2012). Discriminative spatial saliency for image classification. In Conference on computer vision and pattern recognition.

Sharma, G., Jurie, F., & Schmid, C. (2013). Expanded parts model for human attribute and action recognition in still images. In Conference on computer vision and pattern recognition.

Tran, D., & Yuan, J. (2012). Max-margin structured output regression for spatio-temporal action localization. In Advances in neural information processing systems.

Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In Proceedings of IEEE international conference on computer vision.

Vigo, D. A. R., Khan, F. S., van de Weijer, J., & Gevers, T. (2010). The impact of color on bag-of-words based object recognition. In International conference on pattern recognition.

Wang, J., Yang, J., Yu, K., Lv, F., Huang, T. S., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In Conference on computer vision and pattern recognition.

van de Weijer, J., & Schmid, C. (2006). Coloring local feature extraction. In Proceedings of the European conference on computer vision.

van de Weijer, J., & Schmid, C. (2007). Applying color names to image description. In International conference on image processing.

van de Weijer, J., Schmid, C., Verbeek, J. J., & Larlus, D. (2009). Learning color names for real-world applications. IEEE Transactions on Image Processing, 18(7), 1512–1524.

Yao, B., & Li, F. F. (2012). Recognizing human-object interactions in still images by modeling the mutual context of objects and human poses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1691–1703.

Yao, B., Jiang, X., Khosla, A., Lin, A. L., Guibas, L. J., & Li, F. F. (2011). Human action recognition by learning bases of action attributes and parts. In Proceedings of IEEE international conference on computer vision.

Yuan, J., Liu, Z., & Wu, Y. (2011). Discriminative video pattern search for efficient action detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(9), 1728–1743.

Zhang, J., Marszalek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213–218.

Zhang, J., Huang, K., Yu, Y., & Tan, T. (2010). Boosted local structured HOG-LBP for object localization. In IEEE conference on computer vision & pattern recognition.

Title: Coloring Action Recognition in Still Images
Authors: Fahad Shahbaz Khan, Rao Muhammad Anwer, Joost van de Weijer, Andrew D. Bagdanov, Antonio M. Lopez, Michael Felsberg
Publication date: 1 December 2013
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 3/2013
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-013-0633-0
