
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study

Authors: Yuecong Xu, Haozhi Cao, Jianxiong Yin, Zhenghua Chen, Xiaoli Li, Zhengguo Li, Qianwen Xu, Jianfei Yang

Published in: International Journal of Computer Vision | Issue 4/2024


Abstract

While action recognition (AR) has seen substantial improvements with the introduction of large-scale video datasets and the development of deep neural networks, AR models that remain robust in the challenging environments of real-world scenarios are still under-explored. We focus on action recognition in dark environments, which is relevant to applications such as night-time surveillance and autonomous driving. Intuitively, current deep networks paired with visual enhancement techniques should be able to handle AR in dark environments; in practice, however, we observe that this is not always the case. To dive deeper into solutions for AR in dark environments, we launched the \(\hbox{UG}^{2}{+}\) Challenge Track 2 (UG2-2) at IEEE CVPR 2021, with the goal of evaluating and advancing the robustness of AR models in dark environments. The challenge builds on and expands the ARID dataset, the first dataset for the task of dark-video AR, and guides models to tackle the task in both fully supervised and semi-supervised settings. We report baseline results of current AR models combined with enhancement methods, which confirm the challenging nature of this task and the substantial room for improvement. Thanks to active participation from the research community, participants' solutions achieved notable advances, and analysis of these solutions helps identify promising directions for tackling AR in dark environments.
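
As a concrete illustration of the pipeline the challenge evaluates, a common baseline for dark-video AR enhances each frame before passing the clip to a standard video classifier. The sketch below is a minimal example under assumptions of our own, not the challenge's official baseline code: simple gamma intensity correction serves as the enhancement step, torchvision's r3d_18 stands in for "current AR models", and the 11-class output head matches ARID's action categories.

    import torch
    from torchvision.models.video import r3d_18

    def gamma_correct(frames: torch.Tensor, gamma: float = 0.5) -> torch.Tensor:
        # Gamma intensity correction: pixel values in [0, 1] are brightened
        # when gamma < 1. frames has shape (batch, channels, time, height, width).
        return frames.clamp(0.0, 1.0).pow(gamma)

    # Assumed stand-in classifier; any clip-level video model with the same
    # input convention would serve. The 11-class head matches ARID's categories.
    model = r3d_18(num_classes=11)
    model.eval()

    dark_clip = torch.rand(1, 3, 16, 112, 112) * 0.1  # simulated under-exposed clip
    with torch.no_grad():
        logits = model(gamma_correct(dark_clip))      # enhance, then classify
    predicted_action = logits.argmax(dim=1)

Stronger enhancement methods (e.g., learned low-light enhancement) or different video backbones can be swapped in at either stage; this enhance-then-classify design space is precisely what the benchmark probes.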


Metadata

Title: Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Authors: Yuecong Xu, Haozhi Cao, Jianxiong Yin, Zhenghua Chen, Xiaoli Li, Zhengguo Li, Qianwen Xu, Jianfei Yang
Publication date: 08-11-2023
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 4/2024
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-023-01932-5
