Published in: International Journal of Computer Vision, Issue 4/2024

08.11.2023

Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study

Authors: Yuecong Xu, Haozhi Cao, Jianxiong Yin, Zhenghua Chen, Xiaoli Li, Zhengguo Li, Qianwen Xu, Jianfei Yang


Abstract

While action recognition (AR) has improved substantially with the introduction of large-scale video datasets and the development of deep neural networks, AR models that are robust to the challenging environments of real-world scenarios remain under-explored. We focus on action recognition in dark environments, which is applicable to fields such as surveillance and autonomous driving at night. Intuitively, current deep networks paired with visual enhancement techniques should be able to handle AR in dark environments; in practice, however, this is not always the case. To dive deeper into exploring solutions for AR in dark environments, we launched the \(\hbox {UG}^{2}{+}\) Challenge Track 2 (UG2-2) at IEEE CVPR 2021, with the goal of evaluating and advancing the robustness of AR models in dark environments. The challenge builds and expands on the novel ARID dataset, the first dataset for dark-video AR, and guides models to tackle the task in both fully supervised and semi-supervised manners. Baseline results using current AR models and enhancement methods are reported, confirming the challenging nature of this task and the substantial room for improvement. Thanks to active participation from the research community, notable advances have been made in the participants' solutions, and analysis of these solutions has helped identify promising directions for tackling AR in dark environments.


Metadata
Title
Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study
Authors
Yuecong Xu
Haozhi Cao
Jianxiong Yin
Zhenghua Chen
Xiaoli Li
Zhengguo Li
Qianwen Xu
Jianfei Yang
Publication date
08.11.2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 4/2024
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01932-5
