International Journal of Computer Vision OnlineFirst articles

Open Access 07-05-2024

3D-MuPPET: 3D Multi-Pigeon Pose Estimation and Tracking

Markerless methods for animal posture tracking have been rapidly developing recently, but frameworks and benchmarks for tracking large animal groups in 3D are still lacking. To overcome this gap in the literature, we present 3D-MuPPET, a framework …

Authors:
Urs Waldmann, Alex Hoi Hang Chan, Hemal Naik, Máté Nagy, Iain D. Couzin, Oliver Deussen, Bastian Goldluecke, Fumihiro Kano

07-05-2024

Physics-Driven Spectrum-Consistent Federated Learning for Palmprint Verification

Palmprint as biometrics has gained increasing attention recently due to its discriminative ability and robustness. However, existing methods mainly improve palmprint verification within one spectrum, which is challenging to verify across different …

Authors:
Ziyuan Yang, Andrew Beng Jin Teoh, Bob Zhang, Lu Leng, Yi Zhang

06-05-2024

L3AM: Linear Adaptive Additive Angular Margin Loss for Video-Based Hand Gesture Authentication

Feature extractors significantly impact the performance of biometric systems. In the field of hand gesture authentication, existing studies focus on improving the model architectures and behavioral characteristic representation methods to enhance …

Authors:
Wenwei Song, Wenxiong Kang, Adams Wai-Kin Kong, Yufeng Zhang, Yitao Qiao

Open Access 06-05-2024

Meet JEANIE: A Similarity Measure for 3D Skeleton Sequences via Temporal-Viewpoint Alignment

Video sequences exhibit significant nuisance variations (undesired effects) of speed of actions, temporal locations, and subjects’ poses, leading to temporal-viewpoint misalignment when comparing two sets of frames or evaluating the similarity of …

Authors:
Lei Wang, Jun Liu, Liang Zheng, Tom Gedeon, Piotr Koniusz

01-05-2024

Scaling Up Multi-domain Semantic Segmentation with Sentence Embeddings

The state-of-the-art semantic segmentation methods have achieved impressive performance on predefined close-set individual datasets, but their generalization to zero-shot domains and unseen categories is limited. Labeling a large-scale dataset is …

Authors:
Wei Yin, Yifan Liu, Chunhua Shen, Baichuan Sun, Anton van den Hengel

30-04-2024

A Causal Inspired Early-Branching Structure for Domain Generalization

Learning domain-invariant semantic representations is crucial for achieving domain generalization (DG), where a model is required to perform well on unseen target domains. One critical challenge is that standard training often results in entangled …

Authors:
Liang Chen, Yong Zhang, Yibing Song, Zhen Zhang, Lingqiao Liu

Open Access 30-04-2024

Species-Agnostic Patterned Animal Re-identification by Aggregating Deep Local Features

Access to large image volumes through camera traps and crowdsourcing provides novel possibilities for animal monitoring and conservation. It calls for automatic methods for analysis, in particular, when re-identifying individual animals from the …

Authors:
Ekaterina Nepovinnykh, Ilia Chelak, Tuomas Eerola, Veikka Immonen, Heikki Kälviäinen, Maksim Kholiavchenko, Charles V. Stewart

Open Access 29-04-2024

Matching Compound Prototypes for Few-Shot Action Recognition

The task of few-shot action recognition aims to recognize novel action classes using only a small number of labeled training samples. How to better describe the action in each video and how to compare the similarity between videos are two of the …

Authors:
Yifei Huang, Lijin Yang, Guo Chen, Hongjie Zhang, Feng Lu, Yoichi Sato

27-04-2024

Domain-Agnostic Priors for Semantic Segmentation Under Unsupervised Domain Adaptation and Domain Generalization

In computer vision, an important challenge to deep neural networks comes from adjusting the varying properties of different image domains. To study this problem, researchers have been investigating a practical setting in which the deep neural …

Authors:
Xinyue Huo, Lingxi Xie, Hengtong Hu, Wengang Zhou, Houqiang Li, Qi Tian

26-04-2024

Light Flickering Guided Reflection Removal

When photographing through a piece of glass, reflections usually degrade the quality of captured images or videos. In this paper, by exploiting periodically varying light flickering, we investigate the problem of removing strong reflections from …

Authors:
Yuchen Hong, Yakun Chang, Jinxiu Liang, Lei Ma, Tiejun Huang, Boxin Shi

25-04-2024

PIE: Physics-Inspired Low-Light Enhancement

In this paper, we propose a physics-inspired contrastive learning paradigm for low-light enhancement, called PIE. PIE primarily addresses three issues: (i) To resolve the problem of existing learning-based methods often training a LLE model with …

Authors:
Dong Liang, Zhengyan Xu, Ling Li, Mingqiang Wei, Songcan Chen

Open Access 24-04-2024

WildCLIP: Scene and Animal Attribute Retrieval from Camera Trap Data with Domain-Adapted Vision-Language Models

Wildlife observation with camera traps has great potential for ethology and ecology, as it gathers data non-invasively in an automated way. However, camera traps produce large amounts of uncurated data, which is time-consuming to annotate.

Authors:
Valentin Gabeff, Marc Rußwurm, Devis Tuia, Alexander Mathis

24-04-2024 | Editorial

Guest Editorial: Special Issue on the British Machine Vision Conference 2022

This special issue in the International Journal of Computer Vision is dedicated to the 33rd British Machine Vision Conference, held from the 21st to the 24th of November 2022 in London (the Kia Oval, Cricket Ground), UK. The articles included in …

Authors:
Guang Yang, Angelica Aviles-Rivero, Yingying Fang, Zhenhua Feng, Gianluigi Ciocca, Yulia Hicks, Constantino Carlos Reyes-Aldasoro

24-04-2024

Descriptor Distillation: A Teacher-Student-Regularized Framework for Learning Local Descriptors

Learning a fast and discriminative patch descriptor is a challenging topic in computer vision. Recently, many existing works focus on training various descriptor learning networks by minimizing a triplet loss (or its variants), which is expected …

Authors:
Yuzhen Liu, Qiulei Dong

24-04-2024

I2DFormer+: Learning Image to Document Summary Attention for Zero-Shot Image Classification

Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing methods still rely on human-annotated attributes, which are difficult to annotate and scale. An unsupervised alternative is to represent each class using the …

Authors:
Muhammad Ferjad Naeem, Yongqin Xian, Luc Van Gool, Federico Tombari

24-04-2024

An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification

Person re-identification (ReID) has made great strides thanks to the data-driven deep learning techniques. However, the existing benchmark datasets lack diversity, and models trained on these data cannot generalize well to dynamic wild scenarios.

Authors:
Lei Zhang, Xiaowei Fu, Fuxiang Huang, Yi Yang, Xinbo Gao

24-04-2024 | Editorial

Guest Editorial: Special Issue on Traditional Computer Vision in the Age of Deep Learning

Authors:
Matteo Poggi, Federica Arrigoni, Andrea Fusiello, Stefano Mattoccia, Adrien Bartoli, Torsten Sattler, Tomas Pajdla

24-04-2024

Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering

Incomplete multi-modal clustering (IMmC) is challenging due to the unexpected missing of some modalities in data. A key to this problem is to explore complementarity information among different samples with incomplete information of unpaired data.

Authors:
Yu Wang, Xinjie Yao, Pengfei Zhu, Weihao Li, Meng Cao, Qinghua Hu

24-04-2024

MutualFormer: Multi-modal Representation Learning via Cross-Diffusion Attention

Aggregating multi-modal data to obtain reliable data representation attracts more and more attention. Recent studies demonstrate that Transformer models usually work well for multi-modal tasks. Existing Transformers generally either adopt the …

Authors:
Xixi Wang, Xiao Wang, Bo Jiang, Jin Tang, Bin Luo

24-04-2024

Position, Padding and Predictions: A Deeper Look at Position Information in CNNs

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. Theoretically, an implication of this fact is that a filter may know …

Authors:
Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, Neil D. B. Bruce