
Open Access 10-12-2022 | Original Article

Complete solution for vehicle Re-ID in surround-view camera system

Authors: Zizhang Wu, Tianhao Xu, Fan Wang, Xiaoquan Wang, Jing Song

Published in: International Journal of Machine Learning and Cybernetics | Issue 5/2023


Abstract

Vehicle re-identification (Re-ID) is a critical component of the autonomous driving perception system, and research in this area has accelerated in recent years. However, there is as yet no complete solution to the vehicle re-identification problem in the car's surround-view camera system. Our analysis identifies two significant issues in this scenario: (1) the unique construction of the fisheye camera makes it difficult to identify the same vehicle across multiple image frames; (2) the appearance of the same vehicle differs considerably when seen through the surround-view system's different cameras. To overcome these issues, we propose an integrated vehicle Re-ID solution. On the one hand, we provide a technique for evaluating the drift of the tracking box with respect to the target. On the other hand, we combine a Re-ID network based on the attention mechanism with spatial constraints to increase performance in multi-camera situations. Finally, our approach combines state-of-the-art accuracy with real-time performance. We will soon make the source code and annotated fisheye dataset available.

1 Introduction

With the advent of autonomous driving, significant effort has been devoted in the computer vision community to vehicle-related research. In particular, vehicle Re-ID is one of the most active research fields, aiming at identifying the same vehicle across images captured from different cameras. However, despite years of progress, vehicle Re-ID in real-world scenarios remains a challenging task. Consequently, it is desirable to seek an effective and robust vehicle Re-ID method, which is a prerequisite for trajectory prediction, state estimation, and speed estimation of the target vehicle.
Existing Re-ID studies mainly focus on pedestrian Re-ID and vehicle Re-ID. Unlike pedestrian Re-ID [36–38], which extracts rich features from images with different poses and colors, vehicle Re-ID faces more severe challenges: vehicles captured by cameras suffer from distortion, occlusion, individual similarity, etc. Although pioneering deep learning-based methods [14, 21, 39, 42] can learn global semantic features from an entire vehicle image, they still fail to distinguish some small but discriminative regions. Therefore, Liu et al. [20], Shen et al. [29], and Antonio Marin-Reyes et al. [1] utilized spatio-temporal relations to learn the similarity between vehicle images, which boosts Re-ID performance. However, spatio-temporal information is not annotated in all existing datasets, which restricts the exploration of these methods. To improve the adaptability of vehicle Re-ID methods to different scenarios, Liu et al. [19], Liu et al. [22], and Wang et al. [33] employ vehicle attributes to learn the correlations among vehicles. Although vehicle attributes are more general and discriminating than other features, the relationship between vehicle attributes and categories is ignored. Subsequently, Zheng et al. [45] introduced the attention mechanism in the attribute branch. This helps select attributes relevant to the input vehicle image, which in turn helps the category branch select more discriminative features for category recognition.
Despite the tremendous progress in current vehicle Re-ID methods, most are designed for surveillance scenarios. In autonomous driving, however, surround-view camera systems have become increasingly popular. To achieve a 360\(^{\circ }\) blind-spot-free environment perception, we mount multiple fisheye cameras around the vehicle, as shown in Fig. 1. It is therefore essential to learn the relationship between the same vehicle across different vehicle-mounted cameras and across different frames of a single camera. We identify two main challenges for achieving a vehicle Re-ID solution in the surround-view multi-camera scenario:
1.
In a single-camera view, vehicle features in consecutive frames vary dramatically due to fisheye distortion, occlusion, truncation, etc. It is difficult to recognize the same vehicle from the past image archive under such interference.
 
2.
In a multi-camera view, the appearance of the same vehicle varies significantly across viewpoints. The individual similarity of vehicles also leads to considerable confusion in matching.
 
In this paper, we propose methods to address each of these challenges. To balance accuracy and efficiency, the SiamRPN++ network [15] and BDBnet [6] are employed for vehicle re-identification in the single-camera and multi-camera settings, respectively. However, these two models are mainly designed for surveillance scenes and cannot deal with the significant variations in vehicle appearance in the fisheye system. Therefore, a post-processing module, a quality evaluation mechanism for the output bounding box, is proposed to alleviate the target drift caused by fisheye distortion, occlusion, etc. Besides, an attention module and a spatial constraint strategy are introduced to address the intra-class difference and inter-class similarity of vehicles [19]. To drive the study of vehicle Re-ID in surround-view scenarios and fill the gap in related datasets, we will release a large-scale annotated fisheye dataset.
Our contributions can be summarized as follows:
  • We provide an integrated vehicle Re-ID solution for the multi-camera surround-view scenario.
  • We propose a technique for evaluating the output bounding box’s quality, which can alleviate the issue of target drift.
  • A novel spatial constraint strategy is introduced for regularizing the Re-ID results in the surround-view camera system.
  • A large-scale fisheye dataset with annotations is provided to aid in promoting relevant research.

2 Related work

This section summarises the literature on vehicle re-identification techniques and tracking algorithms, both of which are closely connected to our study.
Vehicle Re-ID. Vehicle Re-ID has been widely studied in recent years. As is well known, the common challenge is how to deal with inter-class similarity and intra-class difference [19]: different vehicles can have a similar appearance, while the same vehicle looks different due to diverse perspectives and distortion. A large body of work has been proposed to tackle this challenge. Liu et al. [19] designed a pipeline that adopts deep relative distance learning (DRDL) to project vehicle images into a Euclidean space, where distances measure the similarity of two vehicle images. Liu et al. [20] created a dataset named VeRi-776, which employs visual features, license plates, and spatial-temporal information to explore the vehicle Re-ID task. Meanwhile, further works [1, 10, 23, 29, 35] introduced spatial-temporal information to boost Re-ID performance. Shen et al. [29] proposed a two-stage framework that utilizes complex spatial-temporal information of vehicles to effectively regularize Re-ID results. Subsequently, Antonio Marin-Reyes et al. [1] used spatial prior knowledge to generate tracklets and selected the vehicle in the middle frame as the feature of the tracklet. Furthermore, Lv et al. [23] utilized location information between cameras to improve Re-ID accuracy. In light of the success of using location information, we exploit a novel spatial constraint strategy to enhance the Re-ID results in the surround-view camera system.
Since spatial-temporal information is not always available in datasets, Re-ID approaches that combine local and global features of targets have also been proposed [6, 8, 9, 13, 14, 17]. For instance, Ghiasi et al. [9] designed a network combining local and global branches, which employs joint feature vectors to enhance robustness to occlusion. Different from DropBlock, Dai et al. [6] proposed Batch DropBlock, which drops the same region across a batch of images to better accomplish the metric learning task. Kuma et al. [14] enabled the model to learn local features intensively from the loss-function side. Inspired by the attention mechanism, methods for processing local and global information have become more flexible [16, 30, 31, 41, 46]. In this paper, we also utilize the attention mechanism to make the model focus on target regions, and use triplet loss [13] and softmax loss to enhance the performance of vehicle Re-ID.
In some complex scenarios, the relative positions between vehicles change constantly, so the real-time performance of the model is also critical. Chu et al. [5] proposed a perspective-aware metric learning method for extreme viewpoint variations, in which a viewpoint-aware network (VANet) learns two metrics for similar and dissimilar viewpoints using two separate feature spaces. Chen et al. [4] proposed a Semantics-guided Part Attention Network (SPAN) that uses semantic labels to predict attention masks for different vehicle viewpoints and extract discriminative features for each component. Liu et al. [18] observed that image pairs of the same vehicle can differ visually over large intervals and proposed a Self-Attention Stair Feature Fusion model to learn discriminative features that capture image details. Besides viewpoint, lighting is another important factor affecting accuracy. Ma et al. [24] proposed a refined part model that learns feature embeddings and automatically localizes vehicles through a Grid Spatial Transformer Network (GSTN). The above methods, like our proposed model, are based on deep learning, so we perform a real-time performance comparison; Chu et al. [5], Chen et al. [4] and Meng et al. [26] report the FPS of their models. In addition, the number of parameters [18, 24, 28] of a model is crucial in deployment, because the computing power of the electronic control unit (ECU) is limited.
Tracking algorithms related to vehicle Re-ID. Object tracking algorithms play an important role in the implementation of a vehicle Re-ID scheme [3, 12, 43, 44]. For object tracking tasks, trackers based on Siamese networks [2, 11, 32, 34, 47] have received significant attention for their well-balanced tracking accuracy and efficiency, which is of great value for boosting the vehicle Re-ID task in the surround-view multi-camera scenario. The seminal works [2, 11, 32] formulated visual tracking as a cross-correlation problem and produce a similarity map from deep features with a Siamese network, locating the target by comparing its similarity with the search region. However, these works have an inherent drawback: their tracking accuracy on the OTB benchmark [40] still leaves a relatively large gap to state-of-the-art deep trackers such as ECO [7] and MDNet [27]. To overcome this drawback, Li et al. [15] shifted the object position within the search region during training to avoid the location bias of the network. Thus, the network can also focus on objects in the marginal area of the search region, and the accuracy of Siamese trackers is boosted significantly. This approach also proposed a lightweight depth-wise cross-correlation layer to improve the running speed. Considering its strong performance, we adopt the SiamRPN++ model to realize the vehicle Re-ID task in a single camera.

3 Vehicle Re-ID in surround-view camera system

In the surround-view multi-camera system, we separate the vehicle Re-ID task into two subtasks: single-camera Re-ID and multi-camera Re-ID. This section describes our methodology for each subtask in detail. The parameters of the method are selected based on a statistical analysis of the data distribution and on experimental results; they are evaluated on the validation dataset to select a group of parameters with near-optimal effect.

3.1 Vehicle Re-ID in single camera

The single-camera vehicle Re-ID task aims at matching vehicles from the same view in consecutive frames. We utilize SiamRPN++ [15] as the single-object tracker and assign one tracker to each target to realize Re-ID in a single camera. Despite the great success of SiamRPN++, we observe that it still fails when the target suffers from a large distortion rate at different positions in the camera, which enlarges the differences between target features in different frames. Besides, occlusion between targets further complicates the target location information. Failure cases are shown in Fig. 2(b). To circumvent these challenges, we propose the following novel post-processing method for data association.

3.1.1 Quality Evaluation Mechanism

Unlike the tracking task, the vehicle Re-ID task is less constrained by the bounding box's size and more sensitive to drift of the bounding box center. As a consequence, post-processing of the tracking outcomes must be tailored to the needs of the Re-ID task. Inspired by the attention mechanism, we propose a quality evaluation mechanism for the tracking outcomes.
Center Drift. It is essential to update the tracking template to adapt to target movement. IoU and confidence of the output bounding box in consecutive frames are usually taken as indicators to update the template dynamically. Comparatively, the Re-ID process pays more attention to the center drift of the target. In Fig. 2(b), severe center drift results in many matching errors. However, IoU is not an appropriate metric for the task here. In Fig. 2(c), the center of the target vehicle is stable while IoU decreases continuously because of the changing size of the bounding box. Updating templates in such circumstances may consume more resources and have a higher risk of wrong predictions. To alleviate this problem, we define a center drift metric, \(IoU_R\), to measure the drift level of the target center.
$$\begin{aligned} IoU_R=\frac{S}{2R^2-S}, \end{aligned}$$
(1)
where R is the side length of the orange square centered at the output bounding box center, as shown in Fig. 2(c); we set it as a constant in the experiments. S is the intersection area between the tracking results of the same target in consecutive frames.
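To make Eq. (1) concrete, the following is a minimal sketch in Python. It assumes boxes are given as (x, y, w, h) tuples and that S is computed as the intersection of the two R\(\times\)R squares centered on the consecutive-frame box centers, which is the reading implied by the \(2R^2-S\) denominator; the function names and box convention are illustrative, not the authors' implementation.

```python
def center_square(box, r):
    """Axis-aligned square of side r centered on an (x, y, w, h) box."""
    cx, cy = box[0] + box[2] / 2.0, box[1] + box[3] / 2.0
    return cx - r / 2.0, cy - r / 2.0, cx + r / 2.0, cy + r / 2.0

def iou_r(box_prev, box_curr, r):
    """Center-drift metric IoU_R = S / (2 R^2 - S), Eq. (1).

    S is the intersection area of the two fixed-size squares placed at the
    box centers of the same target in consecutive frames.
    """
    x1a, y1a, x2a, y2a = center_square(box_prev, r)
    x1b, y1b, x2b, y2b = center_square(box_curr, r)
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    s = iw * ih
    return s / (2.0 * r * r - s)
```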
Re-ID Confidence. The confidence score output from the tracking process is used for ranking the bounding boxes. However, it is closely related to the center position and size of the bounding box, which is improper for Re-ID processes. Therefore, we suggest Re-ID confidence (\(C_R\)) to verify the accuracy of Re-ID results as Eq. (2).
$$\begin{aligned} C_R=C_T \times IoU_R, \end{aligned}$$
(2)
where \(C_T\) is the tracking confidence. The drift level down-weights the scores of bounding boxes far from the previous center of an object.
We define the conditions for updating the tracking template based on \(IoU_R\) and \(C_R\) as follows:
$$\begin{aligned} IoU_{RM}<T_1, C_{RM}<T_2, \end{aligned}$$
(3)
where the subscript M denotes averaging over M consecutive frames, and \(T_1\), \(T_2\) are the corresponding thresholds for \(IoU_R\) and \(C_R\). In this way, a template updating process adapted to the Re-ID task is realized.
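A minimal sketch of the template update decision in Eqs. (2)-(3), assuming a sliding window over the last M frames; the numeric thresholds below are placeholders, since \(T_1\) and \(T_2\) are selected on the validation split.

```python
from collections import deque

class TemplateUpdater:
    """Decide when to refresh a tracker template (sketch of Eqs. 2-3)."""

    def __init__(self, m=5, t1=0.5, t2=0.4):
        self.iou_r_hist = deque(maxlen=m)   # last M centre-drift values
        self.c_r_hist = deque(maxlen=m)     # last M Re-ID confidences
        self.t1, self.t2 = t1, t2

    def step(self, iou_r, c_t):
        c_r = c_t * iou_r                   # Re-ID confidence, Eq. (2)
        self.iou_r_hist.append(iou_r)
        self.c_r_hist.append(c_r)
        if len(self.iou_r_hist) < self.iou_r_hist.maxlen:
            return False                    # not enough history yet
        mean = lambda xs: sum(xs) / len(xs)
        # Update the template only when both M-frame averages fall below
        # their thresholds, Eq. (3).
        return mean(self.iou_r_hist) < self.t1 and mean(self.c_r_hist) < self.t2
```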
Occlusion Coefficient. To compensate for the box drift induced by occlusion, we incorporate the occlusion coefficient as Eq. (4):
$$\begin{aligned} OC=\frac{I_N}{A}, \end{aligned}$$
(4)
where \(I_N\) stands for the intersection area between two tracking results of objects in the same frame, and A is the area of the object. An object is defined as severely occluded when OC is greater than the threshold \(T_O\). When two objects have a high overlapping rate, the object with the lower \(C_R\) is counted as occluded. Since the positional relation changes over time, we use consecutive-frame results as the criterion for dealing with the occluded target: the tracker and ID of the obscured object are maintained until it has been occluded for N consecutive frames, after which they are removed permanently.
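A small sketch of the occlusion coefficient in Eq. (4), again assuming (x, y, w, h) boxes; here box_a is the object being tested for occlusion and box_b is another object in the same frame.

```python
def occlusion_coefficient(box_a, box_b):
    """OC = I_N / A, Eq. (4): intersection of two same-frame boxes divided
    by the area A of the (possibly occluded) object box_a."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    return (iw * ih) / (box_a[2] * box_a[3])
```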
The \(IoU_R\), \(C_R\) and OC constitute the quality evaluation mechanism, which processes the tracking results and optimizes the Re-ID performance in a single camera.

3.1.2 Framework of vehicle Re-ID in single camera

The overall framework of vehicle Re-ID in a single camera is depicted in Fig. 3. Each object is allocated a unique tracker, whose template is initialized with the object detection result. These trackers process the next frame and provide the quality evaluation module with Re-ID results. A non-qualified result triggers a template update and another tracking pass. Depending on the number of consecutive occluded frames, the tracker and ID of obscured objects are either erased or temporarily retained.
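The per-frame logic of Fig. 3 can be summarised roughly as below, reusing the helper sketches above (iou_r, occlusion_coefficient, TemplateUpdater). The tracker interface (track, re_init) and the simplified occlusion bookkeeping are assumptions for illustration, not the authors' exact API.

```python
def single_camera_reid_step(trackers, frame, prev_boxes, updaters,
                            r=32.0, t_o=0.5, n_max=4):
    """One frame of the single-camera pipeline (rough sketch).

    trackers:   target ID -> SiamRPN++-style tracker with track()/re_init()
    prev_boxes: target ID -> box from the previous frame
    updaters:   target ID -> TemplateUpdater instance
    """
    boxes = {}
    for tid, trk in trackers.items():
        box, c_t = trk.track(frame)                 # per-target tracking
        drift = iou_r(prev_boxes[tid], box, r)      # centre drift, Eq. (1)
        if updaters[tid].step(drift, c_t):          # quality evaluation, Eqs. (2)-(3)
            trk.re_init(frame, box)                 # refresh the template
        boxes[tid] = box
    # Simplified occlusion bookkeeping based on Eq. (4).
    for tid in list(trackers):
        others = [b for oid, b in boxes.items() if oid != tid]
        hit = any(occlusion_coefficient(boxes[tid], b) > t_o for b in others)
        count = getattr(trackers[tid], "occluded_frames", 0)
        trackers[tid].occluded_frames = count + 1 if hit else 0
        if trackers[tid].occluded_frames >= n_max:  # drop after N occluded frames
            del trackers[tid]
    return boxes
```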

3.2 Vehicle Re-ID in multi-camera

Vehicle Re-ID in multi-camera aims at building the correlation between the same vehicle in different cameras. Most current methods employ a deep network to achieve Re-ID. However, in a surround-view camera system, cameras are mounted at different positions around the vehicle, so the same object appears very differently in different cameras, as shown in Fig. 4(a). General deep learning networks alone are incapable of handling this problem. Therefore, we introduce an attention module that forces the network to pay more attention to target areas. Furthermore, different targets that appear in the same camera may have similar appearances, as shown in Fig. 4(b); it is challenging to distinguish the two black vehicles in the front camera using only image-level features. Consequently, we introduce a novel spatial constraint strategy to handle this thorny problem.

3.2.1 Attention Module

Existing vehicle Re-ID methods mainly serve surveillance scenarios. To meet the requirements of vehicle Re-ID for the surround-view camera system, we apply a modified BDBnet [6] as our multi-camera Re-ID model. BDBnet consists of a global branch and a local branch; in particular, a fixed mask is applied in the local branch to help the network learn semantic features, which has been shown to be effective in pedestrian Re-ID. Different from pedestrian Re-ID, vehicle Re-ID for the surround-view camera system suffers from deformation across cameras, and fixed templates struggle to improve learning in this setting. We therefore introduce an attention module that leads the network to learn self-adaptive templates focusing on target regions. As shown in Fig. 5, the structure in the red box is the attention module. For each new target, the network extracts a feature and measures the Euclidean distance between this feature and the features stored in the feature gallery. The distance is then converted to a confidence score \(s_1\) through Eq. (5).
$$\begin{aligned} s_1=ln\left( \frac{1}{D_F}+1\right) , \end{aligned}$$
(5)
where \(D_F\) is the Euclidean distance between the feature of the new target and features stored in gallery.
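A sketch of the score conversion in Eq. (5); the gallery is assumed to map existing target IDs to embedding vectors, and feature extraction by the modified BDBnet is outside this snippet.

```python
import math

def appearance_scores(feat_new, gallery):
    """s1 = ln(1/D_F + 1), Eq. (5): one score per gallery entry.

    feat_new: embedding of the new target (sequence of floats)
    gallery:  dict mapping existing target ID -> embedding vector
    """
    scores = {}
    for tid, feat in gallery.items():
        d_f = math.dist(feat_new, feat)                       # Euclidean distance D_F
        scores[tid] = math.log(1.0 / d_f + 1.0) if d_f > 0 else float("inf")
    return scores
```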

3.2.2 Spatial Constraint Strategy

We introduce a novel spatial constraint strategy for dealing with scenarios in which a single camera contains multiple similar targets. As previously stated, the wheel grounding key points of the same target seen in multiple cameras correspond to the same real-world location. Here we define the wheel grounding key points as the points where the wheels touch the ground. These points are special since their coordinates in the local (ego-vehicle) coordinate system are P(x, y, 0). After obtaining the pixel coordinates of the contact points, we can get their physical coordinates in real-world coordinates through the fisheye IPM algorithm [25]. However, the projected position varies due to external factors, such as the camera mounting angle and calibration. We define the offset caused by these factors as the projection uncertainty, as shown in Fig. 6. Two key points of the same category that are projected into the overlapping area are determined to belong to the same vehicle. Furthermore, the error decreases as the target gets closer to the camera, so we adopt different standards for wheels at different positions. As presented in Eq. (6), we first calculate the coordinate distances between key points and then convert them to a score \(s_2\):
$$\begin{aligned} s_2=ln\left( \frac{1}{D_K}+1\right) , \end{aligned}$$
(6)
where \(D_K\) is the distance between the projected coordinates of the key points, with \(D_K=D_f+D_r\); \(D_f\) and \(D_r\) are the projected coordinate distances of the two front key points and the two rear key points, respectively.
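A corresponding sketch for Eq. (6), assuming each target is described by its front and rear wheel grounding points already projected into the ego-vehicle frame by the fisheye IPM step; the dictionary layout is illustrative.

```python
import math

def spatial_score(kp_new, kp_gallery):
    """s2 = ln(1/D_K + 1), Eq. (6), with D_K = D_f + D_r.

    kp_* are dicts such as {"front": (x, y), "rear": (x, y)} holding the
    wheel grounding points in ego-vehicle coordinates.
    """
    d_f = math.dist(kp_new["front"], kp_gallery["front"])    # front key points
    d_r = math.dist(kp_new["rear"], kp_gallery["rear"])      # rear key points
    d_k = d_f + d_r
    return math.log(1.0 / d_k + 1.0) if d_k > 0 else float("inf")
```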

3.2.3 Framework of vehicle Re-ID in multi-camera

The overall process of multi-camera vehicle Re-ID is shown in Fig. 7. The first branch obtains the confidence score \(s_1\) from the feature similarity metric, and the second branch obtains the confidence score \(s_2\) from the physical coordinate distances of the key points. Finally, the ID of the existing target with the highest score s is assigned to the new target.
$$\begin{aligned} s=\frac{\alpha s_1+\beta s_2}{\alpha +\beta }, \end{aligned}$$
(7)
where \(\alpha\) and \(\beta\) are set to 1 in the following experiments.
The parameters in Eq. (7) are based on experimental results verified on our fisheye dataset. If more accurate wheel position coordinates can be obtained, the weight of the physical-distance score can be increased appropriately; conversely, in areas with large wheel-coordinate deviation, or for closely spaced vehicles in congested scenes, the weight of the feature similarity can be increased to achieve better results.
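The fusion and ID assignment of Eq. (7) then reduce to a weighted average followed by an argmax over gallery candidates; the sketch below assumes the \(s_1\) and \(s_2\) scores from the previous snippets, keyed by candidate ID.

```python
def fuse_and_assign(s1_scores, s2_scores, alpha=1.0, beta=1.0):
    """Fuse the two confidence scores with Eq. (7) and return the best ID.

    s1_scores / s2_scores: dicts mapping candidate ID -> s1 / s2.  How to
    treat candidates missing a wheel key-point score (skip, or beta = 0)
    is a design choice left open here.
    """
    best_id, best_s = None, float("-inf")
    for tid in s1_scores.keys() & s2_scores.keys():
        s = (alpha * s1_scores[tid] + beta * s2_scores[tid]) / (alpha + beta)  # Eq. (7)
        if s > best_s:
            best_id, best_s = tid, s
    return best_id, best_s
```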
The multi-camera ID association in this paper follows a fixed order: the association between the left camera and the front camera is carried out first, followed by the association between the front camera and the right camera. In detail, the process is as follows (a code sketch is given after the list):
1.
If a new target appears on the left side of the left camera or on the right side of the right camera, the association only needs to be carried out within that camera. If it succeeds, we assign the original ID to the new target; if not, we assign a new ID to this target immediately.
 
2.
When a new target appears in other areas of the left or right camera, it only needs to be associated with the front camera.
 
3.
When a new target appears on the right side of the front camera, only the right camera needs to be associated. Similarly, a new target appearing on the left side of the front camera only needs to be associated with the left camera.
 
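The camera routing described in the list above can be captured by a small lookup, sketched here; the region labels ("outer", "left", "right") are illustrative names for the image areas mentioned in items 1-3, not terminology from the paper.

```python
def cameras_to_query(camera, region):
    """Galleries a new target is matched against (sketch of items 1-3).

    camera: "front", "left" or "right"
    region: coarse label for where the target appears in that camera's image
    """
    if camera in ("left", "right"):
        # Outer edge (left side of left camera / right side of right camera):
        # associate only within the camera's own history.
        return [camera] if region == "outer" else ["front"]
    if camera == "front":
        if region == "right":
            return ["right"]
        if region == "left":
            return ["left"]
    return []
```

If none of the queried galleries yields a match, a new ID is assigned, as in item 1.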

3.3 Overall framework

The surround-view multi-camera Re-ID system processes the data of each camera serially. Each new object in a channel is assigned a single-target tracker and matched with the data of the other channels according to the multi-camera Re-ID strategy. If no match is found, a new ID is created for it; otherwise, the newcomer inherits the ID of the matched object. According to the single-camera Re-ID strategy, all targets within a camera are then matched along the time sequence.
To facilitate representation in engineering applications, it is necessary to describe a target with a vector during the single- and multi-camera Re-ID stages. In the single-camera stage, the target-vehicle vector comprises the object ID and tracker information such as the bounding box (x, y, w, h) and the occlusion coefficient, which are maintained independently in each camera. The target vector, with the ID generated by the single-camera Re-ID method, is then sent to the multi-camera Re-ID stage. Since the same target may be observed in different channels, the goal of the multi-camera Re-ID stage is to fuse the same targets across channels and update the identification numbers in the vectors accordingly. In this stage, new target information, such as the wheel coordinates, can be generated and appended to the vectors. Finally, each target is represented by a vector containing its complete information, including a unique identity number. The overall flow of the model is shown in the flowchart in Fig. 9.
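As a rough illustration of the target vector described above, one record per target might look like the following; the field names are assumptions for illustration rather than the authors' exact schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class TargetVector:
    """One target's record as it flows through both Re-ID stages (sketch)."""
    track_id: int                                    # per-camera ID from stage one
    camera: str                                      # "front", "left" or "right"
    box_xywh: Tuple[float, float, float, float]      # bounding box (x, y, w, h)
    occlusion: float = 0.0                           # occlusion coefficient, Eq. (4)
    global_id: Optional[int] = None                  # fused ID after the multi-camera stage
    wheel_points: Optional[Tuple[Tuple[float, float], ...]] = None  # ego-frame coordinates
```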

4 Experiments

4.1 Dataset and evaluation metric

We generate the fisheye dataset from structured roads. It consists of 22190 annotated images sampled from 36 sequences, captured by three cameras (front, left, and right) at 30 frames per second (fps) with a resolution of 1280\(\times\)720. Our dataset contains a large number of images with occlusion and varied illumination, as shown in Fig. 8. The dataset is divided into image sequences: we select a group of sequences containing about 80\(\%\) of the total images as the training set, and the remaining sequences serve as the validation set. The comparison with other approaches is shown in Table 1; all methods are deployed on the Qualcomm 820A platform with an Adreno 530 GPU and a Hexagon 680 DSP. The results show that our proposed approach is computationally efficient.
Table 1 The comparison with other approaches on speed

Methods                    Speed (FPS)
VANET [Chu et al. [5]]     19
SPAN [Chen et al. [4]]     13
PVEN [Meng et al. [26]]    12
Ours                       30
Table 2 The impact of different template updating methods. \(IC_{front}\), \(IC_{left}\) and \(IC_{right}\) correspond to the IC of the front, left and right cameras

Methods             \(IC_{front}\)   \(IC_{left}\)   \(IC_{right}\)
Default             0.82             0.81            0.87
+\(IoU_T+C_T\)      0.88             0.91            0.90
+\(IoU_R+C_R\)      0.94             0.96            0.97
We evaluate the results with the identity consistency (IC). It is formulated as
$$\begin{aligned} IC=1-\frac{\sum _tIDSW_t}{\sum _tID_t}, \end{aligned}$$
(8)
where t is the frame index; an identity switch (IDSW) is counted if a ground-truth target i is matched to tracking output j while the last known assignment was k, \(k\ne j\). \(ID_t\) is the number of ground-truth targets with an ID in frame t.
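The metric of Eq. (8) is straightforward to compute once per-frame counts are available; a small sketch, assuming the IDSW and ID counts per frame are given as lists:

```python
def identity_consistency(idsw_per_frame, ids_per_frame):
    """IC = 1 - sum_t IDSW_t / sum_t ID_t, Eq. (8).

    idsw_per_frame[t]: identity switches counted in frame t
    ids_per_frame[t]:  ground-truth targets carrying an ID in frame t
    """
    total_ids = sum(ids_per_frame)
    if total_ids == 0:
        return 1.0
    return 1.0 - sum(idsw_per_frame) / total_ids
```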

4.2 Implementation details

We trained all networks with stochastic gradient descent (SGD) on a GTX 1080Ti.
For the single-camera Re-ID model SiamRPN++, we trained 50 epochs with batch size 24, a learning rate of 0.0005, weight decay of 0.0001, and momentum of 0.9. For the multi-camera Re-ID model, we trained 150 epochs with batch size 256, an initial learning rate of 0.00035 (fixed for the first ten epochs), momentum of 0.9, and weight decay of 0.0005. The hyperparameter values of the networks, such as learning rate and batch size, were obtained by empirical tuning. The values of \(\alpha\) and \(\beta\) were obtained experimentally, as shown in Table 3.
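As a hedged illustration of the training configuration above, assuming a PyTorch-style setup (the paper does not state the framework), the optimiser could be constructed as follows; `model` stands for whichever network is being trained.

```python
import torch

def make_optimizer(model, stage="multi"):
    """SGD settings as reported above (sketch; framework assumed)."""
    if stage == "single":   # SiamRPN++: 50 epochs, batch size 24
        return torch.optim.SGD(model.parameters(), lr=5e-4,
                               momentum=0.9, weight_decay=1e-4)
    # Multi-camera Re-ID model: 150 epochs, batch size 256,
    # learning rate fixed for the first ten epochs.
    return torch.optim.SGD(model.parameters(), lr=3.5e-4,
                           momentum=0.9, weight_decay=5e-4)
```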
Table 3 The effect of setting the values of \(\alpha\) and \(\beta\) on \(IC_{front}\), \(IC_{left}\) and \(IC_{right}\) in Eq. (7)

                  \(\alpha\)=1, \(\beta\)=1   \(\alpha\)=2, \(\beta\)=1   \(\alpha\)=1, \(\beta\)=2
\(IC_{front}\)    0.94                        0.95                        0.90
\(IC_{left}\)     0.96                        0.94                        0.96
\(IC_{right}\)    0.97                        0.97                        0.93

4.3 Evaluation of the proposed method

Quality Evaluation Mechanism. The proposed quality evaluation mechanism is a key component to optimize Re-ID performance in a single camera. Therefore, we conduct some ablation studies on our dataset to find out the contribution of our method to performance.
Table 4 The comparison of our best result and other similar approaches on IC

Approaches   \(IC_{front}\)   \(IC_{left}\)   \(IC_{right}\)
VANET        0.90             0.90            0.91
SPAN         0.91             0.93            0.92
PVEN         0.93             0.94            0.95
Ours         0.94             0.96            0.97
We first compare the template updating metrics. As shown in Table 2, Default is our baseline model without updating templates. Updating the tracking template with \(IoU_T\) and \(C_T\) as metrics brings significant improvement in identity consistency. Furthermore, when utilizing the \(IoU_R\) and \(C_R\), the identity consistency is greatly improved once again. That means the revised metrics help the model pay more attention to the demands of the Re-ID task. To further evaluate the proposed method, we compared our model with other approaches as shown in Table 4. Meanwhile, we also present the number of parameters in Table 5, which shows our proposed method is cost-effective.
Table 5 The comparison with other approaches on amount of parameters

Methods                    Parameters (M)
Satt [Liu et al. [18]]     31.09
GSTE [Ma et al. [24]]      32.73
VANET                      10.9
HPGN [Shen et al. [28]]    27.71
Ours                       8.19
The effect of the number of frames (N) after which an occluded object is deleted is summarized in Table 6. Zero frames means that occluded cars are not handled. When \(N=2\), identity switches decrease slightly; this is expected because frequently deleting temporarily occluded cars contributes to ID switches when they quickly reappear. As N grows, the IDs gradually become steadier. However, maintaining occluded objects for too many frames amounts to not handling occlusion, and the consistency drops once again. Experimental results show that \(N = 4\) is optimal in this paper.
Discussion of robustness. The fisheye dataset verifies our proposed system's robustness under different illumination and occlusion conditions. The dataset was captured during different daytime periods, covering diverse illumination and various occlusion conditions caused by different traffic situations such as traffic jams. Table 6 shows the robustness of our method under occlusion. Besides, the frame rate reaches about 30 fps in practical tests, which meets the real-time demand. However, it should also be noted that our dataset does not include rainy or night scenes.
Table 6 The impact of the number of frames N after which occluded objects are deleted

N    \(IC_{front}\)   \(IC_{left}\)   \(IC_{right}\)
0    0.83             0.81            0.80
2    0.85             0.83            0.84
3    0.92             0.90            0.93
4    0.94             0.96            0.97
5    0.93             0.91            0.92
Vehicle Re-ID strategy in multi-camera. We examine the influence of various multi-camera matching strategies in Table 7. All experiments are based on the same single-view implementation. The first row shows the method of matching with feature metrics as in Dai et al. [6]. After introducing the attention module, the Re-ID accuracy achieves a promising improvement. On top of that, adding the spatial constraint strategy improves the results further, as shown in the last row.
Table 7 Ablation study in multi-camera Re-ID evaluated on the proposed fisheye dataset

ID matching strategy in multi-camera                 \(IC_{front}\)   \(IC_{left}\)   \(IC_{right}\)
Baseline                                             0.84             0.85            0.86
+Attention module                                    0.87             0.92            0.89
+Spatial constraint strategy and Attention module    0.94             0.96            0.97

4.4 Evaluation of the positioning error used in the spatial constraint strategy

We conduct an experiment to evaluate the object positioning accuracy, which affects the effectiveness of the spatial constraint strategy. Since there is no ideal horizontal ground plane in practical applications, we designed an experiment to evaluate the positioning accuracy used in the constraint strategy on ground with a 5\(\%\) slope gradient. We randomly select 12 objects distributed from \(-2.5\) to 2.5 m along the x-direction and calculate their position errors. As shown in Fig. 10a, all errors are less than 30 cm. Meanwhile, 12 points are randomly selected in the range of 1.5–3.5 m along the y-direction to analyze the position errors; as shown in Fig. 10b, these errors are less than 20 cm. It can be concluded that our system can work on ground with a slight slope, which demonstrates the robustness of the proposed system in practical scenarios.

5 Conclusions

We present a comprehensive vehicle Re-ID approach for the surround-view multi-camera scenario in this study. The introduced quality evaluation mechanism for the output bounding box helps alleviate distortion- and occlusion-related issues. Moreover, we deploy an attention module to direct the network's attention to target regions. Additionally, a novel spatial constraint strategy is applied to regularise the Re-ID results. Extensive component analysis and comparisons on the fisheye dataset demonstrate that our vehicle Re-ID solution produces promising results. Our model achieves 30 FPS on the Qualcomm 820A platform, and its number of parameters is only 8.19 million. Furthermore, the annotated fisheye dataset will be made publicly available to aid in advancing research in this area. We will continue to optimise the performance of vehicle Re-ID in the surround-view multi-camera scenario in future investigations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Literature
1. Antonio Marin-Reyes P, Palazzi A, Bergamini L, Calderara S, Lorenzo-Navarro J, Cucchiara R (2018) Unsupervised vehicle re-identification using triplet networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp 166–171
2. Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision, Springer, Berlin, pp 850–865
3. Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, pp 2544–2550
4. Chen TS, Liu CT, Wu CW, Chien SY (2020) Orientation-aware vehicle re-identification with semantics-guided part attention network. In: European Conference on Computer Vision, Springer, pp 330–346
5. Chu R, Sun Y, Li Y, Liu Z, Zhang C, Wei Y (2019) Vehicle re-identification with viewpoint-aware metric learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 8282–8291
6. Dai Z, Chen M, Gu X, Zhu S, Tan P (2019) Batch DropBlock network for person re-identification and beyond. In: Proceedings of the IEEE International Conference on Computer Vision, pp 3691–3701
7. Danelljan M, Bhat G, Shahbaz Khan F, Felsberg M (2017) ECO: efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6638–6646
8.
9. Ghiasi G, Lin TY, Le QV (2018) DropBlock: a regularization method for convolutional networks. In: Advances in Neural Information Processing Systems, pp 10727–10737
10. Hamdoun O, Moutarde F, Stanciulescu B, Steux B (2008) Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences. In: 2008 Second ACM/IEEE International Conference on Distributed Smart Cameras, pp 1–6
11. Held D, Thrun S, Savarese S (2016) Learning to track at 100 fps with deep regression networks. In: European Conference on Computer Vision, Springer, Berlin, pp 749–765
12. Henriques JF, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
13.
14. Kuma R, Weill E, Aghdasi F, Sriram P (2019) Vehicle re-identification: an efficient baseline using triplet embedding. In: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–9
15. Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) SiamRPN++: evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4282–4291
16. Li W, Zhu X, Gong S (2018) Harmonious attention network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2285–2294
17. Liao Hu, Zhu Li (2014) Person re-identification by local maximal occurrence representation and metric learning. Comput Sci 37(9):1834–1848
18. Liu C, Huynh DQ, Reynolds M (2019) Urban area vehicle re-identification with self-attention stair feature fusion and temporal Bayesian re-ranking. In: 2019 International Joint Conference on Neural Networks (IJCNN), IEEE, pp 1–8
19. Liu H, Tian Y, Yang Y, Pang L, Huang T (2016a) Deep relative distance learning: tell the difference between similar vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2167–2175
20. Liu X, Liu W, Mei T, Ma H (2016b) A deep learning-based approach to progressive vehicle re-identification for urban surveillance. In: European Conference on Computer Vision, Springer, pp 869–884
21. Liu X, Wu L, Ma H, Fu H (2016c) Large-scale vehicle re-identification in urban surveillance videos. In: 2016 IEEE International Conference on Multimedia and Expo (ICME)
22. Liu X, Zhang S, Huang Q, Wen G (2018) RAM: a region-aware deep model for vehicle re-identification. In: 2018 IEEE International Conference on Multimedia and Expo (ICME)
23. Lv J, Chen W, Li Q, Yang C (2018) Unsupervised cross-dataset person re-identification by transfer learning of spatial-temporal patterns. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7948–7956
24. Ma X, Zhu K, Guo H, Wang J, Huang M, Miao Q (2019) Vehicle re-identification with refined part model. In: 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), IEEE, pp 603–606
25. Mallot HA, Bülthoff HH, Little J, Bohrer S (1991) Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biol Cybern 64(3):177–185
26. Meng D, Li L, Liu X, Li Y, Yang S, Zha ZJ, Gao X, Wang S, Huang Q (2020) Parsing-based view-aware embedding network for vehicle re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 7103–7112
27. Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4293–4302
28. Shen F, Zhu J, Zhu X, Xie Y, Huang J (2021) Exploring spatial significance via hybrid pyramidal graph network for vehicle re-identification. IEEE Transactions on Intelligent Transportation Systems
29. Shen Y, Xiao T, Li H, Yi S, Wang X (2017) Learning deep neural networks for vehicle re-id with visual-spatio-temporal path proposals. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1900–1909
30.
31. Si J, Zhang H, Li CG, Kuen J, Kong X, Kot AC, Wang G (2018) Dual attention matching network for context-aware feature sequence based person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5363–5372
32. Tao R, Gavves E, Smeulders AW (2016) Siamese instance search for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1420–1429
33. Wang H, Peng J, Chen D, Jiang G, Zhao T, Fu X (2020) Attribute-guided feature learning network for vehicle re-identification. arXiv:2001.03872
34. Wang Q, Teng Z, Xing J, Gao J, Hu W, Maybank S (2018) Learning attentions: residual attentional siamese network for high performance online visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 4854–4863
35. Wang Z, Tang L, Liu X, Yao Z, Wang X (2017) Orientation invariant feature embedding and spatial temporal regularization for vehicle re-identification. In: 2017 IEEE International Conference on Computer Vision (ICCV)
36. Wu L, Wang Y, Li X, Gao J (2017) What-and-where to match: deep spatially multiplicative integration networks for person re-identification. Pattern Recognition
37. Wu L, Hong R, Wang Y, Wang M (2019) Cross-entropy adversarial view adaptation for person re-identification. IEEE Transactions on Circuits and Systems for Video Technology 30(7):2081–2092
38. Wu L, Wang Y, Shao L, Wang M (2018b) 3-D PersonVLAD: learning deep global representations for video-based person re-identification. IEEE Transactions on Neural Networks and Learning Systems, pp 1–13
39. Wu L, Wang Y, Shao L, Wang M (2019) 3-D PersonVLAD: learning deep global representations for video-based person re-identification. IEEE Transactions on Neural Networks and Learning Systems 30(11):3347–3359
40. Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848
41. Yang S, Lin W, Yan J, Xu M, Wang J (2015) Person re-identification with correspondence structure learning. In: 2015 IEEE International Conference on Computer Vision (ICCV)
42.
43. Zhang M, Xing J, Gao J, Hu W (2015a) Robust visual tracking using joint scale-spatial correlation filters. In: 2015 IEEE International Conference on Image Processing (ICIP), IEEE, pp 1468–1472
44. Zhang M, Xing J, Gao J, Shi X, Wang Q, Hu W (2015b) Joint scale-spatial correlation tracking with adaptive rotation estimation. In: Proceedings of the IEEE International Conference on Computer Vision Workshops, pp 32–40
45.
46. Zheng WS, Li X, Xiang T, Liao S, Lai J, Gong S (2015) Partial person re-identification. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4678–4686
47. Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 101–117