1 Introduction
- In the feature extraction stage, the Hierarchical Attention Fusion Module (HAFM) is proposed to address the lack of semantic information in the early stages of the encoder-decoder structure. The module uses an axis-selective Transformer block with linear complexity to model spatial relationships between pixels, adaptively enhancing important features across the global spatial extent. A cross-layer feature interaction mechanism then measures feature similarity across levels and uses it to enhance features at each level (an illustrative sketch of the axis-selective block follows this list).
- The Fusion Feedforward Network (F3N) replaces the standard feedforward network in the Transformer blocks. F3N matches and aligns the sub-token segments that correspond to the same pixel position and aggregates information from different patches for that position, mitigating the loss of fine-grained local features during decoding without adding learnable parameters (a second sketch after this list illustrates the aggregation).
- HPAT combines these components into a novel Transformer model that captures local and global spatial information effectively. Extensive experiments and ablation studies show that HPAT achieves competitive results on image deblurring, as shown in Fig. 1.
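To make the axis-selective idea concrete, the sketch below shows one plausible realization in PyTorch. It is an illustration under our own assumptions (module names, shared attention weights across the two axes, and the sigmoid gate are all hypothetical), not the implementation used in HPAT; the point is that attending along each spatial axis separately keeps the cost linear per axis rather than quadratic in the number of pixels, and a learned gate adaptively selects between the two axes.

```python
# Hypothetical axis-selective attention sketch (not the authors' code):
# attention runs along the height axis and the width axis separately,
# and a learned per-channel gate mixes the two results.
import torch
import torch.nn as nn


class AxialAttention(nn.Module):
    """Multi-head self-attention along a single spatial axis."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        # Weights are shared between the two axes for brevity.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, axis: int) -> torch.Tensor:
        # x: (B, C, H, W); attend along `axis` (2 = height, 3 = width).
        b, c, h, w = x.shape
        if axis == 2:  # pixels within each column attend to each other
            seq = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        else:          # pixels within each row attend to each other
            seq = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        out, _ = self.attn(seq, seq, seq)
        if axis == 2:
            return out.reshape(b, w, h, c).permute(0, 3, 2, 1)
        return out.reshape(b, h, w, c).permute(0, 3, 1, 2)


class AxisSelectiveBlock(nn.Module):
    """Runs height- and width-axis attention and gates between them."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.axial = AxialAttention(dim, heads)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_out = self.axial(x, axis=2)   # global context along each column
        w_out = self.axial(x, axis=3)   # global context along each row
        g = self.gate(x)                # per-channel axis-selection weights
        return x + g * h_out + (1 - g) * w_out


# Usage: a 64-channel feature map from an encoder stage.
feat = torch.randn(1, 64, 32, 32)
print(AxisSelectiveBlock(64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```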
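The per-pixel aggregation behind F3N can be illustrated in the same spirit with parameter-free fold/unfold operations: flattened-patch tokens are folded back onto the pixel grid, the sub-token segments from overlapping patches that land on the same pixel are averaged, and the grid is unfolded back into tokens. The patch size, stride, and overlap-averaging below are our assumptions for illustration, not the paper's exact design.

```python
# Hypothetical F3N-style feedforward sketch: a standard two-layer MLP plus
# a parameter-free fold/unfold step that averages sub-token segments
# mapping to the same pixel across overlapping patches.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionFeedForward(nn.Module):
    def __init__(self, dim, hidden, patch=4, stride=2):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)  # hidden must divide by patch**2
        self.fc2 = nn.Linear(hidden, dim)
        self.patch, self.stride = patch, stride

    def forward(self, tokens, hw):
        # tokens: (B, N, dim), one flattened patch per token; hw = (H, W).
        x = F.gelu(self.fc1(tokens))                      # (B, N, hidden)
        cols = x.transpose(1, 2)                          # (B, hidden, N)
        # fold() sums the contributions of overlapping patches per pixel...
        folded = F.fold(cols, hw, self.patch, stride=self.stride)
        # ...and dividing by the overlap count turns the sum into a mean.
        counts = F.fold(torch.ones_like(cols), hw, self.patch,
                        stride=self.stride)
        aligned = folded / counts
        # unfold() re-splits the aggregated grid into the original tokens.
        x = F.unfold(aligned, self.patch, stride=self.stride).transpose(1, 2)
        return self.fc2(x)


# Usage: an 8x8 grid, 4x4 patches, stride 2 -> 3x3 = 9 overlapping tokens.
B, C, P = 2, 16, 4
ffn = FusionFeedForward(dim=C * P * P, hidden=2 * C * P * P, patch=P, stride=2)
tokens = torch.randn(B, 9, C * P * P)
print(ffn(tokens, (8, 8)).shape)  # torch.Size([2, 9, 256])
```

Because fold/unfold and the division by overlap counts carry no weights, the only learnable parameters remain the two linear layers of a standard FFN, consistent with the claim that the aggregation adds no parameters.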
2 Related Work
2.1 CNN Based Image Deblurring
2.2 Transformer Based Image Deblurring
2.3 Feedforward Network
3 Hierarchical Patch Aggregation Transformer for Motion Deblurring
3.1 Overall Pipeline
3.2 Hierarchical Attention Fusion Module
3.2.1 Axis-Selective Transformer Block
3.2.2 Cross-Layer Feature Interaction Mechanism
3.3 Fusion Feedforward Network
4 Experiments and Results
4.1 Datasets and Experimental Settings
Method | Params(M) | Time(ms) | Year |
---|---|---|---|
DeblurGAN-v2[26] | 61 | 42 | 2018 |
DMPHN[56] | 22 | 212 | 2019 |
DBGAN[59] | 12 | 909 | 2020 |
MIMO-UNet[60] | 16 | 22 | 2020 |
DeepRFT[38] | 10 | 242 | 2021 |
MPRNet[40] | 20 | 104 | 2021 |
HINet[41] | 89 | 31 | 2022 |
DGUNet[61] | 18 | – | 2022 |
NAFNet[62] | 68 | 92 | 2022 |
IPT[32] | 114 | – | 2020 |
Uformer-B[46] | 51 | 310 | 2021 |
Restormer[47] | 26 | 798 | 2022 |
MFFDNet[44] | 38 | 123 | 2023 |
Ours | 58 | 328 | 2023 |
Method | PSNR\(\uparrow \) | SSIM\(\uparrow \) | Year |
---|---|---|---|
DeblurGAN-v2[26] | 29.08 | 0.873 | 2018 |
DMPHN[56] | 30.45 | 0.902 | 2019 |
DBGAN[59] | 31.10 | 0.942 | 2020 |
MIMO-UNet[60] | 32.44 | 0.933 | 2020 |
DeepRFT[38] | 32.82 | 0.938 | 2021 |
MPRNet[40] | 32.66 | 0.936 | 2021 |
HINet[41] | 32.77 | 0.936 | 2022 |
DGUNet[61] | 32.71 | 0.937 | 2022 |
NAFNet[62] | 32.87 | 0.948 | 2022 |
IPT[32] | 32.58 | 0.935 | 2020 |
Uformer-B[46] | 32.97 | 0.942 | 2021 |
Restormer[47] | 32.92 | 0.940 | 2022 |
MFFDNet[44] | 32.87 | 0.959 | 2023 |
Ours | 33.43 | 0.961 | 2023 |
4.2 Results and Analysis
4.2.1 Deblurring Results on the GoPro Dataset
4.2.2 Deblurring Results on the HIDE Dataset
Method | PSNR\(\uparrow \) | SSIM\(\uparrow \) | Params(M) | Time(ms) | Year |
---|---|---|---|---|---|
DeepDeblur[1] | 25.73 | 0.874 | 12 | 86 | 2017 |
DeblurGAN-v2[26] | 27.51 | 0.848 | 61 | 40 | 2018 |
DMPHN[56] | 27.79 | 0.864 | 22 | 217 | 2019 |
DBGAN[59] | 28.97 | 0.913 | 12 | 899 | 2020 |
MIMO-UNet[60] | 29.99 | 0.906 | 16 | 21 | 2020 |
DeepRFT[38] | 30.99 | 0.919 | 10 | 236 | 2021 |
MPRNet[40] | 30.96 | 0.917 | 20 | 98 | 2021 |
HINet[41] | 30.33 | 0.909 | 89 | 24 | 2022 |
DGUNet[61] | 30.96 | 0.918 | 18 | - | 2022 |
Uformer-B[46] | 30.89 | 0.919 | 51 | 307 | 2021 |
Restormer[47] | 31.22 | 0.921 | 26 | 778 | 2022 |
MFFDNet[44] | 30.16 | 0.932 | 38 | 119 | 2023 |
Ours | 30.79 | 0.939 | 58 | 314 | 2023 |
4.2.3 Deblurring Results on the RealBlur Dataset
Method | RealBlur_J PSNR\(\uparrow \) | RealBlur_J SSIM\(\uparrow \) | RealBlur_R PSNR\(\uparrow \) | RealBlur_R SSIM\(\uparrow \) | Average PSNR\(\uparrow \) | Average SSIM\(\uparrow \) | Year |
---|---|---|---|---|---|---|---|
DeblurGAN-v2[26] | 26.68 | 0.815 | 33.41 | 0.928 | 30.05 | 0.872 | 2018 |
DMPHN[56] | 26.75 | 0.825 | 33.21 | 0.936 | 29.98 | 0.881 | 2019 |
DeepRFT[38] | 26.66 | 0.823 | 34.03 | 0.943 | 30.35 | 0.883 | 2021 |
MPRNet[40] | 26.51 | 0.820 | 33.91 | 0.942 | 30.21 | 0.881 | 2021 |
HINet[41] | 26.36 | 0.800 | 33.80 | 0.938 | 30.08 | 0.869 | 2021 |
MSSNet[63] | 26.59 | 0.826 | 33.93 | 0.945 | 30.26 | 0.886 | 2022 |
DGUNet[61] | 26.60 | 0.824 | 33.96 | 0.943 | 30.28 | 0.884 | 2022 |
Uformer-B[46] | 26.65 | 0.828 | 33.85 | 0.943 | 30.25 | 0.886 | 2021 |
Restormer[47] | 26.63 | 0.823 | 33.98 | 0.946 | 30.31 | 0.885 | 2022 |
MFFDNet[44] | 28.42 | 0.864 | 35.71 | 0.949 | 32.07 | 0.910 | 2023 |
Ours | 28.76 | 0.876 | 36.02 | 0.954 | 32.39 | 0.915 | 2023 |
4.3 Ablation Experiments
4.3.1 Hierarchical Attention Fusion Module
4.3.2 Fusion Feedforward Network
Variant | Component | PSNR\(\uparrow \)(dB) |
---|---|---|
Baseline(a) | Model with Resblock(d) | 32.72 |
Encoder-Decoder(b) | LETB(W-MSA+F3N)(e) | 32.83 |
 | ASTB(CA-MSA+FFN)(f) | 32.78 |
HAFM(c) | ASTB(CA-MSA+F3N)(g) | 32.82 |
 | ASTB(CA-MSA+F3N)+CFIM(h) | 32.99 |
Variant | Component | PSNR\(\uparrow \)(dB) |
---|---|---|
HAFM | Conv | 32.87 |
 | Conv+Conv+Conv | 32.91 |
 | ASTB(CA-MSA+F3N)+CFIM | 32.99 |
Number of Patches | Patch size | PSNR\(\uparrow \)(dB) |
---|---|---|
4 | 14 | 32.93 |
9 | 7 | 32.99 |
16 | 5 | 32.97 |
25 | 4 | 32.94 |
Number of Patches | Patch size | PSNR\(\uparrow \)(dB) |
---|---|---|
9 | 5 | 32.95 |
9 | 7 | 32.99 |
9 | 10 | 32.98 |
9 | 14 | 32.96 |
Variant | Component | PSNR\(\uparrow \)(dB) |
---|---|---|
Feedforward Network | LETB(W-MSA+FFN) | 32.87 |
 | LETB(W-MSA+LEFF) | 32.95 |
 | LETB(W-MSA+F3N) | 32.99 |