Semantic-Aware Visual Decomposition for Image Coding

Written by: Jianhui Chang, Jian Zhang, Jiguo Li, Shiqi Wang, Qi Mao, Chuanmin Jia, Siwei Ma, Wen Gao

Published in: International Journal of Computer Vision | Issue 9/2023

Published: 2 June 2023

Abstract

In this paper, we propose a novel image coding framework with semantic-aware visual decomposition, targeting extremely low bitrate compression. In particular, at the encoder side an input image is decomposed into a semantic map, which serves as the structural representation, and semantic-wise texture representations, and both are further compressed into bitstreams. At the decoder side, the received bitstreams of the dual-layer representations are decoded, and the target image is synthesized from them with generative models. Moreover, an attention mechanism is introduced into the model architecture for texture representation modeling, and a coherency regularization is proposed to further optimize the texture representation space by aligning it with the source pixel space for higher synthesis quality. We also propose a cross-channel entropy module and control the quantization scale to facilitate rate-distortion optimization. Once the decomposed components are compressed into the bitstream, this simple yet effective representation philosophy benefits image compression in several respects. First, in terms of compression performance, the compact representations and high visual synthesis quality bring remarkable advantages. Second, the proposed framework yields a physically explainable bitstream composed of a structural segment and semantic-wise texture segments. Third and most importantly, subsequent vision tasks (e.g., content manipulation) receive fundamental support from the semantic-aware visual decomposition and synthesis mechanism. Extensive experimental results demonstrate the superiority of the proposed framework for efficient visual representation learning, high-efficiency image compression (\(<0.1\) bpp), and intelligent visual applications (e.g., manipulation and analysis).
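To make the dual-layer idea in the abstract concrete, the following is a minimal toy sketch, not the authors' method. It assumes the semantic map is already given, and it replaces the learned texture representation with a simple region-wise mean colour and the generative decoder with plain region painting. The function names `decompose`, `synthesize`, and `bits_per_pixel`, and the `scale` parameter, are hypothetical and chosen only for illustration.

```python
import numpy as np

def decompose(image, semantic_map, num_classes, scale=8.0):
    """Toy encoder: keep the label map as the structure layer and store one
    quantized texture code per semantic class (assumption: mean colour)."""
    codes = np.zeros((num_classes, image.shape[-1]), dtype=np.int32)
    for c in range(num_classes):
        mask = semantic_map == c
        if mask.any():
            # Region-wise average pooling followed by uniform quantization;
            # a larger scale gives coarser codes and therefore fewer bits.
            codes[c] = np.round(image[mask].mean(axis=0) / scale).astype(np.int32)
    return semantic_map, codes

def synthesize(semantic_map, codes, scale=8.0):
    """Toy decoder: dequantize the codes and paint each region with its code."""
    return codes[semantic_map].astype(np.float64) * scale

def bits_per_pixel(total_bits, height, width):
    """Bitrate in bits per pixel (bpp) for a bitstream of total_bits bits."""
    return total_bits / (height * width)

# Two-region example: the left half belongs to class 0, the right half to class 1.
image = np.zeros((4, 4, 3))
image[:, :2] = (16, 32, 48)
image[:, 2:] = (80, 160, 240)
semantic_map = np.zeros((4, 4), dtype=np.int64)
semantic_map[:, 2:] = 1

structure, texture_codes = decompose(image, semantic_map, num_classes=2)
reconstruction = synthesize(structure, texture_codes)
```

In this sketch the bitstream would consist of the label map plus one short code per class, which is why such a representation can be compact. The quantization `scale` plays the role of a rate control knob in this toy setting only; the paper's entropy model and generative synthesis are not reproduced here.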

For reproducible research, the source code of our method will be made public when this paper is accepted.

https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM.

Title: Semantic-Aware Visual Decomposition for Image Coding
Authors: Jianhui Chang, Jian Zhang, Jiguo Li, Shiqi Wang, Qi Mao, Chuanmin Jia, Siwei Ma, Wen Gao
Publication date: 2 June 2023
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 9/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-023-01809-7
