Published in: International Journal of Computer Vision, Issue 9/2023

2 June 2023

Semantic-Aware Visual Decomposition for Image Coding

Authors: Jianhui Chang, Jian Zhang, Jiguo Li, Shiqi Wang, Qi Mao, Chuanmin Jia, Siwei Ma, Wen Gao

Abstract

In this paper, we propose a novel image coding framework based on semantic-aware visual decomposition for extremely low bitrate compression. At the encoder side, an input image is decomposed into a semantic map, which serves as the structural representation, and a semantic-wise texture representation; both are then compressed into bitstreams. At the decoder side, the received bitstreams of the dual-layer representations are decoded and the target image is synthesized from them with generative models. An attention mechanism is introduced into the model architecture for texture representation modeling, and a coherency regularization is proposed to further optimize the texture representation space by aligning it with the source pixel space for higher synthesis quality. We also propose a cross-channel entropy module and control the quantization scale to facilitate rate-distortion optimization. Compressing the decomposed components into the bitstream is a simple yet effective representation philosophy that benefits image compression in several respects. First, in terms of compression performance, the compact representations and high visual synthesis quality bring remarkable advantages. Second, the proposed framework yields a physically explainable bitstream composed of a structural segment and semantic-wise texture segments. Third and most importantly, subsequent vision tasks (e.g., content manipulation) receive fundamental support from the semantic-aware visual decomposition and synthesis mechanism. Extensive experimental results demonstrate the superiority of the proposed framework in efficient visual representation learning, high-efficiency image compression (\(<0.1\) bpp), and intelligent visual applications (e.g., manipulation and analysis).
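
To make the dual-layer idea concrete, the following is a toy sketch only, not the authors' method. It assumes that region-wise average pooling can stand in for the learned texture encoder, that painting each region with its dequantized code can stand in for the generative decoder, and that each code element costs a fixed 8 bits. The function names (`encode_textures`, `decode_textures`, `bits_per_pixel`) and the parameter `q_scale` are hypothetical and do not come from the paper.

```python
import numpy as np


def encode_textures(image, semantic_map, num_classes, q_scale=8.0):
    """Toy texture layer: one quantized mean-colour code per semantic class."""
    codes = np.zeros((num_classes, image.shape[2]), dtype=np.int32)
    for c in range(num_classes):
        mask = semantic_map == c
        if mask.any():
            # Assumption: region-wise average pooling replaces a learned encoder.
            mean = image[mask].mean(axis=0)
            # A larger q_scale gives coarser codes (fewer distinct values).
            codes[c] = np.round(mean / q_scale).astype(np.int32)
    return codes


def decode_textures(codes, semantic_map, q_scale=8.0):
    """Toy synthesis: paint every pixel with its class's dequantized code."""
    return codes[semantic_map].astype(np.float64) * q_scale


def bits_per_pixel(total_bits, height, width):
    """Bitrate in bits per pixel (bpp)."""
    return total_bits / (height * width)


# Demo on a synthetic 64x64 image split into 4 vertical-stripe "classes".
height, width, num_classes = 64, 64, 4
semantic_map = np.tile(np.arange(width) // (width // num_classes), (height, 1))
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(height, width, 3))

codes = encode_textures(image, semantic_map, num_classes)
recon = decode_textures(codes, semantic_map)

# Assumption: fixed 8 bits per code element; semantic-map bits are NOT counted.
texture_bits = codes.size * 8
print(recon.shape, bits_per_pixel(texture_bits, height, width))
```

In this toy setting the texture layer costs 12 code elements × 8 bits over 4096 pixels, roughly 0.023 bpp. The semantic map would add its own cost, and a real codec would entropy-code both layers rather than use fixed-length codes.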

Footnotes
1
For reproducible research, the source code of our method will be made public when this paper is accepted.
 
Metadata
Title
Semantic-Aware Visual Decomposition for Image Coding
Authors
Jianhui Chang
Jian Zhang
Jiguo Li
Shiqi Wang
Qi Mao
Chuanmin Jia
Siwei Ma
Wen Gao
Publication date
2 June 2023
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 9/2023
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-023-01809-7
