
2020 | OriginalPaper | Book chapter

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

Written by: Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Published in: Computer Vision – ECCV 2020

Publisher: Springer International Publishing


Abstract

Convolution exploits locality for efficiency, at the cost of missing long-range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works show that it is possible to stack self-attention layers into a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows attention to be performed within a larger or even global region. As a companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that can be stacked to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves PQ by 2.8% over the bottom-up state of the art on COCO test-dev. This previous state of the art is attained by our small variant, which is \(3.8\times\) more parameter-efficient and \(27\times\) more computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes.
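
To illustrate the factorization of 2D self-attention into two 1D self-attentions mentioned in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single attention head, equal input and output channel counts, and plain dot-product attention, and it omits the position-sensitive terms entirely. The function names (`attention_1d`, `axial_attention`, `attention_pairs`) are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_1d(x, wq, wk, wv):
    """Single-head self-attention along the second-to-last axis.

    x: (..., L, C) array; wq, wk, wv: (C, C) projection matrices.
    Leading axes are treated as independent batches.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = q @ np.swapaxes(k, -1, -2)      # (..., L, L)
    return softmax(logits, axis=-1) @ v      # (..., L, C)

def axial_attention(x, params_h, params_w):
    """Apply 1D attention along height, then along width.

    x: (H, W, C) feature map. Positional terms are omitted (assumption).
    """
    # Height axis: every column is a sequence of length H.
    cols = np.swapaxes(x, 0, 1)              # (W, H, C)
    cols = attention_1d(cols, *params_h)
    x = np.swapaxes(cols, 0, 1)              # back to (H, W, C)
    # Width axis: every row is a sequence of length W.
    return attention_1d(x, *params_w)

def attention_pairs(h, w):
    """Count query-key pairs for full 2D attention vs. axial attention."""
    full = (h * w) ** 2                      # every pixel attends to every pixel
    axial = h * w * (h + w)                  # every pixel attends to its column and its row
    return full, axial

rng = np.random.default_rng(0)
H, W, C = 4, 6, 8
x = rng.standard_normal((H, W, C))
params_h = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
params_w = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
out = axial_attention(x, params_h, params_w)
full_pairs, axial_pairs = attention_pairs(64, 64)
```

After the two passes, each output position has received information from every input position, because the height pass mixes each column and the width pass then mixes each row. The pair counts show where the savings come from: for a 64 by 64 feature map, the axial form evaluates 32 times fewer query-key pairs than full 2D attention.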


Metadata
Title
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Written by
Huiyu Wang
Yukun Zhu
Bradley Green
Hartwig Adam
Alan Yuille
Liang-Chieh Chen
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-58548-8_7
