Published in: International Journal of Computer Vision, Issue 3/2021

03.11.2020

Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild

Authors: Xin Chen, Lingxi Xie, Jun Wu, Qi Tian


Abstract

With the rapid development of neural architecture search (NAS), researchers have found powerful network architectures for a wide range of vision tasks. Like their manually designed counterparts, automatically searched architectures are expected to transfer freely to different scenarios. This paper formally puts forward this problem, referred to as NAS in the wild, which explores the possibility of finding the optimal architecture on a proxy dataset and then deploying it to mostly unseen scenarios. We instantiate this setting with a currently popular algorithm named differentiable architecture search (DARTS), which often suffers from unsatisfying performance when transferred across different tasks. We argue that the accuracy drop originates from the formulation that uses a super-network for search but a sub-network for re-training. The different properties of these two stages result in a significant optimization gap, and consequently the architectural parameters “over-fit” the super-network. To alleviate the gap, we present a progressive method that gradually increases the network depth during the search stage, which leads to the Progressive DARTS (P-DARTS) algorithm. With a reduced search cost (7 hours on a single GPU), P-DARTS achieves improved performance on both the proxy dataset (CIFAR10) and several target problems (ImageNet classification, COCO detection and three ReID benchmarks). Our code is available at https://github.com/chenxin061/pdarts.
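The progressive schedule can be summarized in a few lines. The following Python sketch is illustrative only, not the authors' released implementation: the stage depths and per-edge candidate counts are example values, and build_super_network, search_one_stage, prune_operations and derive_final_architecture are hypothetical helpers.

def progressive_search(initial_ops, depths=(5, 11, 17), keep=(8, 5, 3)):
    """Run DARTS-style search in stages of increasing super-network depth.

    After each stage, only the strongest candidate operations per edge
    survive into the next, deeper stage, so the deeper super-network
    still fits in GPU memory while the optimization gap to the final,
    deep sub-network shrinks.
    """
    ops, arch_params = initial_ops, None
    for num_cells, num_keep in zip(depths, keep):
        super_net = build_super_network(num_cells=num_cells, ops=ops)
        arch_params = search_one_stage(super_net)            # bi-level DARTS update
        ops = prune_operations(arch_params, top_k=num_keep)  # drop weak candidates
    return derive_final_architecture(arch_params)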


Footnotes
1
We also tried to start from the architectural parameters learned in the previous stage, \({\mathfrak {S}}_{k-1}\), and adjust them according to Eq. 1 so that the weights of the preserved operations still sum to one. This strategy yielded slightly lower accuracy. In fact, we find that on average only 5.3 of the 14 normal edges keep the same most significant operation from \({\mathfrak {S}}_1\) to \({\mathfrak {S}}_2\), and this number increases only slightly, to 6.7, from \({\mathfrak {S}}_2\) to \({\mathfrak {S}}_3\); that is, deeper super-networks may prefer different operations.
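One plausible reading of this renormalization (Eq. 1 itself is not reproduced in this excerpt) is a softmax restricted to the surviving parameters, sketched below in PyTorch as an illustration rather than the paper's exact formula.

import torch

def renormalize(alpha, keep_idx):
    # alpha:    architectural parameters of one edge, shape (num_ops,)
    # keep_idx: indices of the operations preserved for the next stage
    # Returns weights over the preserved operations that sum to one.
    return torch.softmax(alpha[keep_idx], dim=-1)

# Example: keep operations 1 and 3 of a 4-way edge.
weights = renormalize(torch.tensor([0.2, 1.5, -0.3, 0.9]), torch.tensor([1, 3]))
assert abs(weights.sum().item() - 1.0) < 1e-6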
 
2
Here, we do not change the batch size to fit the GPU memory, because even under a fixed batch size the GPU memory usage can vary: the set of preserved candidates can differ, and, for example, a convolutional operator occupies more memory than a pooling operator. This is why we discuss the stability of GPU memory usage.
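For reference, such variation can be measured directly with PyTorch's built-in memory counters; in this sketch, super_net and batch are placeholders for one stage's super-network and a training batch already on the GPU.

import torch

torch.cuda.reset_peak_memory_stats()
loss = super_net(batch).sum()  # forward through the current super-network
loss.backward()                # the backward pass dominates peak usage
peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory this stage: {peak_gib:.2f} GiB")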
 
3
The mean test error of these three trials is \(3.61\%\pm 0.21\%\) (the corresponding errors are \(3.43\%\), \(3.51\%\) and \(3.89\%\), respectively).
 
4
Individually, the swish activation function reduced the top-1 test error of NASNet-A from \(26.4\%\) to \(25.0\%\) (Ramachandran et al. 2017), the SE module brought a performance gain of \(0.7\%\) (from \(25.5\%\) to \(24.8\%\)) on MnasNet (Tan et al. 2019), and AutoAugment achieved an accuracy gain of \(1.3\%\) on ResNet-50 (Cubuk et al. 2018). With the swish activation function, the SE module and AutoAugment combined, the compound gain is \(2.5\%\) (from \(25.2\%\) for MnasNet-92 to \(22.7\%\) for EfficientNet-B0).
 
References
Baker, B., Gupta, O., Naik, N., & Raskar, R. (2017). Designing neural network architectures using reinforcement learning. In ICLR.
Bi, K., Hu, C., Xie, L., Chen, X., Wei, L., & Tian, Q. (2019). Stabilizing DARTS with amended gradient estimation on architectural parameters. arXiv:1910.11831.
Cai, H., Chen, T., Zhang, W., Yu, Y., & Wang, J. (2018). Efficient architecture search by network transformation. In AAAI.
Cai, H., Zhu, L., & Han, S. (2019). ProxylessNAS: Direct neural architecture search on target task and hardware. In ICLR.
Chen, X., Xie, L., Wu, J., & Tian, Q. (2019a). Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In ICCV.
Chen, Y., Yang, T., Zhang, X., Meng, G., Xiao, X., & Sun, J. (2019b). DetNAS: Backbone search for object detection. In NeurIPS.
Chu, X., Zhang, B., Xu, R., & Li, J. (2019). FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. arXiv:1907.01845.
Cubuk, E. D., Zoph, B., Mane, D., Vasudevan, V., & Le, Q. V. (2018). AutoAugment: Learning augmentation policies from data. arXiv:1805.09501.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In CVPR.
Dong, X., & Yang, Y. (2019a). One-shot neural architecture search via self-evaluated template network. In ICCV.
Dong, X., & Yang, Y. (2019b). Searching for a robust neural architecture in four GPU hours. In CVPR.
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., & Tian, Q. (2019). CenterNet: Keypoint triplets for object detection. In ICCV.
Ghiasi, G., Lin, T. Y., & Le, Q. V. (2019). NAS-FPN: Learning scalable feature pyramid architecture for object detection. In CVPR.
Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., & He, K. (2017). Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv:1706.02677.
Han, D., Kim, J., & Kim, J. (2017). Deep pyramidal residual networks. In CVPR.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.
Howard, A., Sandler, M., Chu, G., Chen, L. C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al. (2019). Searching for MobileNetV3. In ICCV.
Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861.
Huang, G., Sun, Y., Liu, Z., Sedra, D., & Weinberger, K. Q. (2016). Deep networks with stochastic depth. In ECCV.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In CVPR.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML.
Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. Technical report, Citeseer.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
Larsson, G., Maire, M., & Shakhnarovich, G. (2017). FractalNet: Ultra-deep neural networks without residuals. In ICLR.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
Li, J., Ma, A. J., & Yuen, P. C. (2018). Semi-supervised region metric learning for person re-identification. IJCV, 126(8), 855–874.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., & Belongie, S. (2017). Feature pyramid networks for object detection. In CVPR.
Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L. J., Fei-Fei, L., Yuille, A., Huang, J., & Murphy, K. (2018a). Progressive neural architecture search. In ECCV.
Liu, H., Simonyan, K., Vinyals, O., Fernando, C., & Kavukcuoglu, K. (2018b). Hierarchical representations for efficient architecture search. In ICLR.
Liu, H., Simonyan, K., & Yang, Y. (2019a). DARTS: Differentiable architecture search. In ICLR.
Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2019b). Deep learning for generic object detection: A survey. IJCV.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot multibox detector. In ECCV.
Ma, N., Zhang, X., Zheng, H. T., & Sun, J. (2018). ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In ECCV.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. In NeurIPS.
Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018). Efficient neural architecture search via parameter sharing. In ICML.
Quan, R., Dong, X., Wu, Y., Zhu, L., & Yang, Y. (2019). Auto-ReID: Searching for a part-aware ConvNet for person re-identification. In ICCV.
Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2018). Regularized evolution for image classifier architecture search. arXiv:1802.01548.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. IJCV, 115(3), 211–252.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted residuals and linear bottlenecks. In CVPR.
Shu, Y., Wang, W., & Cai, S. (2020). Understanding architectures learnt by cell-based neural architecture search. In ICLR.
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1), 1929–1958.
Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Training very deep networks. In NIPS.
Stanley, K. O., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99–127.
Suganuma, M., Shirakawa, S., & Nagao, T. (2017). A genetic programming approach to designing convolutional neural network architectures. In GECCO.
Sun, Y., Zheng, L., Yang, Y., Tian, Q., & Wang, S. (2018). Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In ECCV.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In CVPR.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In CVPR.
Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML.
Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., & Le, Q. V. (2019). MnasNet: Platform-aware neural architecture search for mobile. In CVPR.
Tan, M., Pang, R., & Le, Q. V. (2020). EfficientDet: Scalable and efficient object detection. In CVPR.
Wang, H., Zhu, X., Gong, S., & Xiang, T. (2018). Person re-identification in identity regression space. IJCV, 126(12), 1288–1310.
Wei, L., Zhang, S., Gao, W., & Tian, Q. (2018). Person transfer GAN to bridge domain gap for person re-identification. In CVPR.
Wu, B., Dai, X., Zhang, P., Wang, Y., Sun, F., Wu, Y., Tian, Y., Vajda, P., Jia, Y., & Keutzer, K. (2019). FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In CVPR.
Xie, L., & Yuille, A. (2017). Genetic CNN. In ICCV.
Xie, S., Kirillov, A., Girshick, R., & He, K. (2019a). Exploring randomly wired neural networks for image recognition. In ICCV.
Xie, S., Zheng, H., Liu, C., & Lin, L. (2019b). SNAS: Stochastic neural architecture search. In ICLR.
Xu, Y., Xie, L., Zhang, X., Chen, X., Qi, G. J., Tian, Q., & Xiong, H. (2020). PC-DARTS: Partial channel connections for memory-efficient architecture search. In ICLR.
Zela, A., Elsken, T., Saikia, T., Marrakchi, Y., Brox, T., & Hutter, F. (2020). Understanding and robustifying differentiable architecture search. In ICLR.
Zhang, X., Zhou, X., Lin, M., & Sun, J. (2018). ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR.
Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., & Tian, Q. (2015). Scalable person re-identification: A benchmark. In ICCV.
Zheng, X., Ji, R., Tang, L., Zhang, B., Liu, J., & Tian, Q. (2019). Multinomial distribution learning for effective neural architecture search. In ICCV.
Zheng, Z., Zheng, L., & Yang, Y. (2017). Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In ICCV.
Zoph, B., & Le, Q. V. (2017). Neural architecture search with reinforcement learning. In ICLR.
Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018). Learning transferable architectures for scalable image recognition. In CVPR.
Metadata
Title
Progressive DARTS: Bridging the Optimization Gap for NAS in the Wild
Authors
Xin Chen
Lingxi Xie
Jun Wu
Qi Tian
Publication date
03.11.2020
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2021
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-020-01396-x
