
Generalizing from a Few Examples: A Survey on Few-shot Learning

Published: 12 June 2020

Abstract

Machine learning has been highly successful in data-intensive applications but is often hampered when the data set is small. Recently, Few-shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this article, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions, in terms of FSL problem setups, techniques, applications, and theories, are also proposed to provide insights for future research.

References

  1. N. Abdo, H. Kretzschmar, L. Spinello, and C. Stachniss. 2013. Learning manipulation actions from a few demonstrations. In Proceedings of the International Conference on Robotics and Automation. 1268--1275.Google ScholarGoogle Scholar
  2. Z. Akata, F. Perronnin, Z. Harchaoui, and C. Schmid. 2013. Label-embedding for attribute-based classification. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 819--826.Google ScholarGoogle Scholar
  3. M. Al-Shedivat, T. Bansal, Y. Burda, I. Sutskever, I. Mordatch, and P. Abbeel. 2018. Continuous adaptation via meta-learning in nonstationary and competitive environments. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  4. H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande. 2017. Low data drug discovery with one-shot learning. ACS Central Sci. 3, 4 (2017), 283--293.Google ScholarGoogle ScholarCross RefCross Ref
  5. M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, and N. de Freitas. 2016. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems. MIT Press, 3981--3989.Google ScholarGoogle Scholar
  6. S. Arik, J. Chen, K. Peng, W. Ping, and Y. Zhou. 2018. Neural voice cloning with a few samples. In Advances in Neural Information Processing Systems. MIT Press, 10019--10029.Google ScholarGoogle Scholar
  7. S. Azadi, M. Fisher, V. G. Kim, Z. Wang, E. Shechtman, and T. Darrell. 2018. Multi-content GAN for few-shot font-style transfer. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 7564--7573.Google ScholarGoogle Scholar
  8. P. Bachman, A. Sordoni, and A. Trischler. 2017. Learning algorithms for active learning. In Proceedings of the International Conference on Machine Learning. 301--310.Google ScholarGoogle Scholar
  9. Y. Bengio, D. Bahdanau, and K. Cho. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  10. E. Bart and S. Ullman. 2005. Cross-generalization: Learning novel classes from a single example by feature replacement. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Vol. 1. 672--679.Google ScholarGoogle Scholar
  11. S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. 2007. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems. MIT Press, 137--144.Google ScholarGoogle Scholar
  12. S. Benaim and L. Wolf. 2018. One-shot unsupervised cross domain translation. In Advances in Neural Information Processing Systems. MIT Press, 2104--2114.Google ScholarGoogle Scholar
  13. L. Bertinetto, J. F. Henriques, P. Torr, and A. Vedaldi. 2019. Meta-learning with differentiable closed-form solvers. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  14. L. Bertinetto, J. F. Henriques, J. Valmadre, P. Torr, and A. Vedaldi. 2016. Learning feed-forward one-shot learners. In Advances in Neural Information Processing Systems. MIT Press, 523--531.Google ScholarGoogle Scholar
  15. C. M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. Wortman. 2008. Learning bounds for domain adaptation. In Advances in Neural Information Processing Systems. MIT Press, 129--136.Google ScholarGoogle Scholar
  17. L. Bottou and O. Bousquet. 2008. The tradeoffs of large scale learning. In Advances in Neural Information Processing Systems. MIT Press, 161--168.Google ScholarGoogle Scholar
  18. L. Bottou, F. E. Curtis, and J. Nocedal. 2018. Optimization methods for large-scale machine learning. SIAM Rev. 60, 2 (2018), 223--311.Google ScholarGoogle ScholarCross RefCross Ref
  19. A. Brock, T. Lim, J.M. Ritchie, and N. Weston. 2018. SMASH: One-shot model architecture search through hypernetworks. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  20. J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah. 1994. Signature verification using a “siamese” time delay neural network. In Advances in Neural Information Processing Systems. MIT Press, 737--744.Google ScholarGoogle Scholar
  21. S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool. 2017. One-shot video object segmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 221--230.Google ScholarGoogle Scholar
  22. Q. Cai, Y. Pan, T. Yao, C. Yan, and T. Mei. 2018. Memory matching networks for one-shot image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 4080--4088.Google ScholarGoogle Scholar
  23. R. Caruana. 1997. Multitask learning. Mach. Learn. 28, 1 (1997), 41--75.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Choi, J. Krishnamurthy, A. Kembhavi, and A. Farhadi. 2018. Structured set matching networks for one-shot part labeling. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 3627--3636.Google ScholarGoogle Scholar
  25. J. D. Co-Reyes, A. Gupta, S. Sanjeev, N. Altieri, J. DeNero, P. Abbeel, and S. Levine. 2019. Meta-learning language-guided policy learning. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  26. J. J. Craig. 2009. Introduction to Robotics: Mechanics and Control. Pearson Education India.Google ScholarGoogle Scholar
  27. E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le. 2019. AutoAugment: Learning augmentation policies from data. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 113--123.Google ScholarGoogle Scholar
  28. T. Deleu and Y. Bengio. 2018. The effects of negative adaptation in model-agnostic meta-learning. arXiv preprint arXiv:1812.02159.Google ScholarGoogle Scholar
  29. G. Denevi, C. Ciliberto, D. Stamos, and M. Pontil. 2018. Learning to learn around a common mean. In Advances in Neural Information Processing Systems. MIT Press, 10190--10200.Google ScholarGoogle Scholar
  30. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 248--255.Google ScholarGoogle Scholar
  31. X. Dong, L. Zhu, D. Zhang, Y. Yang, and F. Wu. 2018. Fast parameter adaptation for few-shot image captioning and visual question answering. In Proceedings of the ACM International Conference on Multimedia. 54--62.Google ScholarGoogle Scholar
  32. M. Douze, A. Szlam, B. Hariharan, and H. Jégou. 2018. Low-shot learning with large-scale diffusion. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 3349--3358.Google ScholarGoogle Scholar
  33. Y. Duan, M. Andrychowicz, B. Stadie, J. Ho, J. Schneider, I. Sutskever, P. Abbeel, and W. Zaremba. 2017. One-shot imitation learning. In Advances in Neural Information Processing Systems. MIT Press, 1087--1098.Google ScholarGoogle Scholar
  34. H. Edwards and A. Storkey. 2017. Towards a neural statistician. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  35. L. Fei-Fei, R. Fergus, and P. Perona. 2006. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28, 4 (2006), 594--611.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. Fink. 2005. Object classification from a single example utilizing class relevance metrics. In Advances in Neural Information Processing Systems. MIT Press, 449--456.Google ScholarGoogle Scholar
  37. C. Finn, P. Abbeel, and S. Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning. 1126--1135.Google ScholarGoogle Scholar
  38. C. Finn and S. Levine. 2018. Meta-learning and universality: Deep representations and gradient descent can approximate any learning algorithm. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  39. C. Finn, K. Xu, and S. Levine. 2018. Probabilistic model-agnostic meta-learning. In Advances in Neural Information Processing Systems. MIT Press, 9537--9548.Google ScholarGoogle Scholar
  40. L. Franceschi, P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil. 2018. Bilevel programming for hyperparameter optimization and meta-learning. In Proceedings of the International Conference on Machine Learning. 1563--1572.Google ScholarGoogle Scholar
  41. J. Friedman, T. Hastie, and R. Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer series in statistics New York.Google ScholarGoogle Scholar
  42. H. Gao, Z. Shou, A. Zareian, H. Zhang, and S. Chang. 2018. Low-shot learning via covariance-preserving adversarial augmentation networks. In Advances in Neural Information Processing Systems. MIT Press, 983--993.Google ScholarGoogle Scholar
  43. P. Germain, F. Bach, A. Lacoste, and S. Lacoste-Julien. 2016. PAC-Bayesian theory meets Bayesian inference. In Advances in Neural Information Processing Systems. MIT Press, 1884--1892.Google ScholarGoogle Scholar
  44. S. Gidaris and N. Komodakis. 2018. Dynamic few-shot visual learning without forgetting. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 4367--4375.Google ScholarGoogle Scholar
  45. I. Goodfellow, Y. Bengio, and A. Courville. 2016. Deep Learning. MIT Press.Google ScholarGoogle Scholar
  46. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. MIT Press, 2672--2680.Google ScholarGoogle Scholar
  47. J. Gordon, J. Bronskill, M. Bauer, S. Nowozin, and R. Turner. 2019. Meta-learning probabilistic inference for prediction. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  48. E. Grant, C. Finn, S. Levine, T. Darrell, and T. Griffiths. 2018. Recasting gradient-based meta-learning as hierarchical Bayes. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  49. A. Graves, G. Wayne, and I. Danihelka. 2014. Neural turing machines. arXiv preprint arXiv:1410.5401.Google ScholarGoogle Scholar
  50. L.-Y. Gui, Y.-X. Wang, D. Ramanan, and J. Moura. 2018. Few-shot human motion prediction via meta-learning. In Proceedings of the European Conference on Computer Vision. 432--450.Google ScholarGoogle Scholar
  51. M. Hamaya, T. Matsubara, T. Noda, T. Teramae, and J. Morimoto. 2016. Learning assistive strategies from a few user-robot interactions: Model-based reinforcement learning approach. In Proceedings of the International Conference on Robotics and Automation. 3346--3351.Google ScholarGoogle Scholar
  52. X. Han, H. Zhu, P. Yu, Z. Wang, Y. Yao, Z. Liu, and M. Sun. 2018. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 4803--4809.Google ScholarGoogle Scholar
  53. B. Hariharan and R. Girshick. 2017. Low-shot visual recognition by shrinking and hallucinating features. In Proceedings of the International Conference on Computer Vision.Google ScholarGoogle Scholar
  54. H. He and E. A. Garcia. 2008. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 9 (2008), 1263--1284.Google ScholarGoogle Scholar
  55. K. He, X. Zhang, S. Ren, and J. Sun. 2016. Deep residual learning for image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 770--778.Google ScholarGoogle Scholar
  56. A. Herbelot and M. Baroni. 2017. High-risk learning: Acquiring new word vectors from tiny data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 304--309.Google ScholarGoogle Scholar
  57. L. B. Hewitt, M. I. Nye, A. Gane, T. Jaakkola, and J. B. Tenenbaum. 2018. The variational homoencoder: Learning to learn high capacity generative models from few examples. In Uncertainty in Artificial Intelligence. Elsevier/North Holland, 988--997.Google ScholarGoogle Scholar
  58. S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Comput. 9, 8 (1997), 1735--1780.Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. S. Hochreiter, A. S. Younger, and P. R. Conwell. 2001. Learning to learn using gradient descent. In Proceedings of the International Conference on Artificial Neural Networks. 87--94.Google ScholarGoogle Scholar
  60. J. Hoffman, E. Tzeng, J. Donahue, Y. Jia, K. Saenko, and T. Darrell. 2013. One-shot adaptation of supervised deep convolutional models. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  61. Z. Hu, X. Li, C. Tu, Z. Liu, and M. Sun. 2018. Few-shot charge prediction with discriminative legal attributes. In Proceedings of the International Conference on Computational Linguistics. 487--498.Google ScholarGoogle Scholar
  62. S. J. Hwang and L. Sigal. 2014. A unified semantic embedding: Relating taxonomies and attributes. In Advances in Neural Information Processing Systems. MIT Press, 271--279.Google ScholarGoogle Scholar
  63. Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia. 675--678.Google ScholarGoogle Scholar
  64. V. Joshi, M. Peters, and M. Hopkins. 2018. Extending a parser to distant domains using a few dozen partially annotated examples. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. 1190--1199.Google ScholarGoogle Scholar
  65. Ł. Kaiser, O. Nachum, A. Roy, and S. Bengio. 2017. Learning to remember rare events. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  66. J. M. Kanter and K. Veeramachaneni. 2015. Deep feature synthesis: Towards automating data science endeavors. In Proceedings of the International Conference on Data Science and Advanced Analytics. 1--10.Google ScholarGoogle Scholar
  67. R. Keshari, M. Vatsa, R. Singh, and A. Noore. 2018. Learning structure and strength of CNN filters for small sample size training. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 9349--9358.Google ScholarGoogle Scholar
  68. D. P. Kingma and M. Welling. 2014. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  69. J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. U.S.A. 114, 13 (2017), 3521--3526.Google ScholarGoogle ScholarCross RefCross Ref
  70. G. Koch. 2015. Siamese Neural Networks for One-shot Image Recognition. Ph.D. Dissertation. University of Toronto.Google ScholarGoogle Scholar
  71. L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown. 2017. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. J. Mach. Learn. Res. 18, 1 (2017), 826--830.Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. J. Kozerawski and M. Turk. 2018. CLEAR: Cumulative learning for one-shot one-class image recognition. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 3446--3455.Google ScholarGoogle Scholar
  73. A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. MIT Press, 1097--1105.Google ScholarGoogle Scholar
  74. R. Kwitt, S. Hegenbart, and M. Niethammer. 2016. One-shot learning of scene locations via feature trajectory transfer. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 78--86.Google ScholarGoogle Scholar
  75. B. Lake, C.-Y. Lee, J. Glass, and J. Tenenbaum. 2014. One-shot learning of generative speech concepts. In Proceedings of the Annual Meeting of the Cognitive Science Society, Vol. 36.Google ScholarGoogle Scholar
  76. B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. 2015. Human-level concept learning through probabilistic program induction. Science 350, 6266 (2015), 1332--1338.Google ScholarGoogle Scholar
  77. B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. 2017. Building machines that learn and think like people. Behav. Brain Sci. 40 (2017).Google ScholarGoogle Scholar
  78. C. H. Lampert, H. Nickisch, and S. Harmeling. 2009. Learning to detect unseen object classes by between-class attribute transfer. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 951--958.Google ScholarGoogle Scholar
  79. Y. Lee and S. Choi. 2018. Gradient-based meta-learning with learned layerwise metric and subspace. In Proceedings of the International Conference on Machine Learning. 2933--2942.Google ScholarGoogle Scholar
  80. K. Li and J. Malik. 2017. Learning to optimize. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  81. X.-L. Li, P. S. Yu, B. Liu, and S.-K. Ng. 2009. Positive unlabeled learning for data stream classification. In Proceedings of the SIAM International Conference on Data Mining. 259--270.Google ScholarGoogle ScholarCross RefCross Ref
  82. B. Liu, X. Wang, M. Dixit, R. Kwitt, and N. Vasconcelos. 2018. Feature space transfer for data augmentation. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 9090--9098.Google ScholarGoogle Scholar
  83. H. Liu, K. Simonyan, and Y. Yang. 2019. DARTS: Differentiable architecture search. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  84. Y. Liu, J. Lee, M. Park, S. Kim, E. Yang, S. Hwang, and Y Yang. 2019. Learning to propopagate labels: Transductive propagation network for few-shot learning. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  85. Z. Luo, Y. Zou, J. Hoffman, and L. Fei-Fei. 2017. Label efficient learning of transferable representations acrosss domains and tasks. In Advances in Neural Information Processing Systems. MIT Press, 165--177.Google ScholarGoogle Scholar
  86. S. Mahadevan and P. Tadepalli. 1994. Quantifying prior determination knowledge using the PAC learning model. Mach. Learn. 17, 1 (1994), 69--105.Google ScholarGoogle ScholarCross RefCross Ref
  87. D. McNamara and M.-F. Balcan. 2017. Risk bounds for transferring representations with and without fine-tuning. In Proceedings of the International Conference on Machine Learning. 2373--2381.Google ScholarGoogle Scholar
  88. T. Mensink, E. Gavves, and C. Snoek. 2014. Costa: Co-occurrence statistics for zero-shot classification. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 2441--2448.Google ScholarGoogle Scholar
  89. A. Miller, A. Fisch, J. Dodge, A.-H. Karimi, A. Bordes, and J. Weston. 2016. Key-value memory networks for directly reading documents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1400--1409.Google ScholarGoogle Scholar
  90. E. G. Miller, N. E. Matsakis, and P. A. Viola. 2000. Learning from one example through shared densities on transforms. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Vol. 1. 464--471.Google ScholarGoogle Scholar
  91. N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. 2018. A simple neural attentive meta-learner. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  92. M. T. Mitchell. 1997. Machine Learning. McGraw-Hill.Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. S. H. Mohammadi and T. Kim. 2018. Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18). 2833--2837.Google ScholarGoogle Scholar
  94. M. Mohri, A. Rostamizadeh, and A. Talwalkar. 2018. Foundations of Machine Learning. MIT Press.Google ScholarGoogle Scholar
  95. S. Motiian, Q. Jones, S. Iranmanesh, and G. Doretto. 2017. Few-shot adversarial domain adaptation. In Advances in Neural Information Processing Systems. MIT Press, 6670--6680.Google ScholarGoogle Scholar
  96. T. Munkhdalai and H. Yu. 2017. Meta networks. In Proceedings of the International Conference on Machine Learning. 2554--2563.Google ScholarGoogle Scholar
  97. T. Munkhdalai, X. Yuan, S. Mehri, and A. Trischler. 2018. Rapid adaptation with conditionally shifted neurons. In Proceedings of the International Conference on Machine Learning. 3661--3670.Google ScholarGoogle Scholar
  98. A. Nagabandi, C. Finn, and S. Levine. 2018. Deep online learning via meta-learning: Continual adaptation for model-based RL. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  99. H. Nguyen and L. Zakynthinou. 2018. Improved algorithms for collaborative PAC learning. In Advances in Neural Information Processing Systems. MIT Press, 7631--7639.Google ScholarGoogle Scholar
  100. B. Oreshkin, P. R. López, and A. Lacoste. 2018. TADAM: Task-dependent adaptive metric for improved few-shot learning. In Advances in Neural Information Processing Systems. MIT Press, 719--729.Google ScholarGoogle Scholar
  101. S. J. Pan and Q. Yang. 2010. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 10, 22 (2010), 1345--1359.Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. T. Pfister, J. Charles, and A. Zisserman. 2014. Domain-adaptive discriminative one-shot learning of gestures. In Proceedings of the European Conference on Computer Vision. 814--829.Google ScholarGoogle Scholar
  103. H. Qi, M. Brown, and D. G. Lowe. 2018. Low-shot learning with imprinted weights. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 5822--5830.Google ScholarGoogle Scholar
  104. T. Ramalho and M. Garnelo. 2019. Adaptive posterior learning: Few-shot learning with a surprise-based memory module. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  105. S. Ravi and A. Beatson. 2019. Amortized Bayesian meta-learning. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  106. S. Ravi and H. Larochelle. 2017. Optimization as a model for few-shot learning. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  107. S. Reed, Y. Chen, T. Paine, A. van den Oord, S. M. A. Eslami, D. Rezende, O. Vinyals, and N. de Freitas. 2018. Few-shot autoregressive density estimation: Towards learning to learn distributions. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  108. M. Ren, S. Ravi, E. Triantafillou, J. Snell, K. Swersky, J. B. Tenenbaum, H. Larochelle, and R. S. Zemel. 2018. Meta-learning for semi-supervised few-shot classification. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  109. D. Rezende, I. Danihelka, K. Gregor, and D. Wierstra. 2016. One-shot generalization in deep generative models. In Proceedings of the International Conference on Machine Learning. 1521--1529.Google ScholarGoogle Scholar
  110. A. Rios and R. Kavuluru. 2018. Few-shot and zero-shot multi-label learning for structured label spaces. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 3132.Google ScholarGoogle Scholar
  111. A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell. 2019. Meta-learning with latent embedding optimization. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  112. R. Salakhutdinov and G. Hinton. 2009. Deep boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 448--455.Google ScholarGoogle Scholar
  113. R. Salakhutdinov, J. Tenenbaum, and A. Torralba. 2012. One-shot learning with a hierarchical nonparametric Bayesian model. In Proceedings of the ICML Workshop on Unsupervised and Transfer Learning. 195--206.Google ScholarGoogle Scholar
  114. A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. 2016. Meta-learning with memory-augmented neural networks. In Proceedings of the International Conference on Machine Learning. 1842--1850.Google ScholarGoogle Scholar
  115. V. G. Satorras and J. B. Estrach. 2018. Few-shot learning with graph neural networks. In Proceedings of the International Conference on Learning Representations.Google ScholarGoogle Scholar
  116. E. Schwartz, L. Karlinsky, J. Shtok, S. Harary, M. Marder, A. Kumar, R. Feris, R. Giryes, and A. Bronstein. 2018. Delta-encoder: An effective sample synthesis method for few-shot object recognition. In Advances in Neural Information Processing Systems. MIT Press, 2850--2860.Google ScholarGoogle Scholar
  117. B. Settles. 2009. Active Learning Literature Survey. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.Google ScholarGoogle Scholar
  118. J. Shu, Z. Xu, and D Meng. 2018. Small sample learning in big data era. arXiv preprint arXiv:1808.04572.Google ScholarGoogle Scholar
  119. P. Shyam, S. Gupta, and A. Dukkipati. 2017. Attentive recurrent comparators. In Proceedings of the International Conference on Machine Learning. 3173--3181.Google ScholarGoogle Scholar
  120. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484--489.Google ScholarGoogle Scholar
  121. J. Snell, K. Swersky, and R. S. Zemel. 2017. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems. MIT Press, 4077--4087.Google ScholarGoogle Scholar
  122. M. D. Spivak. 1970. A Comprehensive Introduction to Differential Geometry. Publish or Perish.Google ScholarGoogle Scholar
  123. R. K. Srivastava, K. Greff, and J. Schmidhuber. 2015. Training very deep networks. In Advances in Neural Information Processing Systems. MIT Press, 2377--2385.Google ScholarGoogle Scholar
  124. S. Sukhbaatar, J. Weston, R. Fergus, et al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems. MIT Press, 2440--2448.Google ScholarGoogle Scholar
  125. J. Sun, S. Wang, and C. Zong. 2018. Memory, show the way: Memory based few shot word representation learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 1435--1444.Google ScholarGoogle Scholar
  126. F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales. 2018. Learning to compare: Relation network for few-shot learning. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 1199--1208.Google ScholarGoogle Scholar
  127. K. D. Tang, M. F. Tappen, R. Sukthankar, and C. H. Lampert. 2010. Optimizing one-shot recognition with micro-set learning. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 3027--3034.Google ScholarGoogle Scholar
  128. A. Tjandra, S. Sakti, and S. Nakamura. 2018. Machine speech chain with one-shot speaker adaptation. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18). 887--891.Google ScholarGoogle Scholar
  129. A. Torralba, J. B. Tenenbaum, and R. R. Salakhutdinov. 2011. Learning to learn with compound HD models. In Advances in Neural Information Processing Systems. MIT Press, 2061--2069.Google ScholarGoogle Scholar
  130. E. Triantafillou, R. Zemel, and R. Urtasun. 2017. Few-shot learning through an information retrieval lens. In Advances in Neural Information Processing Systems. MIT Press, 2255--2265.Google ScholarGoogle Scholar
  131. E. Triantafillou, T. Zhu, V. Dumoulin, P. Lamblin, K. Xu, R. Goroshin, C. Gelada, K. Swersky, P.-A. Manzagol, et al. 2019. Meta-dataset: A dataset of datasets for learning to learn from few examples. arXiv preprint arXiv:1903.03096.Google ScholarGoogle Scholar
  132. Y.-H. Tsai, L.-K. Huang, and R. Salakhutdinov. 2017. Learning robust visual-semantic embeddings. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 3571--3580.Google ScholarGoogle Scholar
  133. Y. H. Tsai and R. Salakhutdinov. 2017. Improving one-shot learning through fusing side information. arXiv preprint arXiv:1710.08347.Google ScholarGoogle Scholar
  134. M. A. Turing. 1950. Computing machinery and intelligence. Mind 59, 236 (1950), 433--433.Google ScholarGoogle ScholarCross RefCross Ref
  135. A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. 2016. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems. MIT Press, 4790--4798.Google ScholarGoogle Scholar
  136. V. N. Vapnik. 1992. Principles of risk minimization for learning theory. In Advances in Neural Information Processing Systems. MIT Press, 831--838.Google ScholarGoogle Scholar
  137. M. Vartak, A. Thiagarajan, C. Miranda, J. Bratman, and H. Larochelle. 2017. A meta-learning perspective on cold-start recommendations for items. In Advances in Neural Information Processing Systems. MIT Press, 6904--6914.Google ScholarGoogle Scholar
  138. O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. 2016. Matching networks for one shot learning. In Advances in Neural Information Processing Systems. MIT Press, 3630--3638.Google ScholarGoogle Scholar
  139. P. Wang, L. Liu, C. Shen, Z. Huang, A. van den Hengel, and H. Tao Shen. 2017. Multi-attention network for one shot learning. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 2721--2729.Google ScholarGoogle Scholar
  140. X. Wang, Y. Ye, and A. Gupta. 2018. Zero-shot recognition via semantic embeddings and knowledge graphs. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 6857--6866.
  141. Y.-X. Wang, R. Girshick, M. Hebert, and B. Hariharan. 2018. Low-shot learning from imaginary data. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 7278--7286.
  142. Y.-X. Wang and M. Hebert. 2016. Learning from small sample sets by combining unsupervised meta-training with CNNs. In Advances in Neural Information Processing Systems. MIT Press, 244--252.
  143. Y.-X. Wang and M. Hebert. 2016. Learning to learn: Model regression networks for easy small sample learning. In Proceedings of the European Conference on Computer Vision. 616--634.
  144. J. Wei and K. Zou. 2019. EDA: Easy data augmentation techniques for boosting performance on text classification tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing. 6383--6389.
  145. J. Weston, S. Chopra, and A. Bordes. 2014. Memory networks. arXiv preprint arXiv:1410.3916.
  146. M. Woodward and C. Finn. 2017. Active one-shot learning. arXiv preprint arXiv:1702.06559.
  147. Y. Wu and Y. Demiris. 2010. Towards one shot learning by imitation for humanoid robots. In Proceedings of the International Conference on Robotics and Automation. 2889--2894.
  148. Y. Wu, Y. Lin, X. Dong, Y. Yan, W. Ouyang, and Y. Yang. 2018. Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 5177--5186.
  149. Z. Xu, L. Zhu, and Y. Yang. 2017. Few-shot object recognition from machine-labeled web images. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 1164--1172.
  150. L. Yan, Y. Zheng, and J. Cao. 2018. Few-shot learning for short text classification. Multimedia Tools Appl. 77, 22 (2018), 29799--29810.
  151. W. Yan, J. Yap, and G. Mori. 2015. Multi-task transfer methods to improve one-shot learning for multimedia event detection. In Proceedings of the British Machine Vision Conference.
  152. H. Yang, X. He, and F. Porikli. 2018. One-shot action localization by learning sequence matching network. In Proceedings of the Conference on Computer Vision and Pattern Recognition. 1450--1459.
  153. Q. Yao, M. Wang, H. J. Escalante, I. Guyon, Y.-Q. Hu, Y.-F. Li, W.-W. Tu, Q. Yang, and Y. Yu. 2018. Taking human out of learning applications: A survey on automated machine learning. arXiv preprint arXiv:1810.13306.
  154. Q. Yao, J. Xu, W.-W. Tu, and Z. Zhu. 2020. Efficient neural architecture search via proximal iterations. In Proceedings of the AAAI Conference on Artificial Intelligence.
  155. D. Yoo, H. Fan, V. N. Boddeti, and K. M. Kitani. 2018. Efficient k-shot learning with regularized deep networks. In Proceedings of the AAAI Conference on Artificial Intelligence.
  156. J. Yoon, T. Kim, O. Dia, S. Kim, Y. Bengio, and S. Ahn. 2018. Bayesian model-agnostic meta-learning. In Advances in Neural Information Processing Systems. MIT Press, 7343--7353.
  157. M. Yu, X. Guo, J. Yi, S. Chang, S. Potdar, Y. Cheng, G. Tesauro, H. Wang, and B. Zhou. 2018. Diverse few-shot text classification with multiple metrics. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 1206--1215.
  158. C. Zhang, J. Butepage, H. Kjellstrom, and S. Mandt. 2019. Advances in variational inference. IEEE Trans. Pattern Anal. Mach. Intell. 41, 8 (2019), 2008--2026.
  159. R. Zhang, T. Che, Z. Ghahramani, Y. Bengio, and Y. Song. 2018. MetaGAN: An adversarial approach to few-shot learning. In Advances in Neural Information Processing Systems. MIT Press, 2371--2380.
  160. Y. Zhang, H. Tang, and K. Jia. 2018. Fine-grained visual categorization using meta-learning optimization with sample selection of auxiliary data. In Proceedings of the European Conference on Computer Vision. 233--248.
  161. Y. Zhang and Q. Yang. 2017. A survey on multi-task learning. arXiv preprint arXiv:1707.08114.
  162. F. Zhao, J. Zhao, S. Yan, and J. Feng. 2018. Dynamic conditional networks for few-shot learning. In Proceedings of the European Conference on Computer Vision.
  163. Z.-H. Zhou. 2017. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 5, 1 (2017), 44--53.
  164. L. Zhu and Y. Yang. 2018. Compound memory networks for few-shot video classification. In Proceedings of the European Conference on Computer Vision. 751--766.
  165. X. J. Zhu. 2005. Semi-supervised Learning Literature Survey. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.
  166. B. Zoph and Q. V. Le. 2017. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations.

      • Published in

        ACM Computing Surveys  Volume 53, Issue 3
        May 2021
        787 pages
        ISSN:0360-0300
        EISSN:1557-7341
        DOI:10.1145/3403423
        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 June 2020
        • Online AM: 7 May 2020
        • Accepted: 1 March 2020
        • Revised: 1 January 2020
        • Received: 1 May 2019

        Qualifiers

        • survey
        • Research
        • Refereed
