Abstract
Machine learning has been highly successful in data-intensive applications but is often hampered when the data set is small. Recently, Few-shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this article, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. Promising directions in FSL problem setups, techniques, applications, and theories are also proposed to provide insights for future research.
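The claim that the empirical risk minimizer is unreliable with few samples can be illustrated with a toy simulation. The sketch below is not taken from the survey: it assumes a deliberately simple setting (constant predictors, squared loss, Gaussian targets), in which the empirical risk minimizer is the sample mean, and all function names and parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def erm_constant_predictor(samples):
    """Under squared loss, the empirical risk minimizer over constant
    predictors is the sample mean (toy setting assumed for this sketch)."""
    return statistics.fmean(samples)


def mean_excess_risk(n_samples, n_trials=1000, true_mean=0.0,
                     noise_std=1.0, seed=0):
    """Average excess expected risk of the ERM over repeated training sets.

    For a constant predictor h, the expected squared loss is
    noise_std**2 + (h - true_mean)**2, so the excess over the best
    possible constant predictor is (h - true_mean)**2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        samples = [rng.gauss(true_mean, noise_std) for _ in range(n_samples)]
        h = erm_constant_predictor(samples)
        total += (h - true_mean) ** 2
    return total / n_trials


# Hypothetical sample sizes: 5 labeled examples versus 200.
few_shot_gap = mean_excess_risk(n_samples=5)
many_shot_gap = mean_excess_risk(n_samples=200)

print(f"excess risk with 5 samples:   {few_shot_gap:.4f}")
print(f"excess risk with 200 samples: {many_shot_gap:.4f}")
```

In this toy setting the expected excess risk shrinks as the noise variance divided by the number of samples, so the few-shot minimizer sits much farther from the best hypothesis on average. The three perspectives in the abstract can be read as ways to close this gap: adding samples (data), shrinking the set of candidate hypotheses (model), or starting the search from a better point (algorithm).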