2019 | OriginalPaper | Chapter

10. Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning

Authors: Uday Kamath, John Liu, James Whitaker

Published in: Deep Learning for NLP and Speech Recognition

Publisher: Springer International Publishing

Abstract

Most supervised machine learning techniques, such as classification, rely on underlying assumptions: (a) the data distributions at training time and prediction time are similar; (b) the label spaces at training time and prediction time are similar; and (c) the feature space remains the same between training time and prediction time. In many real-world scenarios, these assumptions do not hold because of the changing nature of the data.

Metadata

Title: Transfer Learning: Scenarios, Self-Taught Learning, and Multitask Learning
Authors: Uday Kamath, John Liu, James Whitaker
Copyright Year: 2019
DOI: https://doi.org/10.1007/978-3-030-14596-5_10
