Abstract
Context-sensitive Multiple Task Learning, or csMTL, is presented as a method of inductive transfer that uses a single-output neural network and additional contextual inputs for learning multiple tasks. Motivated by problems with the application of MTL networks to machine lifelong learning systems, csMTL encoding of multiple task examples was developed and found to improve predictive performance. As evidence, the csMTL method is tested on seven task domains and shown to produce hypotheses for primary tasks that are often better than standard MTL hypotheses when learning in the presence of related and unrelated tasks. We argue that the reason for this performance improvement is a reduction in the number of effective free parameters in the csMTL network brought about by the shared output node and weight update constraints due to the context inputs. An examination of IDT and SVM models developed from csMTL-encoded data provides initial evidence that this improvement is not shared across all machine learning models.
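The encoding idea described above (one shared output plus extra context inputs that identify the task) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that every example carries a target for every task, that the context is a one-hot task indicator, and that rows are ordered example by example; the function name `encode_csmtl` and the toy data are hypothetical.

```python
import numpy as np

def encode_csmtl(X, Y):
    """Expand MTL-style data (one target column per task) into
    csMTL-style data (task context appended to the inputs, one target).

    Assumption: one-hot context and a full target matrix; the paper's
    exact encoding is not given in this excerpt.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    k = Y.shape[1]                      # number of tasks
    # Repeat each input row once per task (example-major ordering).
    X_rep = np.repeat(X, k, axis=0)     # shape (n*k, d)
    # One-hot context rows cycling through tasks 0..k-1 for each example.
    C = np.tile(np.eye(k), (n, 1))      # shape (n*k, k)
    inputs = np.hstack([X_rep, C])      # shape (n*k, d+k)
    # Row-major flatten gives Y[i, t] at position i*k + t, matching inputs.
    targets = Y.reshape(-1)
    return inputs, targets

# Toy data: 2 examples, 2 input features, 3 tasks (hypothetical values).
X = [[0.1, 0.2], [0.3, 0.4]]
Y = [[1, 0, 1], [0, 1, 0]]
inputs, targets = encode_csmtl(X, Y)
print(inputs.shape, targets.shape)
```

Under this encoding, a single-output network trained on `inputs` and `targets` sees every task through the same output node, with the context columns selecting which task a given row belongs to.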
Additional information
Editor: Risto Miikkulainen
About this article
Cite this article
Silver, D.L., Poirier, R. & Currie, D. Inductive transfer with context-sensitive neural networks. Mach Learn 73, 313–336 (2008). https://doi.org/10.1007/s10994-008-5088-0
DOI: https://doi.org/10.1007/s10994-008-5088-0