
2019 | OriginalPaper | Chapter

2. Machine Learning Overview

Authors: Jiawei Zhang, Philip S. Yu

Published in: Broad Learning Through Fusions

Publisher: Springer International Publishing


Abstract

Learning denotes the process of acquiring new declarative knowledge, organizing that knowledge into general yet effective representations, and discovering new facts and theories through observation and experimentation. Learning is one of the most important skills humankind can master, and it is also what distinguishes us from the other animals on this planet. For example, from our past experiences we know that the sun rises in the east and sets in the west, that the moon revolves around the earth, and that a year has 365 days; all of this is knowledge derived from our past life experiences.


Footnotes
1
Machine learning models usually denote learning algorithms that have been well trained on some training data. In the remainder of this book, we will not distinguish between machine learning models and machine learning algorithms by default.
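
To make this distinction concrete, the following is a minimal sketch (assuming Python with scikit-learn, an illustrative choice not tied to this chapter): the learning algorithm is the untrained estimator together with its hyperparameters, while the model is the object obtained after fitting that estimator to training data.

    # A minimal sketch of the algorithm-vs-model distinction.
    # Assumes Python with scikit-learn; the library and the toy data
    # are illustrative choices, not part of the chapter itself.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Toy training data: 100 labeled examples with 4 features each.
    X_train, y_train = make_classification(n_samples=100, n_features=4,
                                           random_state=0)

    # The learning *algorithm*: an untrained estimator plus its hyperparameters.
    algorithm = LogisticRegression(max_iter=200)

    # The *model*: the same algorithm after training on the data
    # (scikit-learn's fit() returns the now-trained estimator).
    model = algorithm.fit(X_train, y_train)

    # The trained model can then make predictions on new observations.
    print(model.predict(X_train[:5]))

In this sense a "model" is simply a trained instance of an "algorithm", which is why the two terms are treated interchangeably throughout the book.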
 
Metadata
Title
Machine Learning Overview
Authors
Jiawei Zhang
Philip S. Yu
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-12528-8_2
