2019 | Original Paper | Book Chapter

2. Fundamentals of Machine Learning

Authors: Ke-Lin Du, M. N. S. Swamy

Published in: Neural Networks and Statistical Learning

Publisher: Springer London

Abstract

This chapter deals with the fundamental concepts and theories of machine learning. It first introduces various learning and inference methods, followed by learning and generalization, model selection, and neural networks as universal machines. Some other important topics are also covered.

Metadata
Title
Fundamentals of Machine Learning
Authors
Ke-Lin Du
M. N. S. Swamy
Copyright Year
2019
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-7452-3_2