2010 | OriginalPaper | Chapter

1. Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces

Author : José C. Principe

Published in: Information Theoretic Learning

Publisher: Springer New York


Abstract

The common problem faced by many data processing professionals is how best to extract the information contained in data. In our daily lives and in our professions, we are bombarded by huge amounts of data, but most often the data themselves are not our primary interest. Data hide, either in their time structure or in their spatial redundancy, important clues to answer the information-processing questions we pose. We are using the term information in the colloquial sense, and therefore it may mean different things to different people, which is acceptable for now. We all realize that the use of computers and the Web has tremendously increased both the accessibility and the amount of data being generated. Therefore the pressure to distill information from data will mount at an increasing pace in the future, and old ways of dealing with this problem will be forced to evolve and adapt to the new reality. To many (including the author) this represents nothing less than a paradigm shift from hypothesis-based to evidence-based science, and it will affect the core design strategies in many disciplines, including learning theory and adaptive systems.
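The abstract contrasts the colloquial sense of "information" with the formal one that the chapter goes on to develop. As a minimal illustration of the formal sense (not taken from the chapter itself), the sketch below estimates the Shannon entropy of a discrete sample, quantifying in bits how much uncertainty, and hence information, the data carry; the function name and example data are illustrative choices, not from the source.

```python
import math
from collections import Counter

def shannon_entropy(samples):
    """Empirical Shannon entropy (in bits) of a discrete sample,
    using the relative frequencies as probability estimates."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A fair coin carries one bit of information per toss;
# a biased sample carries less.
print(shannon_entropy(["H", "T", "H", "T"]))  # 1.0
print(shannon_entropy(["H", "H", "H", "T"]))  # ~0.811
```

Real data are rarely discrete and low-dimensional, which is why the chapter turns to nonparametric density estimates and reproducing kernel Hilbert spaces, but the same principle of measuring structure in data underlies both.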


Metadata
Title
Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces
Author
José C. Principe
Copyright Year
2010
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4419-1570-2_1
