2010 | OriginalPaper | Chapter

2. Renyi’s Entropy, Divergence and Their Nonparametric Estimators

Authors: Dongxin Xu, Deniz Erdogmus

Published in: Information Theoretic Learning

Publisher: Springer New York

Abstract

It is evident from Chapter 1 that Shannon’s entropy occupies a central role in information-theoretic studies. Yet, the concept of information is so rich that perhaps there is no single definition that will be able to quantify information properly. Moreover, from an engineering perspective, one must estimate entropy from data, which is a nontrivial matter. In this book we concentrate on Alfred Renyi’s seminal work on information theory to derive a set of estimators that allow entropy and divergence to be applied as cost functions in adaptation and learning. Therefore, we are mainly interested in computationally simple, nonparametric estimators that are continuous and differentiable in terms of the samples, so as to yield well-behaved gradient algorithms that can optimize adaptive system parameters. Many factors affect the determination of the optimum of the performance surface, such as gradient noise, learning rates, and misadjustment; therefore, in these types of applications the entropy estimator’s bias and variance are not as critical as they are, for instance, in coding or rate-distortion theory. Moreover, in adaptation one is only interested in the extremum (maximum or minimum) of the cost, which creates independence from its actual values, because only relative assessments are necessary. Following our nonparametric goals, what matters most in learning is to develop cost functions or divergence measures that can be derived directly from data, without further assumptions, and that capture as much structure as possible within the data’s probability density function (PDF).
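As a concrete illustration of the kind of sample-based estimator the chapter develops, the sketch below computes Renyi's quadratic entropy directly from data through a Gaussian Parzen window: the resulting pairwise-kernel sum (the "information potential") is continuous and differentiable in the samples, which is what makes gradient-based adaptation possible. This is a minimal sketch, assuming a Gaussian kernel and an arbitrary bandwidth; the function names, default width, and NumPy details are illustrative choices, not the chapter's own code.

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Estimate Renyi's quadratic entropy H_2(X) = -log E[p(X)] from samples.

    Uses a Gaussian Parzen window of width sigma; the estimate is -log of the
    information potential V = (1/N^2) * sum_i sum_j G(x_i - x_j; 2*sigma^2),
    which is smooth and differentiable with respect to every sample.
    """
    x = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    n, d = x.shape
    # Pairwise squared distances between all sample pairs.
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian kernel of variance 2*sigma^2 (convolution of two Parzen kernels).
    var = 2.0 * sigma ** 2
    gauss = np.exp(-sq_dist / (2.0 * var)) / ((2.0 * np.pi * var) ** (d / 2.0))
    information_potential = gauss.mean()
    return -np.log(information_potential)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=500)          # hypothetical sample set
    print(renyi_quadratic_entropy(data, sigma=0.5))
```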

Metadata
Title
Renyi’s Entropy, Divergence and Their Nonparametric Estimators
Authors
Dongxin Xu
Deniz Erdogmus
Copyright Year
2010
Publisher
Springer New York
DOI
https://doi.org/10.1007/978-1-4419-1570-2_2
