Skip to main content
Top

2015 | OriginalPaper | Chapter

17. Algorithmic Statistics Revisited

Authors : Nikolay Vereshchagin, Alexander Shen

Published in: Measures of Complexity

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The mission of statistics is to provide adequate statistical hypotheses (models) for observed data. But what is an “adequate” model? To answer this question, one needs to use the notions of algorithmic information theory. It turns out that for every data string x one can naturally define a “stochasticity profile,” a curve that represents a trade-off between the complexity of a model and its adequacy. This curve has four different equivalent definitions in terms of (1) randomness deficiency, (2) minimal description length, (3) position in the lists of simple strings, and (4) Kolmogorov complexity with decompression time bounded by the busy beaver function. We present a survey of the corresponding definitions and results relating them to each other.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
We assume that the reader is familiar with basic notions of algorithmic information theory and use them freely. For a short introduction see Chap. 7; more information can be found in [14].
 
2
There is an alternative definition of \(d(x|A)\). Consider a function t of two arguments x and A, defined when \(x\in A\), and having integer values. We say that t islower semicomputable if there is an algorithm that (given x and A) generates lower bounds for t(xA) that converge to the true value of t(xA) in the limit. We say that t is a probability-bounded test if for every A and every positive integer k the fraction of \(x\in A\) such that \(t(x,A)>k\) is at most 1/k. Now \(d(x|A)\) can be defined as the logarithm of the maximal (up to an O(1)-factor) lower semicomputable probability-bounded test.
 
3
As is usual in algorithmic information theory, we consider the complexities up to \(O(\log n)\) precision if we deal with strings of length at most n. Two subsets \(S,T\subseteq \mathbb {Z}^2\) are the same for us if S is contained in the \(O(\log n)\)-neighborhood of T and vice versa.
 
4
The additional term \(O(\log {\textit{C}}(A))\) should appear on the right hand side, since we need to specify where the description of A ends and the ordinal number of x starts, so the length of the description (\({\textit{C}}(A)\)) should be specified in advance using some self-delimiting encoding. One may assume that \({\textit{C}}(A)\le n\), otherwise the inequality is trivial, so this additional term is \(O(\log n)\).
 
5
Let x and y be independent random strings of length n, so the pair (xy) has complexity close to 2n. Assume that x starts with 0 and y starts with 1. Let A be the set of strings that start with 0, plus the string y. Then A, considered as a model for x, has large optimality deficiency but small randomness deficiency. To decrease the optimality deficiency, we may remove y from A.
 
6
One may ask which computational model is used to measure the computation time, and complain that the notion of time-bounded complexity may depend on the choice of an optimal programming language (decompressor) and its interpreter. Indeed this is the case, but we will use a very rough measure of computation time based on the busy beaver function, and the difference between computational models does not matter. The reader may assume that we fix some optimal programming language, and some interpreter (say, a Turing machine) for this language, and count the steps performed by this interpreter.
 
7
Usually n-th busy beaver number is defined as the maximal running time or the maximal number of non-empty cells that can appear after a Turing machine with at most n states terminates starting on the empty tape. This gives a different number; we modify the definition so it does not depend on the peculiarities of encoding information by transition tables of Turing machines.
 
8
We assume that an algorithm is fixed that, given m, enumerates all strings of complexity at most m in some order.
 
9
In fact, \(\varOmega _k\) contains the same information (up to \(O(\log k)\) conditional complexity in both directions) as the first k bits of Chaitin’s \(\varOmega \)-number (a lower semicomputable random real), so we use the same letter \(\varOmega \) to denote it.
 
10
As usual, we assume that the programming language is optimal, i.e., gives an O(1)-minimal value of the complexity compared to other languages.
 
Literature
1.
go back to reference Antunes, L., Fortnow, L., van Melkebeek, D., Vinodchandran, N.V.: Computational depth: concept and applications. Theor. Comput. Sci. 354(3), 391–404 (2006)CrossRefMATH Antunes, L., Fortnow, L., van Melkebeek, D., Vinodchandran, N.V.: Computational depth: concept and applications. Theor. Comput. Sci. 354(3), 391–404 (2006)CrossRefMATH
2.
go back to reference Bauwens, B.: Computability in statistical hypotheses testing, and characterizations of independence and directed influences in time series using Kolmogorov complexity. Ph.D. thesis, University of Gent (2010) Bauwens, B.: Computability in statistical hypotheses testing, and characterizations of independence and directed influences in time series using Kolmogorov complexity. Ph.D. thesis, University of Gent (2010)
4.
go back to reference Cover, T.M.: Attending [10] (2002). Email to Paul Vitányi Cover, T.M.: Attending [10] (2002). Email to Paul Vitányi
6.
go back to reference Gács, P.: Attending [10] (2002). E-mail to Paul Vitányi Gács, P.: Attending [10] (2002). E-mail to Paul Vitányi
7.
go back to reference Gács, P., Tromp, J., Vitányi, P.M.B.: Algorithmic statistics. IEEE Trans. Inf. Theory 47(6), 2443–2463 (2001) (Corrected in 48(8), 2427 (2002)) Gács, P., Tromp, J., Vitányi, P.M.B.: Algorithmic statistics. IEEE Trans. Inf. Theory 47(6), 2443–2463 (2001) (Corrected in 48(8), 2427 (2002))
8.
go back to reference Hammer, D., Romashchenko, A., Shen, A., Vereshchagin, N.: Inequalities for Shannon entropy and Kolmogorov complexity. J. Comput. Syst. Sci. 60(2), 442–464 (2000)MathSciNetCrossRefMATH Hammer, D., Romashchenko, A., Shen, A., Vereshchagin, N.: Inequalities for Shannon entropy and Kolmogorov complexity. J. Comput. Syst. Sci. 60(2), 442–464 (2000)MathSciNetCrossRefMATH
9.
go back to reference Kolmogorov, A.N.: Complexity of algorithms and an objective definition of randomness. Uspekhi Matematicheskikh Nauk 29(4), 155 (1974) (Abstract of a talk at the meeting of the Moscow Mathematical Society, April 16, 1974, in Russian) Kolmogorov, A.N.: Complexity of algorithms and an objective definition of randomness. Uspekhi Matematicheskikh Nauk 29(4), 155 (1974) (Abstract of a talk at the meeting of the Moscow Mathematical Society, April 16, 1974, in Russian)
10.
go back to reference Kolmogorov, A.N.: Talk at the information theory symposium in Tallinn, Estonia (1974) (According to [6, 4]) Kolmogorov, A.N.: Talk at the information theory symposium in Tallinn, Estonia (1974) (According to [6, 4])
11.
go back to reference Koppel, M.: Structure. In: Herken, R. (ed.) The Universal Turing Machine: A Half-Century Survey, pp. 435–452. Oxford University Press, Oxford (1988) Koppel, M.: Structure. In: Herken, R. (ed.) The Universal Turing Machine: A Half-Century Survey, pp. 435–452. Oxford University Press, Oxford (1988)
12.
go back to reference Levin, L.A.: Emails to Paul Vitányi (2002) Levin, L.A.: Emails to Paul Vitányi (2002)
13.
go back to reference Levin, L.A.: Randomness conservation inequalities; information and independence in mathematical theories. Inf. Control 61(1), 15–37 (1984)CrossRefMATH Levin, L.A.: Randomness conservation inequalities; information and independence in mathematical theories. Inf. Control 61(1), 15–37 (1984)CrossRefMATH
14.
go back to reference Li, M., Vitányi, P.M.B.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn. Springer, Berlin (2008)CrossRefMATH Li, M., Vitányi, P.M.B.: An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn. Springer, Berlin (2008)CrossRefMATH
15.
go back to reference Muchnik, A.A., Mezhirov, I., Shen, A., Vereshchagin, N.: Game interpretation of Kolmorogov complexity. Technical report arxiv:1003.4712 (2010) Muchnik, A.A., Mezhirov, I., Shen, A., Vereshchagin, N.: Game interpretation of Kolmorogov complexity. Technical report arxiv:​1003.​4712 (2010)
16.
go back to reference Muchnik, A.A., Romashchenko, A.E.: Stability of properties of Kolmogorov complexity under relativization. Probl. Inf. Transm. 46(1), 38–61 (2010)MathSciNetCrossRefMATH Muchnik, A.A., Romashchenko, A.E.: Stability of properties of Kolmogorov complexity under relativization. Probl. Inf. Transm. 46(1), 38–61 (2010)MathSciNetCrossRefMATH
17.
18.
go back to reference RAND Corporation: A million random digits with 100,000 normal deviates. Free Press, Glencoe (1955) RAND Corporation: A million random digits with 100,000 normal deviates. Free Press, Glencoe (1955)
19.
20.
go back to reference Romashchenko, A., Shen, A., Vereshchagin, N.: Combinatorial interpretation of Kolmogorov complexity. Theor. Comput. Sci. 271(1–2), 111–123 (2002)MathSciNetCrossRefMATH Romashchenko, A., Shen, A., Vereshchagin, N.: Combinatorial interpretation of Kolmogorov complexity. Theor. Comput. Sci. 271(1–2), 111–123 (2002)MathSciNetCrossRefMATH
21.
go back to reference Shen, A.: The concept of \((\alpha,\beta )\)-stochasticity in the Kolmogorov sense, and its properties. Sov. Math. Dokl. 28(1), 295–299 (1983)MATH Shen, A.: The concept of \((\alpha,\beta )\)-stochasticity in the Kolmogorov sense, and its properties. Sov. Math. Dokl. 28(1), 295–299 (1983)MATH
22.
go back to reference Vereshchagin, N.: On algorithmic strong sufficient statistics. In: Bonizzoni, V.B.P., Löwe, B. (eds.) The Nature of Computation: Logic, Algorithms, Applications, Proceedings of the Ninth Conference on Computability in Europe (CiE 2013) Vereshchagin, N.: On algorithmic strong sufficient statistics. In: Bonizzoni, V.B.P., Löwe, B. (eds.) The Nature of Computation: Logic, Algorithms, Applications, Proceedings of the Ninth Conference on Computability in Europe (CiE 2013)
23.
go back to reference Vereshchagin, N.: Algorithmic minimal sufficient statistic revisited. In: Ambos-Spies, K., Merkle, W. (eds.) Mathematical Theory and Computational Practice, Proceedings of the Fifth Conference on Computability in Europe (CiE 2009). Lecture Notes in Computer Science, vol. 5635, pp. 478–487. Springer, Berlin (2009)CrossRef Vereshchagin, N.: Algorithmic minimal sufficient statistic revisited. In: Ambos-Spies, K., Merkle, W. (eds.) Mathematical Theory and Computational Practice, Proceedings of the Fifth Conference on Computability in Europe (CiE 2009). Lecture Notes in Computer Science, vol. 5635, pp. 478–487. Springer, Berlin (2009)CrossRef
24.
go back to reference Vereshchagin, N., Vitanyi, P.M.B.: Kolmogorov’s structure functions with an application to the foundations of model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004)MathSciNetCrossRefMATH Vereshchagin, N., Vitanyi, P.M.B.: Kolmogorov’s structure functions with an application to the foundations of model selection. IEEE Trans. Inf. Theory 50(12), 3265–3290 (2004)MathSciNetCrossRefMATH
25.
go back to reference Vereshchagin, N.K., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)CrossRef Vereshchagin, N.K., Vitányi, P.M.B.: Rate distortion and denoising of individual data using Kolmogorov complexity. IEEE Trans. Inf. Theory 56(7), 3438–3454 (2010)CrossRef
Metadata
Title
Algorithmic Statistics Revisited
Authors
Nikolay Vereshchagin
Alexander Shen
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-21852-6_17

Premium Partner