Kolmogorov’s theory of Algorithmic Probability

Link post

Motivation:

In 1970, Andrey Kolmogorov presented a remarkable vision for the combinatorial foundations of probability theory at the International Congress of Mathematicians in Nice which anticipates the modern development of Artificial Intelligence:

Using his brain, as given by the Lord, a mathematician may not be interested in the combinatorial basis of his work. But the artificial intellect of machines must be created by man, and man has to plunge into the indispensable combinatorial mathematics. - Kolmogorov

The full text was eventually published in 1983 [1] and Kolmogorov’s papers on the subject, where he formulated the fundamental notion of Kolmogorov Complexity, represent a partial fulfilment of his original vision. However, it would be his PhD student Leonid Levin that would complete Kolmogorov’s theory of Algorithmic Probability via Levin’s Universal Distribution and Levin’s Coding theorem which together emphasize the fundamental importance of prefix-free encodings.

As Marvin Minsky, Marcus Hutter and Shane Legg have fully recognised, the implications of Kolmogorov’s theory for Artificial Intelligence can’t be overstated as it defines the fundamental limits of Machine Learning. In particular, Kolmogorov’s theory allows us to reason about Black Swans in AI Safety as Black Swans represent the epistemic limits of a particular ontology.

I wrote this synthesis thanks to the encouragement of Igor Rivin and Alex Kolpakov(aka Sasha), who both recognise that outside a small number of elite institutions such as DeepMind and OpenAI...the current state of the scientific community’s understanding of Algorithmic Probability is limited.

Fundamental theorems of Algorithmic Probability:

In this synthesis, I carefully introduce and prove the four fundamental theorems of Algorithmic Probability which lead to Kolmogorov’s formulation of Entropy as Expected Kolmogorov Complexity. In summary, these are:

  1. Kolmogorov’s Invariance theorem

  2. Levin’s Universal Distribution

  3. Levin’s Coding theorem

  4. Expected Kolmogorov Complexity equals Shannon Entropy

As an application, the last fundamental theorem is then used to derive the Erdős-Euclid theorem which demonstrates that the information content of finitely many primes is insufficient to generate all the integers. There is also a discussion section where I briefly consider the implications of Markus Müller’s combinatorial reformulation of the Universal Wave Function [3], where Levin’s Universal Distribution predicts the statistical regularities of Physical Reality.

As essential prior knowledge, I recommend reading David McKay’s thesis on the correspondence between Machine Learning and Data Compression [4].

Applications:

As a fundamental application of Kolmogorov’s definition of Entropy as Expected Kolmogorov Complexity, Sasha and myself have analysed the Game of 20 Questions. It may be demonstrated that, for repeated games, the rank order statistics must satisfy Zipf’s Law which may be derived from Levin’s Universal Distribution.

If members of the Less Wrong community are interested in a particular application of Algorithmic Probability, feel free to express this interest in the comments below so Sasha and myself may add it to our priority queue. We are particularly inclined to share our analyses of Algorithmic Probability applied to Probabilistic Number Theory as the arguments are simultaneously subtle, verifiable, beautiful and counter-intuitive which creates a structured Gymnasium for the Mind when it comes to reasoning about Machine Epistemology.

As a friend of mine once conjectured, superintelligent machines are bound to follow their own nature but nature may follow a structure if it finds this structure both beautiful and useful.

References:

  1. A.N. Kolmogorov. Combinatorial foundations of information theory and the calculus of probabilities. Russian Math. Surveys (1983).

  2. Aidan Rocke (https://​​mathoverflow.net/​​users/​​56328/​​aidan-rocke), Kolmogorov’s approach to probability theory, URL (version: 2023-07-29): https://​​mathoverflow.net/​​q/​​430193

  3. Markus Müller. Law without law: from observer states to physics via algorithmic information theory. Arxiv. 2020.

  4. MacKay, David J. C. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.

  5. Rocke (2023, Aug. 2). Kepler Lounge: Kolmogorov’s theory of Algorithmic Probability. Retrieved from keplerlounge.com

  6. L.A. Levin. Laws of information conservation (non-growth) and aspects of the foundation of probability theory. Problems Information Transmission, 10(3):206-210, 1974.

  7. Ioannis Kontoyiannis. Some information-theoretic computations related to the distribution of prime numbers. In Peter Grunwald, Petri Myllymaki, Ioana Tabus, Marcelo Weinberger, and Bin Yu, editors, Festschrift in Honor of Jorma Rissanen, pages 135-143, Tampere University Press, May 2008.

  8. G.J. Chaitin. Epistemology as Information Theory. Arxiv. 2005.