Solomonoff induction without perfect memory
In this post I construct a variant of Solomonoff induction which allows for agents with imperfect memory.
Solomonoff induction is a mathematical formalization of Occam’s razor, and is supposed to be a “master equation” of epistemic rationality. In other words, forming rational expectations about the future is supposed to be tantamount to computing Solomonoff induction (approximately, since it’s incomputable).
Solomonoff induction operates by assigning probabilities Pr(s∗∣s) to continuations s∗ of a given finite sequence s. In the simplest formalism, the sequence elements are bits: s represents past observations and s∗ represents a possible future.
A rational agent A is supposed to use Pr(s∗∣s) to evaluate the probability of s∗. There are two problems with this. One is that Pr(s∗∣s) is uncomputable, whereas A has only a limited amount of computing resources at its disposal. For now, we are going to ignore this. Another problem is that A doesn’t have direct access to s: A can only estimate s from A’s memory. Moreover, this estimation depends on knowledge A has about reality, which is supposed to be generated by Solomonoff induction itself. In other words, inferring the future from the past only makes sense in the approximation of perfect memory; in general, we need to be able to infer both the past and the future from the present.
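To make the mixture mechanics concrete: real Solomonoff induction averages over all programs, weighting each by two to the minus its length, which is uncomputable. The sketch below replaces that with three hand-picked computable hypotheses and invented weights (both are illustrative assumptions, not part of the construction), and conditions the mixture on the observed prefix s.

```python
from fractions import Fraction

def repeat_last(h):
    """Pr(next bit = 1 | history) for a 'repeat the last bit' hypothesis."""
    if not h:
        return Fraction(1, 2)
    return Fraction(1) if h[-1] == 1 else Fraction(0)

# Hand-picked stand-ins for Solomonoff hypotheses; the weights play the
# role of 2^-K(R) but are invented for this toy example.
HYPOTHESES = {
    "fair-coin":   (Fraction(1, 2), lambda h: Fraction(1, 2)),
    "mostly-ones": (Fraction(1, 4), lambda h: Fraction(9, 10)),
    "repeat-last": (Fraction(1, 4), repeat_last),
}

def seq_prob(bits, predict):
    """Probability a single hypothesis assigns to a whole bit sequence."""
    p = Fraction(1)
    for i, b in enumerate(bits):
        q = predict(tuple(bits[:i]))      # Pr(next bit = 1 | history)
        p *= q if b == 1 else 1 - q
    return p

def conditional(s, s_star):
    """Mixture estimate of Pr(s* | s): ratio of weighted sequence probabilities."""
    num = sum(w * seq_prob(list(s) + list(s_star), f) for w, f in HYPOTHESES.values())
    den = sum(w * seq_prob(list(s), f) for w, f in HYPOTHESES.values())
    return num / den
```

After observing 1111, the mixture assigns roughly 0.9 to the next bit being 1, because the hypotheses that keep predicting ones have gained relative weight.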
The bit-sequence formalism is not well-adapted to this, since the present state of A has to contain all of A’s memory about the past, so it has to be more than a single bit. I suggest solving this by considering sequences υ of natural numbers instead of bit-sequences s. Generalizing regular Solomonoff induction to natural numbers is straightforward: the random program R computes convergent sequences of lower bounds for the transition probabilities
$$\Pr_R\left(\upsilon_i = n \,\middle|\, \{\upsilon_j\}_{j<i}\right)$$
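“Convergent sequences of lower bounds” means the transition probabilities are only semicomputable: R can emit ever-better bounds from below without ever halting with the exact value. A toy illustration, using the alternating series for e⁻¹ as a stand-in for such a probability:

```python
from math import factorial, exp

def lower_bounds(steps):
    """Yield an increasing sequence of lower bounds converging to e^-1.

    Stands in for how a hypothesis R can only *semicompute* its transition
    probabilities: each extra computation step tightens the bound from
    below, but no step produces the exact value.
    """
    s = 0.0
    for k in range(steps):
        s += (-1) ** k / factorial(k)     # partial sums of sum_k (-1)^k / k!
        if k % 2 == 1:                    # the odd partial sums...
            yield s                       # ...approach e^-1 from below

bounds = list(lower_bounds(10))
```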
Now we want a new induction procedure whose input is a single natural number n, representing the state of A’s consciousness, and which assigns probabilities to sequences of natural numbers containing n. We achieve this by assigning the following probabilities to Solomonoff hypotheses R:
$$\Pr(R \mid n) = Z^{-1}\,\operatorname{E}\!\left(\ln(t(n)+2)^{-1} \,\middle|\, R\right)$$

Here, $Z^{-1}$ is a normalization factor, $\operatorname{E}$ stands for expectation value, and $t(n) \in \mathbb{N} \cup \{\infty\}$ is defined by

$$t(n) = \min\{\, i \mid \upsilon_i = n \,\}$$
where {υi} is the infinite sequence of natural numbers “generated” by R.
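Here is a finite toy version of this rule. All names and complexity values are invented for illustration, and the hypotheses are deterministic, so the expectation over R’s randomness is trivial: each hypothesis is simply scored by 2⁻ᵏ/ln(t(n)+2), where k stands in for the description length of R.

```python
import math

def first_hit(seq, n, horizon=2 ** 21):
    """t(n): the first index i with seq(i) == n, or None within the horizon."""
    for i in range(horizon):
        if seq(i) == n:
            return i
    return None

def posterior(hypotheses, n):
    """hypotheses: name -> (k, seq), with k a made-up description length.

    Scores each hypothesis by 2^-k / ln(t(n) + 2) and normalizes, mirroring
    Pr(R | n) = Z^-1 E(ln(t(n)+2)^-1 | R) for deterministic hypotheses.
    """
    scores = {}
    for name, (k, seq) in hypotheses.items():
        t = first_hit(seq, n)
        scores[name] = 0.0 if t is None else 2.0 ** -k / math.log(t + 2)
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

# The agent's present state: a large number, as if t(n) steps of memories
# were recorded at a constant bit rate (n exponential in t(n)).
toy = {
    "doubling":  (5, lambda i: 2 ** i),   # reaches n = 2^20 already at t = 20
    "boltzmann": (3, lambda i: i),        # reaches n only at t = n
}
p = posterior(toy, 2 ** 20)
```

With n = 2²⁰, the “doubling” hypothesis wins despite its higher assumed complexity, because the Boltzmann-brain-style hypothesis only reaches n at time t = n and pays the full age penalty.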
The factor $\ln(t(n)+2)^{-1}$ penalizes hypotheses in which n appears late in {υi}. Its form was chosen to achieve equal suppression of two spurious hypotheses we want to avoid:
- The “timeless hypothesis” TL, which computes the “true” {υi} silently (i.e. without outputting it), extracts n by evaluating $\upsilon_{t(n)}$, and generates the constant sequence ∀i: υi = n. It gains a factor of about $\ln t(n)$ over the “correct” hypothesis by reducing the age to 0, but loses a factor of about $t(n)$ due to the increased complexity of specifying t(n) explicitly. Its overall suppression is thus about $t(n)/\ln t(n)$.
- The “Boltzmann brain” hypothesis BB, which generates the sequence ∀i: υi = i. It gains due to its reduced complexity but is suppressed by $\ln n / \ln t(n)$ due to its increased age. Since an agent is supposed to record memories, we expect n to be exponential in t(n). We thus get the same order-of-magnitude suppression as before.

In this case, equal suppression implies (roughly) maximal suppression, since there is a trade-off between suppressing TL and suppressing BB.
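A quick numeric check of the claimed balance, under the illustrative assumption n = 2^t (memories recorded at a constant bit rate): the two suppression factors agree up to the constant ln 2.

```python
import math

# n = 2^t models an agent whose memory grows by one bit per time step.
rows = []
for t in (10, 100, 1000):
    tl = t / math.log(t)                   # timeless hypothesis: ~ t / ln t
    bb = math.log(2.0 ** t) / math.log(t)  # Boltzmann brain: ~ ln n / ln t
    rows.append((t, tl, bb))
    print(f"t = {t:5d}   TL suppression ~ {tl:9.2f}   BB suppression ~ {bb:9.2f}")
```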
I think it should be possible to define a class of cellular automata CA and initial conditions I such that the state of the automaton evolves indefinitely rather than e.g. becoming periodic, and for which my induction correctly infers the rules of CA from the state of the automaton at a sufficiently late time. Note that for many cellular automata, the state, encoded as a natural number, is exponential in the time, since changes propagate at a fixed velocity.
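A minimal sketch of such a test case, using elementary rule 30 as an illustrative choice (the proposal does not fix a particular automaton): starting from a single live cell, the state encoded as a natural number grows by a constant factor per step, which is exactly the “n exponential in t(n)” property needed above.

```python
RULE = 30  # an arbitrary elementary cellular automaton rule, chosen for illustration

def step(cells):
    """One synchronous update; padding lets the pattern spread at speed 1."""
    cells = [0, 0] + cells + [0, 0]
    return [(RULE >> (cells[i - 1] * 4 + cells[i] * 2 + cells[i + 1])) & 1
            for i in range(1, len(cells) - 1)]

def encode(cells):
    """Pack a state into a single natural number n (the 'present' the agent sees)."""
    return int("1" + "".join(map(str, cells)), 2)   # leading 1 preserves the width

state, codes = [1], []
for _ in range(12):
    codes.append(encode(state))
    state = step(state)
```

Because the width grows by two cells per step, each encoded state is more than double the previous one, so the state as a number is indeed exponential in the time.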
Another application is to the anthropic trilemma. This formalism suggests that continuity of experience is fundamentally meaningful and that subjective probabilities behave just like objective probabilities. We thus eliminate the second and third horns. However, explicitly applying this formalism to solve the trilemma remains challenging. In a future post I am going to argue that the first horn is actually correct, but without giving a quantitative proof using this formalism.
A somewhat speculative application is applying the formalism repeatedly without recording the past. Given consciousness state n, we are able to compute the probability distribution of the next consciousness state m. If we wish to consistently continue one step further, we should use both n and m; however, A can only use m to estimate probabilities. This way we are led to a Markovian stochastic process. The physical meaning of this construction is not clear, but in some ways this Markovian process is more interesting than the non-Markovian process you get with regular Solomonoff induction or with recording the past in the current formalism.

In regular Solomonoff induction, as time progresses A gains information but the physics is fixed. So to speak, the map improves but the territory stays the same. The Markovian version allows for actual physics to change, but A cannot notice it! The process follows a pattern, but the pattern keeps randomly evolving. Maybe this construction is more than just a mathematical artifact?
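A sketch of this Markov process over toy hypotheses (all names and complexity values invented for illustration): at each step the agent scores hypotheses using only its current state n, samples one in proportion to 2⁻ᵏ/ln(t(n)+2), moves to that hypothesis’s successor of n, and forgets everything else.

```python
import math
import random

# Two illustrative deterministic hypotheses with made-up description lengths k.
HYPS = [
    (3, lambda i: i),        # Boltzmann-brain-like: 0, 1, 2, ...
    (5, lambda i: 2 * i),    # even numbers only: 0, 2, 4, ...
]

def next_state(n, horizon=10_000, rng=random):
    """Score hypotheses by 2^-k / ln(t(n)+2), sample one, emit its successor of n."""
    weights, successors = [], []
    for k, seq in HYPS:
        t = next((i for i in range(horizon) if seq(i) == n), None)
        if t is not None:                      # hypothesis must contain n at all
            weights.append(2.0 ** -k / math.log(t + 2))
            successors.append(seq(t + 1))
    return rng.choices(successors, weights=weights)[0]

chain = [6]
for _ in range(5):
    chain.append(next_state(chain[-1]))   # the past is forgotten at every step
```

Each transition re-infers the “physics” from the present state alone, so which pattern the chain follows can drift from step to step, which is exactly the property discussed above.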