Solomonoff induction without perfect memory
In this post I construct a variant of Solomonoff induction which allows for agents with imperfect memory.
Solomonoff induction is a mathematical formalization of Occam’s razor, and is supposed to be a “master equation” of epistemic rationality. In other words, forming rational expectations about the future is supposed to be tantamount to computing Solomonoff induction (approximately, since it is uncomputable).
Solomonoff induction operates by assigning probabilities to continuations $y$ of a given finite sequence $x$. In the simplest formalism, the sequence elements are bits: $x$ represents past observations and $y$ represents a possible future.
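Concretely, one standard way to write this uses the universal prior: for a prefix universal Turing machine $U$,

$$M(x) = \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-|p|}, \qquad P(y \mid x) = \frac{M(xy)}{M(x)},$$

where the sum ranges over programs $p$ whose output begins with $x$ and $|p|$ is the length of $p$ in bits.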
A rational agent $A$ is supposed to use $P(y \mid x)$ to evaluate the probability of $y$. There are two problems with this. One is that Solomonoff induction is uncomputable, whereas $A$ has a limited amount of computing resources at its disposal. For now, we are going to ignore this. Another problem is that $A$ doesn’t have direct access to $x$: $A$ can only estimate $x$ from its memory. Moreover, this estimation depends on the knowledge $A$ has about reality, which is supposed to be generated by Solomonoff induction itself. In other words, inferring the future from the past only makes sense in the approximation of perfect memory; in general we need to be able to infer both the past and the future from the present.
The bit-sequence formalism is not well-adapted to this, since the present state of $A$ has to contain all of $A$’s memory about the past, so it has to be more than a single bit. I suggest solving this by considering sequences of natural numbers $(q_0, q_1, q_2, \ldots)$ instead of bit-sequences. Generalizing regular Solomonoff induction to natural numbers is straightforward: the random program computes convergent sequences of lower bounds for the transition probabilities $\Pr(q_{n+1} = k \mid q_0, q_1, \ldots, q_n)$.
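As a toy illustration of what such a hypothesis looks like computationally (the interface below is purely for concreteness and is not part of the formalism):

```python
# Toy sketch: a hypothesis over natural-number sequences reports convergent
# lower bounds on the transition probability P(next = k | prefix), never
# returning an overestimate.
from typing import Iterator, Sequence


class FairCoinHypothesis:
    """Claims the next element is 0 or 1 with probability 1/2 each."""

    def lower_bounds(self, prefix: Sequence[int], k: int) -> Iterator[float]:
        target = 0.5 if k in (0, 1) else 0.0
        bound = 0.0
        while True:
            yield bound                    # every yielded value is a valid lower bound
            bound = (bound + target) / 2   # and the bounds converge up to the target


# First few lower bounds for P(next = 1 | 0, 1, 1): 0.0, 0.25, 0.375, ...
h = FairCoinHypothesis()
bounds = h.lower_bounds((0, 1, 1), 1)
print([round(next(bounds), 3) for _ in range(3)])
```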
Now we want a new induction procedure whose input is a single natural number $q$, representing the state of $A$’s consciousness, which allows assigning probabilities to sequences of natural numbers containing $q$. We achieve this by assigning the following probabilities to Solomonoff hypotheses $H$:

$$P(H \mid q) = \frac{1}{Z} \, 2^{-|H|} \, \mathrm{E}\!\left[\frac{1}{\log_2\!\left(\mathrm{age}_H(q) + 2\right)}\right]$$

Here, $Z$ is a normalization factor, $|H|$ is the length of the program $H$ in bits, $\mathrm{E}$ stands for expectation value (with respect to the randomness of the sequence generated by $H$) and $\mathrm{age}_H(q)$ is defined by

$$\mathrm{age}_H(q) = \min \{\, n \mid s_n(H) = q \,\}$$

where $s(H) = (s_0(H), s_1(H), s_2(H), \ldots)$ is the infinite sequence of natural numbers “generated” by $H$.
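To make this concrete, here is a minimal sketch for the special case of a deterministic hypothesis, where the expectation is trivial; representing a hypothesis as a pair (program length in bits, generator of $s(H)$) is a simplification for illustration only:

```python
# Minimal sketch of the weight assignment above for a deterministic hypothesis.
import math
from itertools import count, islice
from typing import Iterator


def age(s: Iterator[int], q: int, cutoff: int = 10**6) -> float:
    """Position at which q first appears in s(H); infinity if not seen before cutoff."""
    for n, value in enumerate(islice(s, cutoff)):
        if value == q:
            return n
    return math.inf


def unnormalized_weight(program_length_bits: int, s: Iterator[int], q: int) -> float:
    """2^(-|H|) times the age penalty 1/log2(age + 2)."""
    a = age(s, q)
    penalty = 0.0 if math.isinf(a) else 1.0 / math.log2(a + 2)
    return 2.0 ** (-program_length_bits) * penalty


# Example: a "counting" hypothesis 0, 1, 2, ... encoded (say) in 20 bits, queried at q = 1000.
print(unnormalized_weight(20, count(0), 1000))
```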
The age factor penalizes hypotheses in which $q$ appears late in $s(H)$. Its form was chosen to achieve equal suppression of two spurious hypotheses we want to avoid:
The first is the “timeless hypothesis”, which computes the “true” sequence silently (i.e. without outputting it), extracts $q = s_n$ by evaluating it at an explicitly specified time $n$, and generates the constant sequence $(q, q, q, \ldots)$. It gains a factor of about $\log_2 n$ on the “correct” hypothesis by reducing age to 0, but loses a factor of about $n$ due to the increased complexity of specifying $n$ explicitly (roughly $\log_2 n$ extra bits). Its overall suppression is thus about $n / \log_2 n$.
The second is the “Boltzmann brain” hypothesis, which generates the sequence by outputting random natural numbers, so that $q$ first appears only by chance, after a time of order $2^{|q|}$. It gains a factor of about $2^{|H_0|}$ (where $H_0$ is the “correct” hypothesis) due to its reduced complexity, but is suppressed by a factor of about $|q| / \log_2 n$ due to increased age. Since an agent is supposed to record memories, we expect $q$ to be exponential in $n$, i.e. $|q|$ is of order $n$. We thus get the same order-of-magnitude suppression as before, about $n / \log_2 n$.
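As a rough sanity check of the orders of magnitude, take $n = 10^6$. The timeless hypothesis gains a factor of $\log_2 n \approx 20$ from the age penalty but pays roughly $20$ extra bits to specify $n$, i.e. a factor of about $10^6$, for a net suppression of roughly $5 \times 10^4$. The Boltzmann brain hypothesis saves the $|H_0|$ bits of the correct hypothesis, a fixed factor independent of $n$, but its age is of order $2^n$, so the age penalty costs it a factor of about $n / \log_2 n \approx 5 \times 10^4$ as well.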
I think it should be possible to define a class of cellular automata and initial conditions such that the state of the automaton evolves indefinitely rather than e.g. becoming periodic, and for which my induction correctly infers the rules of the automaton from its state at a sufficiently late time. Note that for many cellular automata the state, read as a natural number, is exponential in the time, since changes propagate at a fixed velocity and the region that must be described therefore grows only linearly.
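For intuition, here is a toy illustration (the choice of rule 110 and of encoding the cells as a binary numeral is mine): starting from a single live cell, the light cone widens by one cell on each side per step, so the encoded state grows exponentially with time.

```python
# Toy illustration: rule 110 started from a single live cell. Reading the cells
# inside the light cone as a binary numeral, the state grows exponentially with
# time because the light cone widens by one cell on each side per step.
def step(cells: list[int], rule: int = 110) -> list[int]:
    """One synchronous update, padded so the widening light cone always fits."""
    padded = [0, 0] + cells + [0, 0]
    return [
        (rule >> (padded[i - 1] * 4 + padded[i] * 2 + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    ]


def encode(cells: list[int]) -> int:
    """Read the cell array as a natural number (binary numeral)."""
    return int("".join(map(str, cells)), 2)


cells = [1]
for t in range(1, 11):
    cells = step(cells)
    print(t, encode(cells).bit_length())  # bit length grows roughly linearly in t
```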
Another application is to the anthropic trilemma. This formalism suggests that continuity of experience is fundamentally meaningful and that subjective probabilities behave just like objective probabilities. We thus eliminate the second and third horns. However, explicitly applying this formalism to solve the trilemma remains challenging. In a future post I am going to argue that the first horn is actually correct, but without giving a quantitative proof using this formalism.
A somewhat speculative application is applying the formalism repeatedly without recording the past. Given consciousness state $q_n$ we are able to compute the probability distribution of the next consciousness state $q_{n+1}$. If we wished to continue consistently one step further, we would have to use both $q_n$ and $q_{n+1}$; however, an agent that does not record the past can only use $q_{n+1}$ to estimate probabilities. In this way we are led to a Markovian stochastic process. The physical meaning of this construction is not clear, but in some ways this Markovian process is more interesting than the non-Markovian process you get with regular Solomonoff induction or with recording the past in the current formalism. In regular Solomonoff induction, as time progresses the agent gains information but the physics is fixed: so to speak, the map improves but the territory stays the same. The Markovian version allows the actual physics to change, but the agent cannot notice it! The process follows a pattern but the pattern keeps randomly evolving. Maybe this construction is more than just a mathematical artifact?
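Here is a minimal sketch of the resulting process, with a toy stand-in for the induced transition distribution (which in the actual formalism is uncomputable):

```python
# Sketch of the "forgetful" process: at every step the next consciousness state
# is sampled from the induction applied to the current state alone, so the past
# is discarded and the result is a Markov chain. `induced_distribution` is a
# stand-in for the distribution over successor states defined by the formalism.
import random
from typing import Callable, Dict, List


def forgetful_chain(
    q0: int,
    induced_distribution: Callable[[int], Dict[int, float]],
    steps: int,
    seed: int = 0,
) -> List[int]:
    """Iterate q_{n+1} ~ P(. | q_n), using only the current state at every step."""
    rng = random.Random(seed)
    trajectory = [q0]
    for _ in range(steps):
        dist = induced_distribution(trajectory[-1])
        states = list(dist.keys())
        weights = list(dist.values())
        trajectory.append(rng.choices(states, weights=weights, k=1)[0])
    return trajectory


# Toy stand-in for the induced distribution: a lazy random walk on the naturals.
toy = lambda q: {max(q - 1, 0): 0.5, q + 1: 0.5}
print(forgetful_chain(5, toy, 10))
```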
You can remember past observations as a shortest program that matches them, potentially compressing them fairly dramatically. If that still doesn’t fit, one could probably also perform some sort of search for an optimal lossy encoding scheme as well. This would break your assumption that the length of the remembered state $q$ is independent of the hypotheses.