Thanks for commenting! Indeed, de Blanc’s paper explores questions that I intend to solve in this series.
I don’t think your specific formula works. It looks like it either doesn’t handle the multi-level nature of explanations of reality (with utility functions generally defined at the higher levels and physics at the lowest level)...
This decomposition is something I plan to address in detail in the follow-up post, where physics comes into the picture. However, I hope to convince you that there is no problem with the formula.
...Solomonoff induction finding the true multi-level explanation from which we can just pick out the information at the level we want. But this doesn’t work because (a) Solomonoff induction will probably just find models of physics, not multi-level explanations, (b) even if it did (since we used something like the speed prior), we don’t have reason to believe that they’ll be the same multi-level explanations that humans use, and (c) if we did something like only caring about models that happen to contain game of life states in exactly the way we want (which is nontrivial, given that some random noise could plausibly be viewed as a game of life history), we’d essentially be conditioning on a very weird event (that high-level information is directly part of physics and that the game of life model you’re using is exactly correct with no exceptions, including cosmic rays), which I think might cause problems.
It seems that you are thinking about the binary sequence x as a sequence of observations made by an observer, which is the classical setup of Solomonoff induction. However, in my approach there is no fixed a priori relation between the sequence and any specific description of the physical universe.
In UDT, we consider conditional expectation values with respect to logical uncertainty of the form
val(a; U) = E(E(U) | The agent implements input-output mapping a)
where the inner E(U) is the Solomonoff expectation value from equation [1] and the outer E refers to logical uncertainty. Therefore, out of all programs contributing to the Solomonoff ensemble, the ones that contribute to the a-dependence of val are the ones that produce the universe encoded in a form compatible with f.
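To make the shape of this computation concrete, here is a toy sketch that collapses the two expectations into a single weighted sum over a finite set of programs, so it glosses over the logical-uncertainty part entirely; enumerate_programs, implements_mapping and utility are hypothetical placeholders, not anything defined in the post.

```python
# Toy sketch of val(a; U), under the simplifying assumption that the outer
# logical-uncertainty expectation can be approximated by discarding ensemble
# programs that are inconsistent with the agent implementing mapping `a`.
# All helper functions are hypothetical placeholders.

def val(a, enumerate_programs, implements_mapping, utility, max_len=20):
    weighted_sum = 0.0
    total_weight = 0.0
    for program in enumerate_programs(max_len):    # finite stand-in for the Solomonoff ensemble
        if not implements_mapping(program, a):     # crude stand-in for the condition "G() = a"
            continue
        weight = 2.0 ** (-len(program))            # shorter programs get exponentially more weight
        weighted_sum += weight * utility(program)  # this program's contribution to E(U)
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else 0.0
```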
It looks like it subtracts the total number of cells, so it prefers for there to be fewer total cells satisfying the game of life rules?
No, it subtracts something proportional to the number of cells that don’t satisfy the rules.
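For concreteness, here is one way such a penalty term could be computed on a finite toy patch of a Game of Life history. The setup (a list of boolean grids with a toroidal boundary) and the coefficient lam at the end are my own illustrative assumptions, not the exact construction in the post.

```python
def life_rule(alive, live_neighbors):
    """Conway's Game of Life rule for a single cell."""
    return live_neighbors == 3 or (alive and live_neighbors == 2)

def count_rule_violations(history):
    """history: list of 2D boolean grids of the same shape
    (a finite toy patch with a toroidal boundary, purely for illustration).
    Counts the (cell, time) pairs whose actual next state disagrees with
    the state the Life rules dictate."""
    violations = 0
    for t in range(len(history) - 1):
        grid, nxt = history[t], history[t + 1]
        rows, cols = len(grid), len(grid[0])
        for i in range(rows):
            for j in range(cols):
                live_neighbors = sum(
                    grid[(i + di) % rows][(j + dj) % cols]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0)
                )
                if nxt[i][j] != life_rule(grid[i][j], live_neighbors):
                    violations += 1
    return violations

# The utility could then subtract something proportional to this count, e.g.
#     U = number_of_gliders(history) - lam * count_rule_violations(history)
# where number_of_gliders and lam are purely illustrative stand-ins.
```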
I take it this is because we’re using a Solomonoff prior over universe histories?
Of course.
I find this statement plausible but 2^K is a pretty large factor.
It is of similar magnitude to the differences between using different universal Turing machines in the definition of the Solomonoff ensemble. These differences become negligible for agents that work with large amounts of evidence.
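For reference, this is just the usual invariance theorem: for any two universal prefix machines U1 and U2 there is a constant c (roughly the length of a cross-compiler) such that

```latex
K_{U_1}(x) \;\le\; K_{U_2}(x) + c
\qquad\text{and}\qquad
M_{U_1}(x) \;\ge\; 2^{-c}\, M_{U_2}(x) \quad\text{for all } x.
```

So switching the universal machine distorts the Solomonoff weights by at most a bounded multiplicative factor, which is the same kind of distortion as the 2^K factor here.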
Also, if we define f to be a completely unreasonable function (e.g. it arranges the universe so that no gliders are detected, or it chooses whether to simulate a whole lot of gliders based on some silly property of the universe), then it seems like you have proven that your utility function can never be more than a factor of 2^K away from what you’d get with f.
f is required to be bijective, so it cannot lose or create information. Therefore, regardless of f, some programs in the Solomonoff ensemble will produce gliders and others won’t.
It is of similar magnitude to the differences between using different universal Turing machines in the definition of the Solomonoff ensemble. These differences become negligible for agents that work with large amounts of evidence.
Hmm, I’m not sure that this is something that you can easily get evidence for or against? The 2^K factor in ordinary Solomonoff induction is usually considered fine because it can only cause you to make at most K errors. But here it applies to utilities, which you can’t get evidence for or against the same way you can for probabilities.
f is required to be bijective, so it cannot lose or create information. Therefore, regardless of f, some programs in the Solomonoff ensemble will produce gliders and others won’t.
Okay, I see how this is true. But we could design f so that it only creates gliders if the universe satisfies some silly property. It seems like this would lead us to only care about universes satisfying this silly property, so the silly property would end up being our utility function.
Thanks for the additional explanation. Let me explain by way of example.

Consider three possible encodings: f maps “0” bits in our sequence to empty cells and “1” bits to full cells, while traversing the cells in V in some predefined order. g is the same with 0 and 1 reversed. h computes the XOR of all bits up to a given point and uses that to define the state of the cell.
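Here is a minimal sketch of these three decodings, assuming the cells of V are visited in a fixed order and restricting to a finite prefix of the sequence (the function names are purely illustrative):

```python
def decode_f(bits):
    """f: bit 0 -> empty cell, bit 1 -> full cell, in the fixed traversal order."""
    return [b == 1 for b in bits]

def decode_g(bits):
    """g: same as f with 0 and 1 reversed."""
    return [b == 0 for b in bits]

def decode_h(bits):
    """h: the state of the k-th cell is the XOR of all bits up to position k."""
    cells, acc = [], 0
    for b in bits:
        acc ^= b
        cells.append(acc == 1)
    return cells

# The same bit sequence maps to three different cell patterns:
bits = [1, 0, 1, 1, 0]
print(decode_f(bits))  # [True, False, True, True, False]
print(decode_g(bits))  # [False, True, False, False, True]
print(decode_h(bits))  # [True, True, False, True, True]
```

All three are bijective: h, for instance, can be inverted by XOR-ing consecutive cell states.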
So far we haven’t discussed what the agent G itself is like. Suppose G is literally a construct within the Game of Life universe. Now, our G is a UDT agent, so it decides its action by computing logical-uncertainty conditional expectation values roughly of the form val(a) = E(E(U) | G() = a), where “G()” stands for evaluating the program implemented by G. How does the condition “G() = a” influence E(U)? Since G actually exists within some GoL partial histories, different actions of G lead to different continuations of these histories. Depending on the choice of f, g, or h, these histories map to very different binary sequences. However, in all cases the effect of G’s action on the number of gliders in its universe will be very similar.
It still seems like this is very much affected by the measure you assign to different game of life universes, and that the measure strongly depends on f.
Suppose we want to set f to control the agent’s behavior, so that when it sees sensory data s, it takes silly action a(s), where a is a short function. To work this way, f will map game of life states in which the agent has seen s and should take action a(s) to binary strings that have greater measure, compared to game of life states in which the agent has seen s and should take some other action. I think this is almost always possible due to the agent’s partial information about the world: there is nearly always an infinite number of world states in which a(s) is a good idea, regardless of s. f has a compact description (not much longer than a), and it forces the agent’s behavior to be equal to a(s) (except in some unrealistic cases where the agent has very good information about the world).
What you’re saying can be rephrased as follows. The prior probability measure on the space of (possibly rule-violating) Game of Life histories depends on f, since it is the f-image of the Solomonoff measure. You are right about that. However, the dependence is only as strong as the dependence of the Solomonoff measure on the choice of a universal Turing machine.
In other words, the complexity of the f you need to make G take a silly action is about the same as the complexity of the universal Turing machine you need to make G take the same action.
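To spell out the claim about the f-image measure in symbols (the notation here is mine, introduced just for illustration): if M is the Solomonoff measure on binary sequences and f is the bijective encoding, the induced prior on histories is the pushforward

```latex
\mu_f(A) \;=\; M\!\left(f^{-1}(A)\right)
\qquad \text{for a set } A \text{ of Game of Life histories,}
```

and replacing f by f' = f∘σ, where σ is a computable bijection of the sequence space with complexity K(σ), changes this prior by at most a multiplicative factor of roughly 2^K(σ), which is the same kind of bounded distortion as switching the universal Turing machine behind M.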