Umm… what about my argument that a human can represent their predictions symbolically like “P(next bit is 1)=i-th bit of BB(100)” instead of using numerals, and thereby do better than a Solomonoff predictor because the Solomonoff predictor can’t incorporate this?
A trivial but noteworthy fact is that every finite sequence of Σ values, such as Σ(0), Σ(1), Σ(2), …, Σ(n) for any given n, is computable, even though the infinite sequence Σ is not computable (see computable function examples).
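(A minimal illustration of this, in Python: the known values of Σ for one through four states are 1, 4, 6 and 13, and any such finite prefix can be produced by a trivial lookup-table program, which is why it is computable even though Σ as a whole is not.)

```python
# Sketch: any finite prefix of an uncomputable sequence is itself computable,
# because a finite lookup table is a (trivial) program.
# The values below are the known Sigma values for 1-4 state, 2-symbol machines.
SIGMA_PREFIX = {1: 1, 2: 4, 3: 6, 4: 13}

def sigma_up_to_4(n: int) -> int:
    """Computable function agreeing with the Busy Beaver Sigma on n = 1..4."""
    return SIGMA_PREFIX[n]

print([sigma_up_to_4(n) for n in range(1, 5)])  # [1, 4, 6, 13]
```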
Sorry, I should have used BB(2^100) as the example. The universal prior assigns the number BB(2^100) a very small weight, because the only way to represent it computably is by giving a 2^100-state Turing machine. A human would assign it a much larger weight, referencing it by its short symbolic representation.
Until I write up a better argument, you might want to (assuming you haven’t already) read this post where I gave a decision problem that a human does (infinitely) better than AIXI.
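(To make the size gap being claimed in the comment above concrete: a rough back-of-the-envelope comparison, assuming, as claimed there, that the shortest computable description of BB(2^100) is essentially an explicit 2^100-state machine. The encoding formula and constants below are rough placeholders; only the scale matters.)

```python
from math import log2

# Rough, illustrative bit counts only (the exact encoding scheme doesn't matter).
n_states = 2 ** 100

# Writing down an n-state, 2-symbol Turing machine table: each state has 2 entries,
# each entry names a next state (log2(n) bits), a symbol (1 bit) and a direction (1 bit).
bits_for_machine = 2 * n_states * (log2(n_states) + 2)

# Writing down the symbolic expression "BB(2^100)": a few dozen bits.
bits_for_symbolic_reference = 8 * len("BB(2^100)")

print(f"~2^{log2(bits_for_machine):.0f} bits vs ~{bits_for_symbolic_reference} bits")
```

Under the universal prior a description of length L contributes weight on the order of 2^-L, so the machine-table representation gets negligible weight, while the symbolic reference a human can reason about stays a few dozen bits long.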
I don’t think I understood that fully, but there seems to be a problem with your theory. The human gets to start in the epistemically advantaged position of knowing that the game is based on a sequence of busy beavers, and knowing that they are a very fast-growing function. AIXI is prevented from knowing this information and has to start as if from a blank canvas. The reason we use an Occamian prior for AIXI is that we refuse to tailor it to a specific environment. If your logic is sound, then yes, it does do worse when it is dropped into an environment where it is paired with a human with an epistemic advantage, but it would beat the human across the space of possible worlds.
Another problem you seem to have is that you assume the only hypothesis in the entire set that gives useful predictions is the hypothesis which is, in fact, correct. There are plenty of other functions which correctly predict arbitrarily large numbers of 1s, with much less complexity, so the overall probability weighting that AIXI is using can be a usefully correct model of its universe, if not a fully correct one.
How a human might come to believe, without being epistemically privileged, that a sequence is probably a sequence of busy beavers, is a deep problem, similar to the problem of distinguishing halting oracles from impostors. (At least one mathematical logician who has thought deeply about the latter problem thinks that it’s doable.)
But in any case, the usual justification for AIXI (or adopting the universal prior) is that (asymptotically) it does as well as or better than any computable agent, even one that is epistemically privileged, as long as the environment is computable. Eliezer and others were claiming that it does as well as or better than any computable agent, even if the environment is not computable, and this is what my counter-example disproves.
So you think that we need to rethink our theory of what perfect optimization is, in order to take into account the possibility we live in an uncomputable universe? Even if you are correct in your example, there is no reason to suppose that your human does better in the space of possible uncomputable universes than AIXI, as opposed to better in that one possible (impossible) universe.
So you think that we need to rethink our theory of what perfect optimization is, in order to take into account the possibility we live in an uncomputable universe?
Yes.
Even if you are correct in your example, there is no reason to suppose that your human does better in the space of possible uncomputable universes than AIXI, as opposed to better in that one possible (impossible) universe.
This seems pretty easy, given the same level of raw computing power available to AIXI (otherwise the human gets screwed in the majority of cases simply because he doesn’t have enough computing power).
For example, I can simply modify AIXI with a rule that says “if you’ve seen a sequence of increasingly large numbers that can’t be explained by any short computable rule, put some weight on it being BB(1)...BB(2^n)…” (and also modify it to reason symbolically about expected utilities instead of comparing numbers), and that will surely be an improvement over all possible uncomputable universes. (ETA: Strike that “surely”. I have to think this over more carefully.)
How to make an optimal decision algorithm (as opposed to just improving upon AIXI) is still an open problem.
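(A minimal sketch of the kind of modification described in the “For example, I can simply modify AIXI…” paragraph above, under strong simplifying assumptions: the hypotheses are a handful of deterministic bit sequences with hand-picked weights rather than a genuine universal mixture, and since BB itself is uncomputable, an extremely fast-growing computable stand-in is used purely for illustration. All names and numbers are hypothetical.)

```python
# Toy Bayesian mixture over deterministic bit-sequence hypotheses.
# Each hypothesis maps a round index to the bit it predicts; the mixture keeps
# only hypotheses consistent with everything seen so far and votes by weight.

def all_ones(i):
    return 1

def zeros_on_powers_of_two(i):
    return 0 if i > 1 and (i & (i - 1)) == 0 else 1   # 0 at rounds 2, 4, 8, ...

def fast_growing(limit):
    # Stand-in for BB(1), BB(2), ...: an extremely fast-growing computable
    # sequence, used ONLY because BB itself cannot be computed here.
    vals, x = set(), 2
    while x <= limit:
        vals.add(x)
        x = x ** x
    return vals

def zeros_on_fast_growing(i, _cache=fast_growing(10 ** 9)):
    return 0 if i in _cache else 1

# The "extra rule": give some prior weight to the fast-growing-zeros hypothesis.
hypotheses = {all_ones: 0.5, zeros_on_powers_of_two: 0.3, zeros_on_fast_growing: 0.2}

def predict(history):
    """Weighted vote of the hypotheses that are consistent with history."""
    live = {h: w for h, w in hypotheses.items()
            if all(h(i) == b for i, b in enumerate(history, start=1))}
    total = sum(live.values())
    p_one = sum(w for h, w in live.items() if h(len(history) + 1) == 1) / total
    return p_one

history = [1, 0, 1, 0, 1, 1, 1]   # zeros at rounds 2 and 4, as both surviving hypotheses predict
print(predict(history))           # ~0.4: the two surviving hypotheses disagree about round 8
```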
For example, I can simply modify AIXI with a rule that says “if you’ve seen a sequence of increasingly large numbers that can’t be explained by any short computable rule, put some weight on it being BB(1)...BB(2^n)…” (and also modify it to reason symbolically about expected utilities instead of comparing numbers), and that will surely be an improvement over all possible uncomputable universes. (ETA: Strike that “surely”. I have to think this over more carefully.)
This is what I dislike about your logic. You create a situation where (you think) AIXI fails, but you fail to take into account the likelihood of being in that situation versus being in a similar situation. I can easily see a human seeing a long series of ones, with some zeros at the beginning, saying “aha, this must be the result of a sequence of busy beavers”, when all he’s actually seeing is 3^^^3 minus his telephone number or something. AIXI can lose in really improbable universes, because it’s designed to work in the space of universes, not some particular one. By modifying the rules, you can make it better in specific universes, but only by reducing its performance in similar-seeming universes.
What about the agent using Solomonoff’s distribution? After seeing BB(1),...,BB(2^n), the algorithmic complexity of BB(1),...,BB(2^n) is sunk, so to speak. It will predict a higher expected payoff for playing 0 in any round i where the conditional complexity K(i | BB(1),...,BB(2^n)) < 100. This includes for example 2BB(2^n), 2BB(2^n)+1, BB(2^n)^2 * 3 + 4, BB(2^n)^^^3, etc. **It will bet on 0 in these rounds (erroneously, since K(BB(2^(n+1)) | BB(2^n)) > 100 for large n), and therefore lose relative to a human.**
I don’t understand how the bolded part follows. The best explanation by round BB(2^n) would be "All 1s except for the Busy Beaver numbers up to 2^n", right?
Yes, that’s the most probable explanation according to the Solomonoff prior, but AIXI doesn’t just use the most probable explanation to make decisions, it uses all computable explanations that haven’t been contradicted by its input yet. For example, “All 1s except for the Busy Beaver numbers up to 2^n and 2BB(2^n)” is only slightly less likely than “All 1s except for the Busy Beaver numbers up to 2^n” and is compatible with its input so far. The conditional probability of that explanation, given what it has seen, is high enough that it would bet on 0 at round 2BB(2^n), whereas the human wouldn’t.
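(A rough worked version of that weight comparison, with placeholder numbers. The payoff assumption mirrors the quoted setup, where the ~100-bit threshold suggests a correct 0 pays on the order of 2^100 times more than an incorrect 0 loses.)

```python
# Rough arithmetic behind "it would bet on 0 at round 2*BB(2^n)".
# H1: zeros exactly at BB(1),...,BB(2^n).  H2: the same, plus a zero at 2*BB(2^n).
# If H2 costs only c extra bits to describe, its prior (and hence posterior, since both
# fit the data so far) is only about 2^-c times smaller, which is small but not negligible.
c = 20                                   # placeholder: extra bits for "...and 2*BB(2^n)"
posterior_ratio = 2.0 ** -c              # weight(H2) / weight(H1)
p_zero_this_round = posterior_ratio / (1.0 + posterior_ratio)

# Placeholder payoffs in the spirit of the quoted setup, where a correct 0 is worth
# far more than 2^100 times what an incorrect 0 costs.
expected_gain_from_betting_zero = p_zero_this_round * 2.0 ** 100 - (1 - p_zero_this_round)

print(p_zero_this_round)                    # ~1e-6: small, but not small enough
print(expected_gain_from_betting_zero > 0)  # True, so the agent bets 0 here, and the human doesn't
```

The same arithmetic applies at every round whose index has a short description given BB(1),...,BB(2^n), which is why those erroneous 0 bets add up, while the genuinely next value BB(2^(n+1)) gets no such boost.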
Oh. I feel stupid now. EDIT: Wouldn’t it also break even by predicting the next Busy Beaver number? “All 1s except for BB(1...2^n+1)” is also only slightly less likely. EDIT: I feel more stupid.
The next number in the sequence is BB(2^(n+1)), not BB(2^n+1).
ETA: In case more explanation is needed, it takes O(2^n) more bits to computably describe BB(2^(n+1)), even if you already have BB(2^n). (It might take O(2^n) more bits to describe BB(2^n+1) as well, but I wasn’t sure so I used BB(2^(n+1)) in my example instead.)
Since K(BB(2^(n+1)) | BB(2^n)) > 100 for large n, AIXI actually will not bet on 0 when BB(2^(n+1)) comes around, and all those 0s that it does bet on are simply “wasted”.
BB(100) is computable. Am I missing something?
Maybe… by BB I mean the Busy Beaver function Σ as defined in this Wikipedia entry.
Right, and...
So why can’t the universal prior use it?
You can find it by emulating the Busy Beaver.
BB(100) is computable—and BB(2^100) is computable too :-(