Why would you want to make any predictions at all? Predictions are not directly about value. It doesn’t seem that there is a place for the human concept of prediction in a foundational decision theory.
I think that’s right. I was making the point about prediction because Eliezer still seems to believe that predictions of sensory experience are somehow fundamental, and I wanted to convince him that the universal prior is wrong even given that belief.
Still, the universal prior does seem to be a universal way of eliciting what the human concept of prediction (expectation, probability) is, to the limit of our ability to train such a device, for exactly the reasons Eliezer gives: whatever concept we use, it’s in there, among the programs the universal prior weights.
ETA: On the other hand, the concept thus reconstructed would be limited to talking about observations, and so it won’t be a general concept, while human expectation is probably more general than that, and you’d need a general logical language to capture it (and a language of unknown expressive power to capture it faithfully).
ETA2: Predictions might still be a necessary concept to express the decisions that the agent makes, to connect formal statements with what the agent actually does, and so express what the agent actually does as formal statements. We might have to deal with reality because the initial implementation of FAI has to be constructed specifically in reality.
Umm… what about my argument that a human can represent their predictions symbolically like “P(next bit is 1)=i-th bit of BB(100)” instead of using numerals, and thereby do better than a Solomonoff predictor because the Solomonoff predictor can’t incorporate this? Or in other words, the only reason the standard proofs of Solomonoff prediction’s optimality go through is that they assume predictions are represented using numerals?
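The contrast being drawn, written out as a paraphrase (not a quotation) of the claim above:

```latex
% Claim (paraphrased): a Solomonoff predictor must report its forecast as a numeral,
\[
  P(x_t = 1 \mid x_{<t}) \;=\; 0.d_1 d_2 d_3 \ldots,
\]
% whereas a human can leave the forecast in unevaluated symbolic form,
\[
  P(x_t = 1 \mid x_{<t}) \;=\; \text{the } i\text{-th bit of } \mathrm{BB}(100),
\]
% a quantity one can reason about and bet on without ever computing it.
```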
Re: “what about my argument that a human can [adapt its razor a little] and thereby do better than a Solomonoff predictor because the Solomonoff predictor can’t incorporate this?”
There are at least two things “Solomonoff predictor” could refer to:
An intelligent agent with Solomonoff-based priors;
An agent who is wired to use a Solomonoff-based razor on their sense inputs.
A human is more like the first agent. The second agent is not really properly intelligent—and adapts poorly to new environments.
Humans are (can be represented by) Turing machines. All halting Turing machines are incorporated in AIXI. Therefore, anything that humans can do to more effectively predict something than a “mere machine” is already incorporated into AIXI.
More generally, anything you represent symbolically can be represented using binary strings. That’s how that string you wrote got to me in the first place. You converted the Turing operations in your head into a string of symbols, a computer turned that into a string of digits, my computer turned it back into symbols, and my brain used computable algorithms to make sense of them. What makes you think that any of this is impossible for AIXI?
Am I going crazy, or did you just basically repeat what Eliezer, Cyan, and Nesov said without addressing my point?
Do you guys think that you understand my argument and that it’s wrong, or that it’s too confusing and I need to formulate it better, or what? Everyone just seems to be ignoring it and repeating the standard party line....
ETA: Now reading the second part of your comment, which was added after my response.
ETA2: Clearly I underestimated the inferential distance here, but I thought at least Eliezer and Nesov would get it, since they appear to understand the other part of my argument about the universal prior being wrong for decision making, and this seems to be a short step. I’ll try to figure out how to explain it better.
If 4 people all think you’re wrong for the same reason, either you’re wrong or you’re not explaining yourself. You seem to disbelieve the first, so try harder with the explaining.
Didn’t stop 23+ people from voting up his article … (21 now; I and someone else voted it down)
Well, people expect him to be making good points, even when they don’t understand him (i.e., I don’t understand UDT fully, but it seems to be important). Also, he’s advocating further thinking, which is popular around here.
And I really, really wish people would stop doing that, whether it’s for Wei_Dai or anyone else you deem to be smart.
Folks, you may think you’re doing us all a favor by voting someone up because they’re smart, but that policy has the effect of creating an information cascade, because it makes an inference bounce back, accumulating arbitrarily high support irrespective of its relationship to reality.
The content of a post or comment should screen off any other information about its value [1], including who made it.
[1] except in obvious cases like when someone is confirming that something is true about that person specifically
Seconded. Please only vote up posts you both understand and approve of.
I agree, but would like to point out that I don’t see any evidence that people aren’t already doing this. As far as I can tell, Lucas was only speculating that people voted up my post based on the author. Several other of my recent posts have fairly low scores, for example. (All of them advocated further thinking as well, so I don’t think that’s it either.)
The fact that AIXI can predict that a human would predict certain things, does not mean that AIXI can agree with those predictions.
In the limit, even if that one human is the only thing in all of the hypotheses that AIXI has under consideration, AIXI will be predicting precisely as that human does.
BB(100) is computable. Am I missing something?
Maybe… by BB I mean the Busy Beaver function Σ as defined in this Wikipedia entry.
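For readers following along, here is a minimal statement of the function being referred to, assuming the standard 2-symbol convention from that entry (a sketch, not a quotation):

```latex
% Busy Beaver function Sigma, standard 2-symbol convention (assumed here):
% sigma(M) = number of 1s machine M leaves on an initially blank tape when it halts.
\[
  \Sigma(n) \;=\; \max\bigl\{\, \sigma(M) \;:\; M \text{ is an } n\text{-state, 2-symbol Turing machine that halts on the blank tape} \,\bigr\}
\]
```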
Right, and...
“A trivial but noteworthy fact is that every finite sequence of Σ values, such as Σ(0), Σ(1), Σ(2), …, Σ(n) for any given n, is computable, even though the infinite sequence Σ is not computable (see computable function examples).”
So why can’t the universal prior use it?
Sorry, I should have used BB(2^100) as the example. The universal prior assigns the number BB(2^100) a very small weight, because the only way to represent it computably is by giving a 2^100-state Turing machine. A human would assign it a much larger weight, referencing it by its short symbolic representation.
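To spell out the weights being compared (a rough sketch in the usual Solomonoff-mixture notation; the constant c is illustrative):

```latex
% Solomonoff's universal prior: every program p for a universal prefix machine U
% contributes 2^{-|p|} to each sequence it outputs, so
\[
  M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\lvert p \rvert}.
\]
% A hypothesis therefore enters the mixture with weight roughly 2^{-K(hypothesis)}.
% If the shortest program computing BB(2^100) really needs on the order of 2^100 bits
% (say, an explicit 2^100-state machine), its weight is about
\[
  2^{-K(\mathrm{BB}(2^{100}))} \;\approx\; 2^{-c \cdot 2^{100}},
\]
% while a human reasoning symbolically can refer to the same number with a description
% only a few hundred symbols long.
```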
Until I write up a better argument, you might want to (assuming you haven’t already) read this post where I gave a decision problem that a human does (infinitely) better than AIXI.
I don’t think I understood that fully, but there seems to be a problem with your theory. The human gets to start in the epistemically advantaged position of knowing that the game is based on a sequence of busy beavers and knowing that they are a very fast-growing function. AIXI is prevented from knowing this information and has to start as if from a blank canvas. The reason we use an Occamian prior for AIXI is that we refuse to tailor it to a specific environment. If your logic is sound, then yes, it does do worse when it is dropped into an environment where it is paired with a human with an epistemic advantage, but it would beat the human across the space of possible worlds.
Another problem is that you seem to assume that the only hypothesis in the entire set giving useful predictions is the hypothesis which is, in fact, correct. There are plenty of other functions that correctly predict arbitrarily large numbers of 1′s, with much less complexity, which can give enough overall probability weight that AIXI ends up using a usefully correct model of its universe, if not a fully correct one.
How a human might come to believe, without being epistemically privileged, that a sequence is probably a sequence of busy beavers, is a deep problem, similar to the problem of distinguishing halting oracles from impostors. (At least one mathematical logician who has thought deeply about the latter problem thinks that it’s doable.)
But in any case, the usual justification for AIXI (or adopting the universal prior) is that (asymptotically) it does as well as or better than any computable agent, even one that is epistemically privileged, as long as the environment is computable. Eliezer and others were claiming that it does as well as or better than any computable agent, even if the environment is not computable, and this is what my counter-example disproves.
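For reference, the optimality result being appealed to is roughly of the following form (stated from memory, so treat the constant as approximate); note that it only applies when the true environment μ is a computable measure:

```latex
% Solomonoff prediction bound: for any computable measure mu generating the sequence,
% the universal mixture M's total expected squared prediction error is finite,
% bounded by a constant depending only on K(mu):
\[
  \sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[ \bigl( M(x_t = 1 \mid x_{<t}) - \mu(x_t = 1 \mid x_{<t}) \bigr)^{2} \right]
  \;\le\; \frac{\ln 2}{2}\, K(\mu).
\]
% The bound is silent when the environment is not computable, which is exactly the case at issue.
```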
So you think that we need to rethink our theory of what perfect optimization is, in order to take into account the possibility we live in an uncomputable universe? Even if you are correct in your example, there is no reason to suppose that your human does better in the space of possible uncomputable universes than AIXI, as opposed to better in that one possible (impossible) universe.
Yes.
This seems pretty easy, given the same level of raw computing power available to AIXI (otherwise the human gets screwed in the majority of cases simply because he doesn’t have enough computing power).
For example, I can simply modify AIXI with a rule that says “if you’ve seen a sequence of increasingly large numbers that can’t be explained by any short computable rule, put some weight into it being BB(1)...BB(2^n)...” (and also modify it to reason symbolically about expected utilities instead of comparing numbers), and that will surely be an improvement over all possible uncomputable universes. (ETA: Strike that “surely”. I have to think this over more carefully.)
How to make an optimal decision algorithm (as opposed to just improving upon AIXI) is still an open problem.
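As a very rough illustration of the kind of modification described above, here is a toy Bayesian mixture over a handful of hand-picked hypotheses. Nothing here is AIXI; the hypotheses, weights, and data are made up for the example, and the point is only the general shape of “put some prior weight on an extra, hand-specified hypothesis”:

```python
from fractions import Fraction

# Toy Bayesian mixture predictor over binary sequences.
# Each "hypothesis" maps a history (list of 0/1 bits) to P(next bit is 1).
# The hypotheses below are placeholders chosen purely for illustration.

def all_ones(history):
    return Fraction(1)                      # "the sequence is all 1s"

def alternating(history):
    return Fraction(len(history) % 2)       # "the sequence goes 0,1,0,1,..."

def mostly_ones(history):
    return Fraction(9, 10)                  # "each bit is 1 with probability 0.9"

hypotheses = {"all ones": all_ones, "alternating": alternating, "mostly ones": mostly_ones}
weights = {name: Fraction(1, len(hypotheses)) for name in hypotheses}

def update(history, bit):
    """Bayes update: reweight each hypothesis by the probability it gave the observed bit."""
    for name, h in hypotheses.items():
        p1 = h(history)
        weights[name] *= p1 if bit == 1 else 1 - p1
    total = sum(weights.values())
    for name in weights:
        weights[name] /= total

def predict(history):
    """Mixture prediction: posterior-weighted average over the surviving hypotheses."""
    return sum(weights[name] * h(history) for name, h in hypotheses.items())

history = []
for bit in [1, 1, 0, 1, 1, 1]:
    update(history, bit)
    history.append(bit)

print({name: float(w) for name, w in weights.items()})
print("P(next bit is 1) =", float(predict(history)))
```

The analogue of the proposal above would be adding, alongside the computable entries, a hypothesis whose predictions are only described symbolically (e.g. “a 0 exactly at BB(1), BB(2), BB(4), ...”), which is the step the thread argues a plain Solomonoff mixture cannot take with non-negligible weight.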
This is what I dislike about your logic. You create a situation where (you think) AIXI fails, but you fail to take into account the likelihood of being in that situation versus being in a similar-looking one. I can easily see a human seeing a long series of ones, with some zeros at the beginning, saying “aha, this must be the result of a sequence of busy beavers”, when all he’s actually seeing is 3^^^3 minus his telephone number or something. AIXI can lose in really improbable universes, because it’s designed to work over the space of universes, not some particular one. By modifying the rules, you can make it better in specific universes, but only by reducing its performance in similar-seeming universes.
What about the agent using Solomonoff’s distribution? After seeing BB(1),...,BB(2^n), the algorithmic complexity of BB(1),...,BB(2^n) is sunk, so to speak. It will predict a higher expected payoff for playing 0 in any round i where the conditional complexity K(i | BB(1),...,BB(2^n)) < 100. This includes for example 2BB(2^n), 2BB(2^n)+1, BB(2^n)^2 * 3 + 4, BB(2^n)^^^3, etc. It will bet on 0 in these rounds (erroneously, since K(BB(2^(n+1)) | BB(2^n)) > 100 for large n), and therefore lose relative to a human.
I don’t understand how the bolded part follows. The best explanation by round BB(2^n) would be “All 1′s except for the Busy Beaver numbers up to 2^n”, right?
Yes, that’s the most probable explanation according to the Solomonoff prior, but AIXI doesn’t just use the most probable explanation to make decisions, it uses all computable explanations that haven’t been contradicted by its input yet. For example, “All 1′s except for the Busy Beaver numbers up to 2^n and 2BB(2^n)” is only slightly less likely than “All 1′s except for the Busy Beaver numbers up to 2^n” and is compatible with its input so far. The conditional probability of that explanation, given what it has seen, is high enough that it would bet on 0 at round 2BB(2^n), whereas the human wouldn’t.
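A back-of-the-envelope version of why that extra explanation keeps enough weight to sway the bet (a sketch only; the constant c comes from the usual coding-theorem approximation, and the 2^100 payoff ratio from the quoted setup):

```latex
% H1 = "all 1s except at BB(1),...,BB(2^n)"
% H2 = "all 1s except at BB(1),...,BB(2^n) and at 2*BB(2^n)"
% Both agree with everything observed through round BB(2^n), so their posterior odds
% equal their prior odds, which under a Solomonoff-style prior are roughly
\[
  \frac{P(H_2 \mid \text{data})}{P(H_1 \mid \text{data})}
  \;=\; \frac{P(H_2)}{P(H_1)}
  \;\approx\; \frac{2^{-K(H_2)}}{2^{-K(H_1)}}
  \;\approx\; 2^{-c}
\]
% for a small constant c, since H2 needs only the short extra description
% "and also at twice that number". If a correct 0 pays about 2^100 times more than a
% correct 1, any cluster of surviving hypotheses with combined weight above roughly
% 2^{-100} is enough to tip the mixture into betting 0 at round 2*BB(2^n), where the
% human's symbolic model does not.
```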
Oh.
I feel stupid now.
EDIT: Wouldn’t it also break even by predicting the next Busy Beaver number? “All 1′s except for BB(1...2^n+1)” is also only slightly less likely. EDIT: I feel more stupid.
The next number in the sequence is BB(2^(n+1)), not BB(2^n+1).
ETA: In case more explanation is needed, it takes O(2^n) more bits to computably describe BB(2^(n+1)), even if you already have BB(2^n). (It might take O(2^n) more bits to describe BB(2^n+1) as well, but I wasn’t sure so I used BB(2^(n+1)) in my example instead.)
Since K(BB(2^(n+1)) | BB(2^n)) > 100 for large n, AIXI actually will not bet on 0 when BB(2^(n+1)) comes around, and all those 0s that it does bet on are simply “wasted”.
You can find it by emulating the Busy Beaver.
BB(100) is computable—and BB(2^100) is computable too :-(
Surely predictions of sensory experience are pretty fundamental. To understand the consequences of your actions, you have to be able to make “what-if” predictions.
Re: “It doesn’t seem that there is a place for the human concept of prediction in a foundational decision theory.”
You can hardly steer yourself effectively into the future if you don’t have an understanding of the consequences of your actions.
Yes, it might be necessary exactly for that purpose (though consequences don’t reside just in the “future”), but I don’t understand this well enough to decide either way.
I checked with the dictionary. It had:
the effect, result, or outcome of something occurring earlier: The accident was the consequence of reckless driving.
an act or instance of following something as an effect, result, or outcome.
http://dictionary.reference.com/browse/consequence
Consequences not being in the future seems to be a curious concept to me—though I understand that Feynman dabbled with the idea on sub-microscopic scales.
I think we’ve got it covered with Newcomb’s Problem (consequences in the past) and Counterfactual Mugging (consequences in another possible world). And there is still greater generality with logical consequences.
FWIW, I wouldn’t classify Newcomb’s Problem as having to do with “consequences in the past” or Counterfactual Mugging as having to do with “consequences in another possible world”.
For me, “consequences” refers to the basic cause-and-effect relationship—and consequences always take place downstream.
Anticipating something doesn’t really mean that the future is causally affecting the past. If you deconstruct anticipation, it is all actually based on current and previous knowledge.
You are arguing definitions (with the use of a dictionary, no less!). The notion of consequences useful for decision theory is a separate idea from causality of physics.
Is “consequences” really a good term for what you are talking about?
It seems to me that it is likely to cause confusion.
Does anyone else use the term in this way?