Can anyone point me toward work that’s been done on the five-and-ten problem? Or does someone want to discuss it here? Specifically, I don’t understand why it is a problem for probabilistic algorithms. I would reason:
There is a high probability that I prefer $10 to $5. Therefore, the probability that I will decide to choose $5 is low.
And there’s nowhere to go from there. If I try to use the fact that I chose $5 to prove that $5 was the better choice all along (because I’m rational), I get something like:
The probability that I prefer $5 to $10 is low. But I have very high confidence in my rationality, meaning that I assign high probability, a priori, to any choice I make being the choice I prefer. Therefore, given that I choose $5, the probability that I prefer $5 is high. So $5 doesn’t seem like a bad choice, since I’ll probably end up with what I prefer.
But things still turn out right, because:
However, the probability that I prefer $10, given that I choose $10, is even higher, because the probability that I prefer $10 was high to begin with. Therefore, $10 is a better choice than $5, because the probability that (I prefer $10 to $5 given that I choose $10) is higher than the probability that (I prefer $5 to $10 given that I choose $5).
So unless I’m missing something, the five-and-ten problem is just a problem of overconfidence.
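For concreteness, the reasoning above can be worked through with made-up numbers, say a 90% prior that I prefer $10 and a symmetric 95% confidence that I choose whatever I prefer (both numbers are illustrative assumptions, not from the discussion):

```python
# Prior over preferences and a symmetric "I usually choose what I prefer" likelihood.
p_prefer_10 = 0.9          # prior probability that I prefer $10 to $5
p_choose_preferred = 0.95  # confidence in my own rationality

# Marginal probability of each action, by total probability.
p_a5 = (1 - p_prefer_10) * p_choose_preferred + p_prefer_10 * (1 - p_choose_preferred)
p_a10 = p_prefer_10 * p_choose_preferred + (1 - p_prefer_10) * (1 - p_choose_preferred)

# Posterior probability that the chosen option is the preferred one.
p_prefer5_given_a5 = (1 - p_prefer_10) * p_choose_preferred / p_a5
p_prefer10_given_a10 = p_prefer_10 * p_choose_preferred / p_a10

print(round(p_prefer5_given_a5, 3))    # ~0.679: choosing $5 wouldn't look like a disaster
print(round(p_prefer10_given_a10, 3))  # ~0.994: but choosing $10 looks even better
```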
The problem is how classical logical statements work. The statement “If A then B” more properly translates as “~(A and ~B)”.
Thus, we get valid logical statements that look bizarre to humans: “If Paris is the capital of France, then Rome is the capital of Italy” seems untrue in a causal sense (if we changed the capital of France, we would not change the capital of Italy, and vice versa), but it is true in a logical sense: A is true and B is true, so ~B is false, (A and ~B) is (true and false), which is false, and ~false is true.
That example seems just silly, but the problem is that the reverse example is disastrous. Notice that, because of the “and,” if A is false then it doesn’t matter what B is: (false and X) is false, and ~false is true. If I choose the premise “Marseilles is the capital of France,” then any B works. “If Marseilles is the capital of France, then I will receive infinite utility” is a true implication under classical logic, but it is clearly not a causal relationship: changing the capital will not grant me infinite utility, and as soon as the capital changes, the logical truth of the sentence will change.
If you have a reasoner that makes decisions, it needs to use causal logic, not classical logic, or it will get tripped up by the word “implication.”
I get that. What I’m really wondering is how this extends to probabilistic reasoning. I can think of an obvious analog. If the algorithm assigns zero probability that it will choose $5, then when it explores the counterfactual hypothesis “I choose $5”, it gets nonsense when it tries to condition on the hypothesis. That is, for all U,
P(utility=U | action=$5) = P(utility=U and action=$5) / P(action=$5) = 0/0
is undefined. But is there an analog for this problem under uncertainty, or was my sketch correct about how that would work out?
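A minimal sketch of that failure mode, with a made-up joint distribution in which the agent is certain it will take the $10:

```python
# Toy joint distribution over (action, utility); the agent assigns
# probability zero to ever choosing $5.
joint = {
    ("$10", 10): 1.0,
    ("$10", 5):  0.0,
    ("$5", 10):  0.0,
    ("$5", 5):   0.0,
}

def conditional(joint, action, utility):
    """P(utility | action) by naive conditioning on the action."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    return joint[(action, utility)] / p_action  # 0/0 when P(action) = 0

print(conditional(joint, "$10", 10))  # 1.0
print(conditional(joint, "$5", 5))    # raises ZeroDivisionError: the 0/0 above
```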
A causal reasoner will compute P(utility=U | do{action=$5}), which doesn’t run into this trouble. This is the approach I recommend.
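As a sketch of the contrast, assume the trivial structural model in which the payoff is determined entirely by the action taken (the model is an assumption made here for illustration):

```python
# Structural equation: the action alone determines the payoff.
def payoff(action):
    return 10 if action == "$10" else 5

def p_utility_do(utility, action):
    """P(utility = u | do(action = a)): set the action by intervention and
    propagate it through the structural equation. No division by P(action)
    occurs, so an action the policy would never take is unproblematic."""
    return 1.0 if payoff(action) == utility else 0.0

print(p_utility_do(5, "$5"))    # 1.0, even if P(action=$5) = 0 under the policy
print(p_utility_do(10, "$10"))  # 1.0
```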
Probabilistic reasoning about actions that you yourself will take is, to the best of my knowledge, not a seriously considered approach to making decisions outside the context of mixed strategies in game theory, and even there it doesn’t apply that strongly, since you can view a mixed strategy as putting forth a certain (but parameterized) action whose outcome is subject to uncertainty.
I don’t think your sketch is correct for two reasons:
The assumption that your action is utility-maximizing requires that you choose the best action, and so using it to justify your choice of action leads to circularity.
Your argument hinges on P(U($10)>U($5)|A=$10) > P(U($5)>U($10)|A=$5), which seems like an odd statement to me. If you take the “actions maximize utility” assumption seriously, both of those are 1, and thus the first can’t be higher than the second. If you view the actions as not at all informative about the preference probabilities, then you’re just repeating your prior. If the action gives some information, there’s no reason for the information to be symmetric: you can easily construct a 2x2 matrix example where the reverse inequality holds (that is, if we know someone picked $5, they are more likely to prefer $5 to $10 than someone who picked $10 is to prefer $10 to $5, even though most people prefer $10 to $5).
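For what it’s worth, one such 2x2 joint distribution over (preference, action), with numbers chosen only to make the point, might look like this:

```python
# Most people prefer $10, but picking $5 is perfectly diagnostic of
# preferring $5, while a few $5-preferrers pick $10 anyway and dilute
# the evidence that the $10 pool provides.
joint = {
    ("prefer $5",  "$5"):  0.05,
    ("prefer $5",  "$10"): 0.05,
    ("prefer $10", "$5"):  0.00,
    ("prefer $10", "$10"): 0.90,
}

def p(pref=None, act=None):
    """Joint or marginal probability, summing over any unspecified variable."""
    return sum(v for (pr, a), v in joint.items()
               if (pref is None or pr == pref) and (act is None or a == act))

print(p(pref="prefer $10"))                    # 0.90: most people prefer $10
print(p("prefer $5", "$5") / p(act="$5"))      # 1.000
print(p("prefer $10", "$10") / p(act="$10"))   # ~0.947: the reverse inequality holds
```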
What I am saying is that I don’t assume that I maximize expected utility. I take the five-and-ten problem as a proof that an agent cannot be certain, while it is choosing, that it will make the optimal choice, because that certainty leads to a contradiction. But this doesn’t mean that I can’t use the evidence that a choice would represent, while choosing. In this case, I can tell that U($10) > U($5) directly, so conditioning on A=$10 or A=$5 is redundant. The point is that it doesn’t cause the algorithm to blow up, as long as I don’t think my probability of maximizing utility is 0 or 1.
It’s true that A=$5 could be stronger evidence for U($5)>U($10) than A=$10 is for U($10)>U($5). But there’s no particular reason to think it would be. And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference. As long as making a choice is evidence that it is the optimal choice only insofar as I am confident that I make the optimal choice in general, it will provide equally strong evidence for every choice and cancel itself out. But in cases where a particular choice is evidence of good things for other reasons (like Newcomb’s problem), taking this evidence into consideration can affect my decision.
So why can’t I just use the knowledge that I’ll go through this line of reasoning to prove that I will choose $10 and yield a contradiction? Because I can’t prove that I’ll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion. Now, there may be a shortcut I could use to prove what my choice will be, but the very fact that this would yield a contradiction means that no such proof exists in a consistent formal system.
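As a toy illustration of that regress (hypothetical code, only to show why naively simulating one’s own decision procedure never bottoms out):

```python
import sys
sys.setrecursionlimit(100)  # keep the inevitable failure quick

def simulate(decision_procedure):
    """'Simulate' the agent by simply running its decision procedure."""
    return decision_procedure()

def decide():
    """Pick an action, but only after predicting what decide() itself will do."""
    predicted = simulate(decide)  # step 1: simulate my own deliberation
    return predicted              # step 2 is never reached; step 1 recurses forever

try:
    decide()
except RecursionError:
    print("naive self-simulation never terminates")
```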
(BTW, I agree that CDT is the only decision theory that works in practice, as is. I’m only addressing one issue with the various timeless decision theories.)
And as long as P(U($10)>U($5)) is large enough a priori, it will swamp out the difference.
Well, then why even update? (Or, more specifically, why assume that this is harmless normally, but an ace up your sleeve for a particular class of problems? You need to be able to reliably distinguish when this helps you and when this hurts you from the inside, which seems difficult.)
Because I can’t prove that I’ll go through this line of reasoning. Simulating my decision process as part of my decision would result in infinite recursion.
I’m not sure that I understand this; I’m under the impression that many TDT applications require that they be able to simulate themselves (and other TDT reasoners) this way.
Good questions. I don’t know the answers. But like you say, UDT especially is basically defined circularly—where the agent’s decision is a function of itself. Making this coherent is still an unsolved problem. So I was wondering if we could get around some of the paradoxes by giving up on certainty.
To me, it looks like the five-and-ten problem is that the quotation is not the referent. It seems to me that a program reasoning about its utility function in the way explained in the article is like a person saying “‘“Snow is white.” is true.’ is a true statement.” The word “true” cannot coherently have the same meaning in both locations within the sentence.