I think a lot of the confusion about these types of decision theory problems comes from not everyone thinking about the same problem, even when it seems like they are.
For example, consider the problem I’ll call ‘pseudo-Newcomb’s problem’. Omega still gives you the same options, and history has shown a strong correlation between people’s choices and its predictions.
The difference is that instead of simulating the relevant part of your decision algorithm to make the prediction, Omega just looks to see whether you have a blue dot or a red dot on your forehead: a red dot has been a perfect indicator of a mental dysfunction that makes the response to every query “one box!”, and a blue dot has been a perfect indicator of a functioning brain. In addition, all people with working brains have chosen two boxes in the past.
If I understand correctly, all decision theories discussed will two-box here, and rightly so: choosing one box doesn’t cause Omega to choose differently, since that decision was determined solely by the color of your dot.
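A minimal sketch of that asymmetry, with made-up dollar amounts and function names of my own: in the dot version, Omega’s prediction is a function of the dot alone, so your choice can’t change what’s in box B, whereas an Omega that actually simulates your choice rewards one-boxing.

```python
# Toy comparison: dot-based 'pseudo' Omega vs. a simulating Omega.
# Box A always holds $1,000; box B holds $1,000,000 iff Omega predicted one-boxing.

def payoff(choice, prediction):
    box_a = 1_000
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if choice == "one-box" else box_a + box_b

def dot_omega(dot_color, choice):
    # Pseudo-Newcomb: the prediction depends only on the dot, not on the choice.
    return "one-box" if dot_color == "red" else "two-box"

def simulating_omega(dot_color, choice):
    # Ordinary Newcomb, idealized: the prediction tracks the actual choice.
    return choice

for omega in (dot_omega, simulating_omega):
    for choice in ("one-box", "two-box"):
        prediction = omega(dot_color="blue", choice=choice)
        print(omega.__name__, choice, payoff(choice, prediction))
```

With a blue dot, the dot-based Omega leaves box B empty whatever you pick, so two-boxing nets the extra $1,000; only the simulating Omega makes one-boxing come out ahead.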
People who say to two-box on Newcomblike problems are thinking of this type of Omega, since sufficiently detailed simulations aren’t the first thing that comes to mind; indicators of broken minds are.
For the one-shot PD, it seems like something similar is happening. Cooperating just doesn’t ‘seem’ right to me most of the time, but that’s only because I’d have a hard time believing the other guy was running the same algorithm.
I recently had an interesting dream in which I was copied, and it made cooperation on the one-shot PD a lot more intuitive. Even in a trued PD, I’d cooperate with that guy, no question about it.
For the one-shot PD, it seems like something similar is happening. Cooperating just doesn’t ‘seem’ right to me most of the time, but that’s only because I’d have a hard time believing the other guy was running the same algorithm.
Do you think that the other guy is thinking the same thing, and reasoning the same way? Or do you think that the other person will probably decide to cooperate or defect on the PD using some unrelated algorithm?
My main reason for potentially defecting on the true PD against another human—note the sheer difficulty of obtaining this unless the partner is Hannibal with an imminently lethal wound—would be my doubts that they were actually calculating using a timeless decision theory, even counting someone thinking about Hofstadterian superrationality as TDT. Most people who’ve studied the matter in college have been taught that the right thing to do is to defect, and those who cooperate on instinct are running a different algorithm, that of being honorable.
But it’d be pretty damn hard in real life to put me into a literally one-shot, uncoordinated, no-communication, true PD where I’m running TDT, the other person is running honor with no inkling that I’m TDT, and the utilities at stake outweigh that which constrains me not to betray honorable people. It deserves a disclaimer to the effect of “This hypothetical problem is sufficiently different from the basic conditions of real life that no ethical advice should be taken from my hypothetical answer.”
Do you think that the other guy is thinking the same thing, and reasoning the same way? Or do you think that the other person will probably decide to cooperate or defect on the PD using some unrelated algorithm?
The latter. I haven’t thought about this enough to be comfortable saying how similar his algorithm must be for me to cooperate, but if I ultimately decided to defect it’d be because I thought it qualified as sufficiently different.
So you fully expect in real life that you might defect and yet see the other person cooperate (with standard ethical disclaimers about how hard it is to true the PD such that you actually prefer to see that outcome).
Yes, that’s correct. I also currently see a significant probability of choosing to cooperate and finding out that the other guy defected on me. Should I take your response as evidence to reconsider? As I said before, I don’t claim to have this all sorted out.
As to your disclaimer, it seems that your impression is that it’s much harder to true the PD than mine suggests. If you think you can make the thing truly one-shot without reputational consequences (which may be the hard part, though it seems like you think it’s the other part), then it’s just a question of setting up the payoff table.
If you don’t have personal connections to the other party, it seems that you don’t care any more about him than about the other 6 billion people on Earth. If you can meet those conditions, even a small contribution to fighting existential risks (funded by your prize money) should outweigh any concern you have for him.
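A rough sketch of ‘setting up the payoff table’ in that sense; the dollar figures and the altruism weight below are assumptions of mine, not anything from the discussion:

```python
# Hypothetical material payoffs (you, other) for a one-shot PD.
payoffs = {
    ("C", "C"): (3_000, 3_000),
    ("C", "D"): (0, 5_000),
    ("D", "C"): (5_000, 0),
    ("D", "D"): (1_000, 1_000),
}

# Assumed: you weight a stranger's payoff at a small fraction of your own.
ALTRUISM_WEIGHT = 0.01

def my_utility(mine, theirs):
    return mine + ALTRUISM_WEIGHT * theirs

utilities = {moves: my_utility(*payoffs[moves]) for moves in payoffs}

# For the game to remain a true PD in utility terms we need
# temptation > reward > punishment > sucker's payoff.
t, r, p, s = (utilities[("D", "C")], utilities[("C", "C")],
              utilities[("D", "D")], utilities[("C", "D")])
print("still a true PD in my utilities:", t > r > p > s)
```

If the weight you place on the other party grows large enough, the inequality flips and the game stops being a true PD for you; that is the sense in which truing it comes down to the payoff table, plus the one-shot, no-reputation conditions.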
But it’d be pretty damn hard in real life to put me into a literally one-shot, uncoordinated, no-communication, true PD where I’m running TDT, the other person is running honor with no inkling that I’m TDT, and the utilities at stake outweigh that which constrains me not to betray honorable people.
Mostly because of the “one-shot, uncoordinated, no-communication, true… utilities at stake outweigh” parts, I would think. The really relevant question conditions on those things.
If I understand correctly, all decision theories discussed will two-box here, and rightly so: choosing one box doesn’t cause Omega to choose differently, since that decision was determined solely by the color of your dot.
Depending on the set-up, “innards-CSAs” may one-box here. Innards-CSAs go back to a particular moment in time (or to their creator’s probability distribution) and ask: “if I had been created at that time, with a (perhaps physically transparent) policy that would one-box, would I get more money than if I had been created with a (perhaps physically transparent) policy that would two-box?”
If your Omega came to use the colored dots in its prediction because one-boxing and two-boxing were correlated with dot colors, and if the innards-CSA in question is programmed to do its counterfactual innards-swap back before Omega concluded that this was the correlation, and if your innards-CSA ended up copied (perhaps with variations) such that, if it had had different innards, Omega would have ended up with a different decision-rule… then it will one-box.
And “rightly so” in the view of the innards-CSA… because, by reasoning in this manner, the CSA can increase the odds that Omega has decision-rules that favor its own dot-color. At least according to its own notion of how to reckon counterfactuals.
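A toy rendering of that counterfactual, under my own simplifying assumption that Omega’s blue-dot rule is just whatever agents with my creation-time policy historically chose:

```python
# Toy innards-CSA counterfactual (my own simplification of the setup above).
# Assume everyone with my dot color would have shared my creation-time policy,
# and Omega's rule for that dot is whatever that population historically chose.

def omega_rule_given_policy(my_policy):
    # Counterfactual: had agents with my dot been built with this policy,
    # Omega's observed correlation -- and hence its prediction -- would match it.
    return my_policy

def payoff(choice, prediction):
    box_a = 1_000
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if choice == "one-box" else box_a + box_b

for policy in ("one-box", "two-box"):
    prediction = omega_rule_given_policy(policy)
    print(policy, payoff(policy, prediction))
```

Under that assumption the innards-swap changes Omega’s rule, so the CSA one-boxes; if the dot rule had been fixed independently of what agents like it would have chosen, the swap would leave the prediction alone and two-boxing would win.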
Depending on your beliefs about what computation Omega did to choose its policy, the TDT counterfactual comes out as either “If things like me one-boxed, then Omega would put $1m into box B on seeing a blue dot” or “If things like me one-boxed, then Omega would still have decided to leave B empty when seeing a blue dot, and so if things like me one-boxed I would get nothing.”
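A small expected-value sketch of that fork; the 50/50 credence is an arbitrary placeholder of mine, not anything implied above:

```python
# Two hypotheses about Omega's blue-dot policy, as in the comment above.
# H1: the policy depends on what things like me would do, so if things like me
#     one-boxed, box B would hold $1M for blue dots.
# H2: the policy was fixed by the dot correlation alone, so B stays empty for
#     blue dots whatever things like me do.
P_H1 = 0.5  # arbitrary placeholder credence

def payoff(choice, prediction):
    box_a = 1_000
    box_b = 1_000_000 if prediction == "one-box" else 0
    return box_b if choice == "one-box" else box_a + box_b

def expected_value(choice):
    under_h1 = payoff(choice, prediction=choice)      # prediction tracks things-like-me
    under_h2 = payoff(choice, prediction="two-box")   # blue-dot policy is fixed
    return P_H1 * under_h1 + (1 - P_H1) * under_h2

for choice in ("one-box", "two-box"):
    print(choice, expected_value(choice))
```

With any appreciable credence in the first hypothesis, the one-boxing counterfactual dominates; push P_H1 toward zero and you recover the ordinary two-boxing answer for this dot-based Omega.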
I see your point, which is why I made sure to write “In addition, all people with working brains have chosen two boxes in the past.”
My point is that you can have situations where there is a strong correlation, so that Omega nearly always predicts correctly, but where Omega’s prediction isn’t caused by the output of the algorithm you use to compute your decisions, so you should two-box.
The lack of effort to distinguish between the two cases seems to have generated a lot of confusion (at least it got me for a while).