You’re right about the payoff matrix; I guess Newcomb’s problem doesn’t have a payoff matrix at all, since there’s no payoff defined for the person filling the boxes.
What do you mean by “prove a fact about their decision theory”? Do you mean that you’re proving “a rational AI would use decision theory X and therefore use strategy Y”, or do you only mean “GIVEN that an AI uses decision theory X, they would use strategy Y”?
There seems to be a belief floating around this site that an AI could end up using any old kind of decision theory, depending on how it was programmed. Do you subscribe to this?
The “horrible strategy”, Newcomb’s problem, and TDT games in general only make sense if player 2 (the player who acts second) can be simulated by player 1 in enough detail that player 1 can make a choice which is CONDITIONAL on player 2’s choice. A choice which, to reiterate, player 2 has not yet actually made, and which they have an incentive to make differently than they are expected to.
The difficulty of achieving this may best be illustrated with an example. Here player 1, a human, and player 2, Omega, are playing Newcomb’s problem.
Player 1: “Omega is super-intelligent and super-rational. He uses updateless decision theory. Therefore, he will pick the winning strategy rather than the winning local move. Ergo, he will only take the opaque box. So, I should put money in both boxes. A = A.”
Player 2: “Chump.” [takes both boxes]
But of course, this wouldn’t REALLY happen, because if Omega reasoned locally like this, we’d somehow be able to predict that, right? And so we wouldn’t put money in the box, right? And so, being rational, he wouldn’t want to act like that, because then he’d get less money. So he’d definitely one-box. Whew, glad we reasoned through that. Let’s put the money in both boxes now.
Player 2: “Chump.” [takes both boxes]
The problem is, he can keep doing this no matter how fancy our reasoning gets, because at the end of the day, WE CAN’T SIMULATE HIS THINKING. It’s not enough to do some handwavy reasoning about decision theories and payoff matrices and such; in order to strike a UDT bargain, we have to actually be able to simulate his brain. To not just see his thinking on the horizon, as it were, but to be A STEP AHEAD. And this we cannot do.
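To make the asymmetry concrete, here’s a toy sketch of the switched-up game (the payoff numbers and the specific policies are my own illustrative assumptions, nothing Omega is bound to). The point is that once the boxes are filled, two-boxing strictly dominates, so the human’s prediction, however cleverly it was reached, has no leverage on Omega’s actual move:

```python
# Reversed Newcomb's: the human fills the boxes based on a prediction,
# then Omega chooses. Payoffs are made up for illustration.

def omega_policy(filled_both):
    # Once the boxes are filled, taking both strictly dominates:
    # Omega gets whatever is in the opaque box PLUS the transparent $1000.
    return "two-box"

def human_fills_boxes(predicted_choice):
    # The human puts $1,000,000 in the opaque box only if they
    # predict Omega will one-box.
    return predicted_choice == "one-box"

def play(human_prediction):
    filled = human_fills_boxes(human_prediction)
    choice = omega_policy(filled)
    opaque = 1_000_000 if filled else 0
    payoff = opaque + 1_000 if choice == "two-box" else opaque
    return choice, payoff

# Whatever the human predicts, Omega's actual move is the same,
# because the prediction never causally constrains it:
for prediction in ("one-box", "two-box"):
    print(prediction, "->", play(prediction))
```

Predicting one-boxing just hands Omega $1,001,000; the only prediction that doesn’t lose money is the one that expects the defection.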
Actually, come to think of it, an even better analogy than a switched-up Newcomb’s problem is a switched-up Parfit’s hitchhiker. The human vs. human version works, not perfectly by any means, but at least to some extent, because humans are imperfect liars. You can’t simulate another human’s brain in perfect detail, but sometimes you can be a step ahead of them.
If the hitchhiker is Omega, you can’t. This is a bad thing for both you and Omega, but it’s not something either of you can change. Omega could self-modify to become Omega+, who is just like Omega except that he never lies, but he would have no way of proving to you that he had done so. Maybe Omega will get lucky, and you’ll convince yourself through some flawed and convoluted reasoning that he has an incentive to do this, but he actually doesn’t, because there’s no possible way it will impact your decision.
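The same dominance logic can be sketched for this hitchhiker variant (payoffs and function names are hypothetical, just to make the structure explicit): a driver who can’t simulate the hitchhiker can only predict the dominant in-town move, and that prediction says defect, so the rescue never happens:

```python
# Reversed Parfit's hitchhiker: Omega is the hitchhiker, the human is
# the driver. Illustrative payoffs only.

def hitchhiker_best_move_in_town():
    # Once rescued, the promised payment is a pure cost, so
    # defecting dominates paying.
    pay_payoff = -500     # rescued, minus the $500 handed over
    defect_payoff = 0     # rescued, keeps the money
    return "pay" if pay_payoff > defect_payoff else "defect"

def driver_decision(can_simulate_hitchhiker):
    if can_simulate_hitchhiker:
        # Only a real simulation lets the driver condition the rescue
        # on the hitchhiker's actual (possibly self-modified) disposition.
        return "rescue if simulation shows payment"
    # Without one, the driver can only predict the dominant in-town move.
    predicted = hitchhiker_best_move_in_town()
    return "rescue" if predicted == "pay" else "refuse"

print(driver_decision(can_simulate_hitchhiker=False))  # prints "refuse"
```

Unverifiable self-modification never enters the driver’s calculation, which is exactly why Omega+ gains nothing by existing.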
Consider this: Omega promises to give you $500 if you take him into town; you agree; when you get to town, he calls you a chump and runs away. What is your reaction? Do you think to yourself “DOES NOT COMPUTE”?
Omega got everything he wanted, so presumably his actions were rational. Why did your model not predict this?