YOU have to be simulating THE AI at time 0, in your human imagination. This is not possible.
Entirely possible, since I only need to prove a fact about their decision theory, not simulate it in real time, though that may mean the argument only covers a smaller subset of possible AIs. But if we allow unbounded utility, any finite probability is enough to blackmail with.
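To put toy numbers on that last point (purely illustrative; the figures are made up, not anyone's actual utilities):

```python
# Toy illustration: if the threatened disutility is allowed to grow without
# bound, then for any fixed probability p > 0 that the threat is real,
# p * disutility eventually dominates any finite cost of giving in.

def expected_loss_if_ignored(p_threat_real, disutility):
    return p_threat_real * disutility

cost_of_giving_in = 1_000   # arbitrary finite cost of complying
p = 1e-9                    # arbitrarily small but nonzero probability

for disutility in (1e6, 1e12, 1e18):
    dominated = expected_loss_if_ignored(p, disutility) > cost_of_giving_in
    print(f"disutility={disutility:.0e}: ignoring the threat is worse? {dominated}")
```

However small p is, some finite disutility makes the last comparison come out True, which is all the blackmail argument needs.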
As for this being like Newcomb's problem: no, it's not; the payoff matrix is different.
EDIT: Well, I guess it is sort of similar. But "sort of similar" isn't enough; it really is a different game.
You're right about the payoff matrix; I guess Newcomb's problem doesn't have a payoff matrix at all, since there's no payoff defined for the person filling the boxes.
What do you mean by “prove a fact about their decision theory”? Do you mean that you’re proving “a rational AI would use decision theory X and therefore use strategy Y”, or do you only mean “GIVEN that an AI uses decision theory X, they would use strategy Y”?
There seems to be a belief floating around this site that an AI could end up using any old kind of decision theory, depending on how it was programmed. Do you subscribe to this?
The "horrible strategy", Newcomb's problem, and TDT games in general only make sense if player 2 (the player who acts second) can be simulated by player 1 in enough detail that player 1 can make a choice which is CONDITIONAL on player 2's choice: a choice which, to reiterate, player 2 has not yet actually made, and which they have an incentive to make differently than they are expected to.
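For concreteness, here is a rough sketch of that structure, with the roles switched so the human (player 1) fills the boxes and Omega (player 2) chooses. The payoffs are the standard Newcomb numbers, assumed purely for illustration:

```python
# Player 1 moves first, conditional on a *prediction* of player 2's choice.
def player1_fills_opaque_box(predicted_choice):
    return predicted_choice == "one-box"

# Player 2's payoff depends on what player 1 actually did.
def player2_payoff(actual_choice, opaque_filled):
    opaque = 1_000_000 if opaque_filled else 0
    transparent = 1_000 if actual_choice == "two-box" else 0
    return opaque + transparent

for predicted in ("one-box", "two-box"):
    for actual in ("one-box", "two-box"):
        filled = player1_fills_opaque_box(predicted)
        print(predicted, actual, player2_payoff(actual, filled))
```

Player 2's best cell is "predicted one-box, actually two-box", which is exactly the gap between prediction and choice that the following example is about.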
The difficulty of achieving this may best be illustrated with an example. Here player 1, a human, and player 2, Omega, are playing a switched-up Newcomb's problem: the human fills the boxes, and Omega chooses.
Player 1: “Omega is super-intelligent and super-rational. He uses updateless decision theory. Therefore, he will pick the winning strategy rather than the winning local move. Ergo, he will only take the opaque box. So, I should put money in both boxes. A = A.”
Player 2: “Chump.” [takes both boxes]
But of course, this wouldn't REALLY happen, because if Omega reasoned locally like this, we'd somehow be able to predict that, right? And so we wouldn't put money in the opaque box, right? And so, being rational, he wouldn't want to act like that, because then he'd get less money. So, he'd definitely one-box. Whew, glad we reasoned through that. Let's put the money in both boxes now.
Player 2: “Chump.” [takes both boxes]
The problem is, he can keep doing this no matter how fancy our reasoning gets, because at the end of the day, WE CAN'T SIMULATE HIS THINKING. It's not enough to do some handwavy reasoning about decision theories and payoff matrices and stuff; in order to do a UDT bargain, we have to actually be able to simulate his brain. To not just see his thinking on the horizon, as it were, but to be A STEP AHEAD. And this we cannot do.
in order to do a UDT bargain, we have to actually be able to simulate his brain
Nope. For example, humans do this sort of reasoning in games like the ultimatum game, and I can’t simulate a human completely any more than I can simulate an AI completely. All you really need to know is what their options are and how they choose between options.
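A minimal sketch of what that could look like: two stand-in decision rules choosing over the standard Newcomb payoffs (both the rules and the numbers are assumptions for illustration, not anyone's actual agent):

```python
PAYOFFS = {  # (opaque box filled?, choice) -> payoff for the chooser
    (True, "one-box"): 1_000_000,
    (True, "two-box"): 1_001_000,
    (False, "one-box"): 0,
    (False, "two-box"): 1_000,
}

def causal_reasoner():
    # Treats the box contents as already fixed; two-boxing dominates.
    return "two-box"

def policy_level_reasoner(prediction_accuracy=0.99):
    # Picks the policy with the higher expected payoff, given that the
    # prediction tracks the chosen policy with the stated accuracy.
    def expected_value(choice):
        filled = (choice == "one-box")
        return (prediction_accuracy * PAYOFFS[(filled, choice)]
                + (1 - prediction_accuracy) * PAYOFFS[(not filled, choice)])
    return max(("one-box", "two-box"), key=expected_value)

# Knowing which rule the agent runs is enough to derive its choice;
# nothing here simulates a brain.
print(causal_reasoner(), policy_level_reasoner())  # two-box one-box
```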
Actually, come to think of it, an even better analogy than a switched-up Newcomb's problem is a switched-up Parfit's hitchhiker. The human vs. human version works, not perfectly by any means, but at least to some extent, because humans are imperfect liars. You can't simulate another human's brain in perfect detail, but sometimes you can be a step ahead of them.
If the hitchhiker is Omega, you can't. This is a bad thing for both you and Omega, but it's not something either of you can change. Omega could self-modify to become Omega+, who's just like Omega except that he never lies, but he would have no way of proving to you that he had done so. Maybe Omega will get lucky, and you'll convince yourself through some flawed and convoluted reasoning that he has an incentive to do this, but he actually doesn't, because there's no possible way it will impact your decision.
Consider this. Omega promises to give you $500 if you take him into town; you agree; when you get to town, he calls you a chump and runs away. What is your reaction? Do you think to yourself "DOES NOT COMPUTE"?
Omega got everything he wanted, so presumably his actions were rational. Why did your model not predict this?
Well, if I'm playing the part of the driver right, in order for me to do it in the first place I'd have to have some evidence that Omega was honest. Really, I only need a 10% or so chance of him being honest to pick him up. So I'd probably go "my evidence was wrong, dang, now I'm out the $5 for gas and the 3 utilons of having to ride with that jerk Omega." This would also be new evidence that would shift my probabilities by varying amounts.
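As a back-of-the-envelope check on that 10% figure (this assumes the 3 utilons of annoyance convert to some dollar amount; the conversion rate is a free parameter here):

```python
payment = 500   # what Omega promises
gas = 5         # out-of-pocket cost of the ride

def breakeven_probability(annoyance_in_dollars):
    # Worth picking him up iff p * payment > gas + annoyance.
    return (gas + annoyance_in_dollars) / payment

print(breakeven_probability(0))    # 0.01 -- if the annoyance costs nothing
print(breakeven_probability(45))   # 0.1  -- roughly the 10% figure above
```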
So the analogy is that giving the ride to the bad AI is like helping it come into existence, and it not paying is like it doing horrible things to you anyway? If that’s the case, I might well think to myself “DOES NOT COMPUTE.”