Actually, come to think of it, an even better analogy than a switched-up Newcomb's problem is a switched-up Parfit's hitchhiker. The human-vs.-human version works, not perfectly by any means, but at least to some extent, because humans are imperfect liars. You can't simulate another human's brain in perfect detail, but sometimes you can be a step ahead of them.
If the hitchhiker is Omega, you can't. This is a bad thing for both you and Omega, but it's not something either of you can change. Omega could self-modify to become Omega+, who's just like Omega except that he never lies, but he would have no way of proving to you that he had done so. Maybe Omega will get lucky and you'll convince yourself, through some flawed and convoluted reasoning, that he has an incentive to do this, but he actually doesn't, because there's no possible way it will impact your decision.
Consider this: Omega promises to give you $500 if you take him into town; you agree; when you get to town, he calls you a chump and runs away. What is your reaction? Do you think to yourself “DOES NOT COMPUTE”?
Omega got everything he wanted, so presumably his actions were rational. Why did your model not predict this?
Well, if I’m playing the part of the driver right, then in order to pick him up in the first place I’d have to have some evidence that Omega was honest. Really, I only need something like a 10% chance of him being honest for picking him up to be worth it. So I’d probably go “my evidence was wrong, dang, now I’m out the $5 for gas and the 3 utilons of having to ride with that jerk Omega.” This would also be new evidence that shifted my probabilities by varying amounts.
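(Spelling out the arithmetic behind that 10% figure, as a rough sketch: treat picking him up as a simple bet against the $500 payoff. How I price the 3 utilons in dollars is my own assumption, nothing stated above.)

```latex
% Break-even honesty probability for picking Omega up, as an expected-value bet.
% C = total cost of the ride to me: $5 of gas plus however I price the
%     3 utilons of putting up with Omega (an assumed conversion).
\[
  p \cdot \$500 \;\ge\; C
  \quad\Longrightarrow\quad
  p \;\ge\; \frac{C}{\$500},
\]
% so a 10% threshold corresponds to pricing the whole ride at roughly $50.
```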
So the analogy is that giving the ride to the bad AI is like helping it come into existence, and it not paying is like it doing horrible things to you anyway? If that’s the case, I might well think to myself “DOES NOT COMPUTE.”