At this point I am not convinced that any problem in which agents in the environment have access to our decision theory’s source code, or to copies of our agent, is a fair problem. But my impression from what I hear and read is that this is a heretical position.
It seems somewhat likely to me that agents will be reasoning about each other using access to source code fairly soon (if only human operators evaluating whether to run intelligent programs, or what inputs to give those programs). So the question becomes something like: “what’s the point of declaring a problem unfair?”, to which the main answer seems to be “to spend limited no-free-lunch points.” If I perform poorly on worlds that don’t exist in order to perform better on worlds that do exist, that’s a profitable trade.
Which leads to this:
I disagree with this view and see Newcomb’s problem as punishing rational agents.
...
My big complaint with mind reading is that there just isn’t any mind reading.
One thing that seems important (for decision theories implemented by humans or embedded agents, as distinct from decision theories implemented by Cartesian agents) is whether or not the decision theory is robust to ignorance / black swans. That is, if you bake into your view of the world that mind reading is impossible, then you can be durably exploited by any actual mind reading (whereas having some sort of ontological update process or low probability on bizarre occurrences allows you to only be exploited a finite number of times).
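To make the "durably exploited" versus "exploited a finite number of times" distinction concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the function names, the 0.99 and 0.5 likelihoods, and the idea that an agent counts as "exploited" in any round where it still believes mind reading is less likely than not. It is not meant as a model of any particular decision theory.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """One step of Bayes' rule for a binary hypothesis."""
    numerator = prior * likelihood_if_true
    denominator = numerator + (1 - prior) * likelihood_if_false
    return numerator / denominator if denominator > 0 else prior


def count_exploits(prior, rounds=1000, threshold=0.5):
    """Count rounds in which the agent still acts as if mind reading is impossible.

    Hypothetical setup: every round, the predictor guesses the agent's choice
    correctly. If mind reading is real, that happens with probability 0.99;
    if it is not, with probability 0.5 (a lucky coin flip).
    """
    belief = prior
    exploited = 0
    for _ in range(rounds):
        if belief < threshold:
            # The agent still dismisses mind reading, so it gets exploited.
            exploited += 1
        # Update on having observed one more correct prediction.
        belief = posterior(belief, 0.99, 0.5)
    return exploited


if __name__ == "__main__":
    print("prior 0:    ", count_exploits(0.0))   # never updates
    print("prior 1e-6: ", count_exploits(1e-6))  # updates after finitely many rounds
```

With a prior of exactly zero the belief never moves, so the agent is exploited in every round no matter how many rounds there are. With a tiny but nonzero prior, each correct prediction multiplies the odds by a constant factor, so the number of exploited rounds stays small and does not grow with the number of rounds.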
But note the connection to the earlier bit—if something is actually impossible, then it feels costless to give up on it in order to perform better in the other worlds. (My personal resolution to counterfactual mugging, for example, seems to rest on an underlying belief that it’s free to write off logically inconsistent worlds, in a way that it’s not free to write off factually inconsistent worlds that could have been factually consistent / are factually consistent in a different part of the multiverse.)
I think the current piece that points at this question most directly is Success-First Decision Theories by Preston Greene.
Thanks for your detailed reply! I’ll look into that reference.