Optimal by the fairly obvious criterion of “gets agents who use it maximal rewards.” If you cared about which decision theory you used because of some extra factor, the problem would become one where the rewards were not solely action-determined or decision-determined, when that extra factor is cast in terms of reward.
If you prefer, I’m sure you could recast it using the word “wins.”
A CDTist, when presented with Newcomb’s paradox, would say that, given his situation, $1,000 is the best he could hope for. Sure, he could do better if box A had a million dollars in it, but he could also do better if box B had a million dollars. It doesn’t, so he can’t. He can’t make box A have a million dollars any more than he can make box B have a million dollars. He’s not a time-traveler. If you put a TDTist in this scenario, he’d get nothing. If you put anyone in a different scenario
An EDTist in a non-ideal Parfit’s Hitchhiker, when asked for the money, would say that he knows the guy picked him up. It’s one thing to change the past when it’s unknown. That’s really the only way it’s different from the future. But there is no way refusing at this point could possibly leave him stranded in the desert.
“Gets agents who use it maximal rewards” is a fairly obvious criterion to state, but it carries some unspoken assumptions. The difference between CDT, EDT, and TDT is which assumptions each one makes.
Well, obviously. But agents using different decision theories do not disagree about rewards within the class of problems we’re talking about. So you can compare different decision theories using the idea of “reflective consistency,” where an agent is reflectively consistent only if it doesn’t think it could gain by changing itself. A CDT agent watching a TDT agent win a million dollars from Omega believes that if only it could change, it could win a million dollars too, so it is not reflectively consistent within the class of decision-determined problems.
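To make the reflective-consistency test concrete, here is a rough sketch in Python. It assumes a perfect predictor (so the opaque box is full exactly when the agent one-boxes) and it identifies an agent with the decision it ends up making; both are simplifying assumptions of mine, and the function names are made up for illustration.

```python
# Illustrative sketch only. Assumes Omega predicts perfectly, so the opaque
# box holds the million exactly when the agent one-boxes.
MILLION = 1_000_000
THOUSAND = 1_000


def newcomb_payoff(decision: str) -> int:
    """Dollars received in Newcomb's problem under a perfect predictor."""
    if decision == "one-box":
        return MILLION   # opaque box was filled
    if decision == "two-box":
        return THOUSAND  # opaque box is empty; only the $1,000 remains
    raise ValueError(f"unknown decision: {decision!r}")


def thinks_it_could_gain(current: str, alternative: str) -> bool:
    """Would an agent making `current` do better by making `alternative`?"""
    return newcomb_payoff(alternative) > newcomb_payoff(current)


def reflectively_consistent(current: str, options: list[str]) -> bool:
    """Consistent only if no available change looks like an improvement."""
    return not any(thinks_it_could_gain(current, alt) for alt in options)


options = ["one-box", "two-box"]
print(reflectively_consistent("two-box", options))  # the two-boxer wants to change
print(reflectively_consistent("one-box", options))  # the one-boxer does not
```

On these assumptions the two-boxer fails the test and the one-boxer passes it, which is all the comparison above needs.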
Suppose Omega gave money exclusively to TDT agents. If this were the case, it would be good to precommit to being a TDT agent. This isn’t because TDT is better; it’s because the contest was unfair.
Maybe this is analogous to Newcomb’s problem. Maybe not. The point is, it’s not obvious whether or not it is.
I suppose what I’m trying to say is that it’s not that CDT usually gives the optimal solution but has a few flaws. It’s that CDT, EDT, and TDT agents have a different idea of what the “optimal solution” refers to. It’s not that optimal is the one the *DT strategy would pick. It’s that the strategy itself is to find the optimal solution, for some value of optimal.
Maybe this is analogous to Newcomb’s problem. Maybe not.
It’s different. If the reward isn’t determined by the decision the agent makes, but instead by how the agent made that decision, it isn’t a “decision-determined problem” anymore. That’s why I’ve been using that phrase. TDT is only generally good for decision-determined problems. Newcomb’s problem is a decision-determined problem, which is important because it doesn’t set out to expressly reward some type of agent; it’s fair.
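One way to pin down “decision-determined” is as a property of the payoff function: two agents who make the same decision get the same reward, whatever theory produced that decision. The sketch below checks this for Newcomb’s problem (again assuming a perfect predictor) and for the “Omega pays only TDT agents” contest. The agent representation, the probe list, and all function names are hypothetical, chosen just for this illustration.

```python
from typing import Callable, List, Tuple

# An agent is modelled as (theory label, decision). This is a hypothetical
# representation used only to probe the payoff functions below.
Agent = Tuple[str, str]


def newcomb(agent: Agent) -> int:
    """Newcomb's problem with an assumed perfect predictor."""
    _theory, decision = agent
    return 1_000_000 if decision == "one-box" else 1_000


def tdt_only_contest(agent: Agent) -> int:
    """The 'unfair' contest: Omega pays only agents labelled TDT."""
    theory, _decision = agent
    return 1_000_000 if theory == "TDT" else 0


def is_decision_determined(payoff: Callable[[Agent], int],
                           agents: List[Agent]) -> bool:
    """True if agents making the same decision always get the same payoff."""
    seen = {}
    for agent in agents:
        _theory, decision = agent
        reward = payoff(agent)
        # Record the first reward seen for this decision; any later
        # mismatch means the payoff depends on more than the decision.
        if seen.setdefault(decision, reward) != reward:
            return False
    return True


# Every theory label paired with every decision, purely as test probes.
probes = [(t, d) for t in ("CDT", "TDT") for d in ("one-box", "two-box")]
print(is_decision_determined(newcomb, probes))           # True
print(is_decision_determined(tdt_only_contest, probes))  # False
```

The check only looks at the probes it is given, so it can show that a problem is not decision-determined but cannot prove that one is in general.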
It’s that the strategy itself is to find the optimal solution, for some value of optimal.
But all these “local optimal solutions” can be measured on the same scale, e.g. dollars. And so if the decision theory is just an intermediary, if what we really want is a dollar-maximizing agent or a game-winning agent, we can compare different decision theories along a common yardstick. The best decision theories will be the ones that dominate all the others within a certain class of problems: they do as well as or better than all other decision theories on every single problem of that class. This quickly becomes impossible for larger classes of problems, but can be made possible again by Occamian constraints like symmetry.
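A small sketch of the dominance comparison, with the caveat that the payoff table is made up for illustration. The Newcomb row assumes a perfect predictor, “coin_guess” is a hypothetical problem on which both theories do equally well, and “cdt_only_contest” is a hypothetical mirror image of the contest that pays only TDT agents.

```python
from typing import Dict

# Hypothetical payoff table in dollars: problem -> theory -> reward.
Payoffs = Dict[str, Dict[str, int]]

fair_class: Payoffs = {
    "newcomb":    {"CDT": 1_000, "TDT": 1_000_000},  # perfect predictor assumed
    "coin_guess": {"CDT": 50,    "TDT": 50},         # made-up tie
}


def dominates(a: str, b: str, payoffs: Payoffs) -> bool:
    """True if theory `a` does as well as or better than `b` on every problem."""
    return all(row[a] >= row[b] for row in payoffs.values())


print(dominates("TDT", "CDT", fair_class))  # True
print(dominates("CDT", "TDT", fair_class))  # False

# Widen the class with contests that reward a theory's label directly.
wider_class: Payoffs = {
    **fair_class,
    "tdt_only_contest": {"CDT": 0,         "TDT": 1_000_000},
    "cdt_only_contest": {"CDT": 1_000_000, "TDT": 0},
}

print(dominates("TDT", "CDT", wider_class))  # False
print(dominates("CDT", "TDT", wider_class))  # False
```

With the label-rewarding contests included, neither theory dominates the other, which is the sense in which dominance stops being achievable once the class of problems gets too large.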