In the original ADT paper, the agents are allowed to output distributions over moves.
The fact that we take the limit as ϵ goes to 0 means the evil problem can’t be constructed, even if randomization is not allowed. (The proof in the ADT paper doesn’t work, but that doesn’t mean something like it couldn’t possibly work.)
It’s basically saying “since the two actions A and A′ get equal expected utility in the limit, the total variation distance between a distribution over the two actions and one of the actions limits to zero”, which is false.
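A quick numerical check of why that inference fails, as a sketch with made-up utilities (the action names and the 50/50 mixture are assumptions for illustration, not taken from the paper): when the two actions are tied, a mixture has the same expected utility as a pure action, yet its total variation distance from that pure action is the weight placed on the other action.

```python
def total_variation(p, q):
    """Total variation distance between two distributions over a finite set."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def expected_utility(dist, utility):
    """Expected utility of a distribution over actions."""
    return sum(prob * utility[a] for a, prob in dist.items())


# Hypothetical utilities: the two actions are exactly tied.
utility = {"A": 1.0, "A_prime": 1.0}

mixed = {"A": 0.5, "A_prime": 0.5}  # distribution over both actions
pure = {"A": 1.0}                   # always take action A

# The utilities agree, but the distributions do not get close.
utility_gap = abs(expected_utility(mixed, utility) - expected_utility(pure, utility))
tv_gap = total_variation(mixed, pure)
```

Here `utility_gap` is zero while `tv_gap` stays at 0.5, so equal expected utility by itself puts no bound on the total variation distance.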
You’re right, this is an error in the proof, good catch.
Re chicken: The interpretation of the embedder that I meant is “opponent only uses the embedder where it is up against [whatever policy you plugged in]”. This embedder does not get knocked down by the reality filter. Let $E_t$ be the embedder. The logical inductor expects $U_t$ to equal the crash/crash utility, and it also expects $E_t(\ulcorner \mathrm{ADT}_\epsilon \urcorner)$ to equal the crash/crash utility. The expressions $U_t$ and $E_t(\ulcorner \mathrm{ADT}_\epsilon \urcorner)$ are provably equal, so of course the logical inductor expects them to be the same, and the reality check passes.
The error in your argument is that you are embedding actions rather than agents. The fact that NeverSwerveBot and ADT both provably always take the straight action does not mean the embedder assigns them equal utilities.
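A toy sketch of the distinction between embedding agents and embedding actions. Everything here is hypothetical: the payoff numbers, the function names, and the rule that the opponent swerves only against NeverSwerveBot are assumptions chosen for illustration, not the construction from the ADT paper.

```python
def never_swerve_bot():
    """Agent that always goes straight."""
    return "straight"


def adt_like_agent():
    """Hypothetical stand-in for an agent that also ends up going straight."""
    return "straight"


# Hypothetical payoffs to our agent, keyed by (our action, opponent action).
PAYOFF = {
    ("straight", "straight"): -10.0,  # crash/crash
    ("straight", "swerve"): 1.0,
    ("swerve", "straight"): -1.0,
    ("swerve", "swerve"): 0.0,
}


def opponent_action(agent):
    """Toy assumption: the opponent responds to which agent it faces,
    not only to the action that agent takes."""
    return "swerve" if agent is never_swerve_bot else "straight"


def embedder(agent):
    """Utility of plugging `agent` (a policy, not an action) into the game."""
    return PAYOFF[(agent(), opponent_action(agent))]
```

In this toy, both agents return "straight", yet `embedder(never_swerve_bot)` is 1.0 and `embedder(adt_like_agent)` is -10.0, because the embedder takes the agent itself as its argument.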
Wasn’t there a fairness/continuity condition in the original ADT paper that if there were two “agents” that converged to always taking the same action, then the embedder would assign them the same value? (More specifically, if $\mathbb{E}_t(|A_t - B_t|) < \delta$, then $\mathbb{E}_t(|E_t(A_t) - E_t(B_t)|) < \epsilon$.) This would mean that it’d be impossible to have $\mathbb{E}_t(E_t(\mathrm{ADT}_{t,\epsilon}))$ be low while $\mathbb{E}_t(E_t(\mathrm{straight}_t))$ is high, so the argument still goes through.
Although, after this whole line of discussion, I’m realizing that there are enough substantial differences between the original formulation of ADT and the thing I wrote up that I should probably clean up this post a bit and clarify more about what’s different in the two formulations. Thanks for that.
Yes, the continuity condition on embedders in the ADT paper would eliminate the embedder I meant, which means the answer might depend on whether ADT considers discontinuous embedders. (The continuity condition matters because it is used in the optimality proof; that is why the optimality proof can’t apply to chicken.)