COEDT can be thought of as “learning” from an infinite sequence of agents who explore less and less.
Interestingly, the issue COEDT has with sequential decision problems looks suspiciously similar to folk theorems in iterated game theory (which also imply that even completely-aligned agents can get a very bad outcome, because each will maximally punish anyone who doesn't play the grim trigger strategy). There might be some kind of folk theorem for COEDT, though there's a complication: conditioning on yourself taking a probability-0 action, you get both worlds where you are being punished and worlds where you are punishing someone else, which might mean counterfactual punishments can't be maximal for everyone at once (yay?).
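To make the analogy concrete, here's a minimal sketch of the kind of bad equilibrium folk theorems allow, in a common-payoff repeated game with grim-trigger punishment (the payoff numbers and discount factor are made up for illustration):

```python
# Illustrative sketch: a folk-theorem-style "bad equilibrium" in a
# common-payoff repeated game. Grim-trigger punishment makes the bad
# action self-enforcing even for aligned agents. All numbers are made up.

DISCOUNT = 0.9
HORIZON = 200  # long finite horizon as a stand-in for the infinite game

def round_payoff(a1, a2):
    """Common payoff for a single round, shared by both agents."""
    if a1 == "good" and a2 == "good":
        return 3.0
    if a1 == "bad" and a2 == "bad":
        return 1.0
    return 0.0  # any profile involving punishment (or a mismatch) is worst

def discounted_payoff(agent1_actions):
    """Agent 2 plays grim trigger around the bad equilibrium:
    play 'bad' until agent 1 ever deviates, then punish forever."""
    total, triggered = 0.0, False
    for t, a1 in enumerate(agent1_actions):
        a2 = "punish" if triggered else "bad"
        total += (DISCOUNT ** t) * round_payoff(a1, a2)
        if a1 != "bad":
            triggered = True
    return total

conform = discounted_payoff(["bad"] * HORIZON)
deviate = discounted_payoff(["good"] * HORIZON)  # try for the good outcome

print(f"conform to bad equilibrium: {conform:.2f}")  # ~ 1 / (1 - 0.9) = 10
print(f"attempt the good outcome:   {deviate:.2f}")  # 0: punished forever
```

Conforming earns roughly 1/(1-0.9) = 10, while attempting the good outcome earns 0 under punishment, so the bad profile is self-enforcing even though both agents share the same payoff and (good, good) forever would be far better for both.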
COEDT ensures that conditionals on each action exist at all, but it doesn’t ensure that agents behave even remotely sanely in these conditionals, as it’s still conditioning on a very rare event, and the relevant rationality conditions permit agents to behave insanely with very small probability. What would be really nice is to get some set of conditional beliefs under which:
- no one takes any strictly dominated action with nonzero probability (i.e. an action such that every possible world where the agent takes it is worse than every possible world where they don't; see the sketch below)
- conditional on any subset of the agents taking non-strictly-dominated actions, no agent takes any strictly dominated action with nonzero probability

(I suspect this is easier for common-payoff problems; in non-common-payoff problems, agents might take strictly dominated actions as a form of extortion.)
COEDT doesn't get this, but perhaps a similar construction (maybe using the hyperreals?) does.
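To pin down what these desiderata ask for, here's a minimal sketch of the strict-dominance check, using a made-up representation of possible worlds (the `World` class, agent names, and utilities below are illustrative assumptions, not anything from COEDT itself):

```python
# Hypothetical representation: each possible world records every agent's
# action and every agent's utility in that world.

from dataclasses import dataclass

@dataclass
class World:
    actions: dict   # agent name -> action taken in this world
    utility: dict   # agent name -> that agent's utility in this world

def strictly_dominated(worlds, agent, action):
    """True iff the max utility over worlds where `agent` takes `action`
    is below the min utility over worlds where they don't (per the first
    bullet's definition); both sets of worlds must be nonempty."""
    with_a = [w.utility[agent] for w in worlds if w.actions[agent] == action]
    without = [w.utility[agent] for w in worlds if w.actions[agent] != action]
    return bool(with_a) and bool(without) and max(with_a) < min(without)

def undominated_actions(worlds, agent):
    acts = {w.actions[agent] for w in worlds}
    return {a for a in acts if not strictly_dominated(worlds, agent, a)}

def conditionally_dominated(worlds, subset, agent, action):
    """The second bullet's conditional version: restrict attention to worlds
    where every agent in `subset` takes an undominated action, then re-check."""
    ok = {b: undominated_actions(worlds, b) for b in subset}
    restricted = [w for w in worlds
                  if all(w.actions[b] in ok[b] for b in subset)]
    return strictly_dominated(restricted, agent, action)
```

The second desideratum is then the requirement that `conditionally_dominated` actions also get probability zero, for every choice of `subset`.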