The most similar approach that is actually practical is MCMC optimization, which is useful for, among other things, program synthesis. I didn't use it for this post since it would require agents to compute counterfactuals, but it is significantly more efficient in practice.
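To make the comparison concrete, here is a minimal Metropolis-style MCMC optimizer on a toy search space. This is a generic sketch of the technique, not the algorithm from the post: the scoring function, proposal, and temperature are all illustrative choices.

```python
import math
import random

def mcmc_optimize(score, init, propose, steps=20_000, temp=0.5, seed=0):
    """Metropolis-style stochastic search: propose a local change and
    accept it with probability min(1, exp((new_score - old_score) / temp)).
    Uphill moves are always accepted; downhill moves occasionally, which
    lets the chain escape local optima."""
    rng = random.Random(seed)
    x, sx = init, score(init)
    best, sbest = x, sx
    for _ in range(steps):
        y = propose(x, rng)
        sy = score(y)
        if sy >= sx or rng.random() < math.exp((sy - sx) / temp):
            x, sx = y, sy
            if sx > sbest:
                best, sbest = x, sx
    return best, sbest

# Toy stand-in for a synthesis task: find a bitstring matching a hidden target.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
score = lambda bits: sum(b == t for b, t in zip(bits, target))

def flip_one(bits, rng):
    """Proposal: flip a single randomly chosen bit."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

best, sbest = mcmc_optimize(score, [0] * len(target), flip_one)
```

The point is that the chain only ever evaluates the score of concrete candidates; it never needs to reason about what would have happened under a different policy, which is exactly the counterfactual computation the agents in the post would additionally require.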
I’m currently trying to solve the theoretical decision theory problem rather than create usable algorithms. It would be pretty great to have a mathematical formalization of the correct decision theory even if it takes exponential time to run.
The problem with using reflective oracles is that they make the agents' actions independent, but, based on this post, the agents' actions need to be correlated with each other to get a high score when each policy is a continuous function of the utility function. Correlated reflective oracles seem like they might help, but they are essentially a superset of ordinary reflective oracles, so using them can't by itself ensure that actions are correlated.
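The gap between independent and correlated randomization is easy to see in a toy matching game (this is an illustration of the general point, not the construction from the post): two agents each pick a bit and score 1 when they agree.

```python
import random

def matching_score(sample_actions, trials=100_000, seed=0):
    """Fraction of rounds in which the two agents choose the same action."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = sample_actions(rng)
        hits += (a == b)
    return hits / trials

# Independent mixing: each agent randomizes on its own, so even with
# mutually consistent beliefs they match only half the time.
def independent(rng):
    return (rng.randint(0, 1), rng.randint(0, 1))

# Correlated mixing: both agents condition on a shared random signal,
# so they match every round.
def correlated(rng):
    signal = rng.randint(0, 1)
    return (signal, signal)
```

An oracle that only fixes each agent's marginal action distribution pins down `independent` but not `correlated`; nothing in it forces the shared-signal structure that the high-scoring joint policy needs.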
There might be some variant of reflective oracles that does work; one obstacle is that the algorithm in this post is “sticky”, which makes it hard to derive an agent's policy purely from its beliefs about the other agents.