You mention some general ways to get non-myopic behavior, but when it comes to myopic behavior you default to a clean, human-comprehensible agent model. I’m curious if you have any thoughts on open avenues related to training procedures that encourage myopia in inner optimizers, even if those inner optimizers are black boxes? I do seem to vaguely recall a post from one of you about this, or maybe it was Richard Ngo.
I think that trying to encourage myopia via behavioral incentives is likely to be extremely difficult, if not impossible (at least without a better understanding of our training processes’ inductive biases). Krueger et al.’s “Hidden Incentives for Auto-Induced Distributional Shift” is a good resource for some of the problems that you run into when you try to do that. As a result, I think that mechanistic incentives are likely to be necessary—and I personally favor some form of relaxed adversarial training—but that’s going to require us to get a better understanding of what exactly it looks like for an agent to be myopic or not, so we know what the overseer in a setup like that should be looking for.
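To gesture at what I mean by a mechanistic incentive, here's a very rough sketch of a relaxed-adversarial-training-style loop. This is purely illustrative: the `overseer_nonmyopia_score` function, the `task_loss` method, and the penalty weight `lam` are all hypothetical names standing in for things we don't actually know how to build yet.

```python
import torch

def training_step(
    model: torch.nn.Module,
    overseer_nonmyopia_score,   # assumed: (model, batch) -> differentiable scalar in [0, 1]
    batch,
    optimizer: torch.optim.Optimizer,
    lam: float = 1.0,
):
    """One gradient step on the task loss plus a mechanistic overseer penalty."""
    optimizer.zero_grad()
    task_loss = model.task_loss(batch)                # ordinary single-episode objective (assumed method)
    penalty = overseer_nonmyopia_score(model, batch)  # overseer inspects the model itself, not just its behavior
    loss = task_loss + lam * penalty                  # mechanistic incentive folded into training
    loss.backward()
    optimizer.step()
    return task_loss.item(), penalty.item()
```

The hard part, of course, is everything hidden inside `overseer_nonmyopia_score`, which is exactly why I think we first need a better mechanistic understanding of what it looks like for an agent to be myopic.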