Interesting post. I agree that a myopic oracle is a promising idea, particularly if it were used to accelerate AGI alignment research. (It could be dangerous if used in many other ways.)
> The major shortcoming of a myopic Oracle is that its long-term plans cannot be expected to work reliably.
I don’t think this is necessarily true; it depends on the sort of myopia the oracle is imbued with. If its myopia makes it incapable of reasoning about the long-term future at all, then the claim holds.
But it could have a different kind of myopia, where it is fully capable of reasoning about the long-term future but its goal is still to maximize its short-term reward. In that case, it could be incentivized to go to the trouble of producing a good long-term plan whenever doing so makes it more likely to receive the short-term reward.
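To make the distinction concrete, here is a rough sketch. Every name in it (`overseer_reward`, `world_model.simulate`, etc.) is hypothetical and not from the post; the point is just that both functions are "myopic" in the sense that only this step's reward counts, and they differ only in whether long-horizon reasoning is available while optimizing it.

```python
# Hypothetical sketch only: overseer_reward and world_model are stand-ins,
# not anyone's actual proposal.

def capability_myopic_answer(candidate_answers, overseer_reward):
    # Myopia as a capability limit: the oracle cannot reason about the future
    # at all, so it emits whichever answer the overseer rewards on the spot.
    return max(candidate_answers, key=lambda a: overseer_reward(a, evidence=None))

def objective_myopic_answer(candidate_answers, overseer_reward, world_model):
    # Myopia as an objective: the oracle can roll a world model far forward,
    # but still only cares about this step's reward. If the overseer rewards
    # answers whose plans hold up under scrutiny, doing the long-horizon
    # reasoning is the best way to earn the short-term reward.
    def score(answer):
        projected_outcome = world_model.simulate(answer)  # long-horizon reasoning
        return overseer_reward(answer, evidence=projected_outcome)
    return max(candidate_answers, key=score)
```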
Given such a form of myopia, the best frameworks I’ve seen so far for building an oracle around it are debate (which you referenced) and market making. Market making is particularly interesting because it is compatible with per-step myopia, which seems easier to enforce than the per-episode myopia that debate appears to require.
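Here is a rough sketch of why the per-step version looks easier to check (again, the names are hypothetical stand-ins, not the actual debate or market-making training setups): the learning signal for each output can be computed from information available at that step, rather than from an end-of-episode verdict that ties every output to later ones.

```python
# Hypothetical sketch only; model.predict / model.update and judge are
# illustrative interfaces, not real library calls.

def per_step_update(model, steps, learning_rate=1e-3):
    # Per-step myopia (market-making-like): each prediction is scored
    # immediately against a target available at that step, so no credit
    # flows back from later steps.
    for state, target in steps:
        loss = (model.predict(state) - target) ** 2
        model.update(loss, learning_rate)

def per_episode_update(model, episode_states, judge, learning_rate=1e-3):
    # Per-episode myopia (debate-like): the judge's verdict arrives only after
    # the whole episode, so every statement's update depends on the episode's
    # final outcome, reintroducing cross-step incentives one would have to
    # rule out.
    transcript = [model.predict(state) for state in episode_states]
    episode_reward = judge(transcript)
    model.update(-episode_reward, learning_rate)
```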
I don’t think there is any known way to verify or enforce any of these kinds of myopia yet, though. I still need to read your post about reverse-intent alignment to understand that idea better.