This is the most intuitive answer to me, as well. It's also extremely difficult, and it's unclear how it would be useful for alignment more generally.
One idea might be to train AI to write legible code and then rely on human code review. This seems roughly as safe as our current mode of software development, provided the AI is not actively obfuscating (a big assumption).