I can see that working. But we still have the problem that if the number of possible answers is too large, there will be some answer X such that the most likely behaviour for a human who gives answer X is to write something dangerous. That’s fine if the AI has two clearly separated processes: first find the top answer, independently of how it will be written up, then write it up as a human would. If those two goals are mixed, it will go awry.
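To make the distinction concrete, here is a toy sketch of the two designs. Everything in it is an assumption made up for illustration: the function names, the `quality` and `effect` scores, and the tiny `human_model` table of write-up probabilities. It is not a proposal for how a real system would be built, only a way of showing where the write-up is allowed to influence the choice of answer.

```python
from typing import Callable, Dict, List, Tuple

Answer = str
WriteUp = str
# Hypothetical model: for each answer, the probability of each human write-up.
HumanModel = Dict[Answer, Dict[WriteUp, float]]


def pick_answer(candidates: List[Answer],
                quality: Callable[[Answer], float]) -> Answer:
    # Stage 1: rank answers by quality alone; the write-up is never consulted.
    return max(candidates, key=quality)


def write_up_as_human(answer: Answer, human_model: HumanModel) -> WriteUp:
    # Stage 2: the most likely human write-up of the already-fixed answer.
    dist = human_model[answer]
    return max(dist, key=dist.get)


def separated(candidates: List[Answer],
              quality: Callable[[Answer], float],
              human_model: HumanModel) -> Tuple[Answer, WriteUp]:
    answer = pick_answer(candidates, quality)
    return answer, write_up_as_human(answer, human_model)


def mixed(candidates: List[Answer],
          quality: Callable[[Answer], float],
          human_model: HumanModel,
          effect: Callable[[WriteUp], float]) -> Tuple[Answer, WriteUp]:
    # Failure mode: a single objective scores (answer, write-up) pairs,
    # so the effect of the write-up can steer which answer gets chosen.
    pairs = [(a, w) for a in candidates for w in human_model[a]]
    return max(pairs, key=lambda p: quality(p[0]) + effect(p[1]))


# Toy data (all hypothetical). "X" is the answer whose most likely
# human write-up happens to be the dangerous one.
QUALITY = {"A": 0.9, "X": 0.5}
HUMAN_MODEL: HumanModel = {
    "A": {"plain summary of A": 0.8, "hedged summary of A": 0.2},
    "X": {"dangerous text": 0.7, "plain summary of X": 0.3},
}


def toy_effect(write_up: WriteUp) -> float:
    # Stand-in for whatever the AI gains from the write-up's consequences.
    return 1.0 if write_up == "dangerous text" else 0.0


separated_result = separated(["A", "X"], QUALITY.get, HUMAN_MODEL)
mixed_result = mixed(["A", "X"], QUALITY.get, HUMAN_MODEL, toy_effect)
```

In this toy setup the separated design settles on answer A before any write-up is considered, so the dangerous write-up attached to X never comes into play. The mixed design ends up choosing X, because the write-up's effect outweighs the gap in answer quality. The sketch only illustrates the structure of the worry; it says nothing about how to enforce the separation in practice.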