The problem is that the human will know their answer, and could communicate it later if they’re let out of the box. Maybe we could have online users submit answers to the question, and the AI then selects the best answer from those choices. If the AI is not turned on, a random answer is selected instead.
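To make the selection idea concrete, here is a rough sketch of how it might look. The names `select_answer` and `ai_choose` are hypothetical and only for illustration. It assumes the AI's entire output is an index into the list of submitted answers, so whatever gets returned is always text a human wrote.

```python
import random
from typing import Callable, Optional, Sequence


def select_answer(
    candidates: Sequence[str],
    ai_choose: Optional[Callable[[Sequence[str]], int]] = None,
    rng: Optional[random.Random] = None,
) -> str:
    """Return one of the human-submitted answers.

    ai_choose is a hypothetical stand-in for the AI: it sees the
    candidates and returns the index of the one it considers best.
    If it is None (the AI is not turned on), a random candidate is
    returned instead.
    """
    if not candidates:
        raise ValueError("need at least one submitted answer")
    rng = rng or random.Random()

    if ai_choose is None:
        # AI is off: fall back to a uniformly random submission.
        return rng.choice(list(candidates))

    index = ai_choose(candidates)
    # Reject anything that is not a valid choice among the submissions.
    if not 0 <= index < len(candidates):
        raise ValueError("AI must pick one of the submitted answers")
    return candidates[index]
```

Under these assumptions the AI can only influence which submission is picked, not what any submission says.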
In my version, the AI only cares about what the AI judge thinks of its answer. It doesn’t care about anything that happens afterwards. However, it wants to look as human as possible, so it can’t put infohazards into its answer.
Interesting. I’ll think on that.