But a chatbot is very unlikely to produce coherent strings of text, let alone something that has value. So your AI system will be too constrained, and will produce terrible output in order to keep it plausible that a chatbot could have produced it.
Or it will produce something so improbable that it was obviously produced by an AI, and probably dangerous, since you've allowed a lot of optimization.
Why not just use humans? You don't need whole brain emulation. Just put a human in a room for 3 weeks and have them think about a problem. At the end of 3 weeks, they write down the answer on a sheet of paper. The AI also writes down its answer. A different AI reads both and tries to guess which one was produced by the AI and which by the human.
The first AI optimizes for fooling the judge, so that its paper is indistinguishable from the human's. But it does this conditional on the human solving the problem: if the human doesn't solve the problem, the AI wins automatically. So the AI needs to come up with a correct solution, but also one that looks like a human produced it.
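A minimal sketch of that reward structure, assuming a hypothetical `judge` callable that returns True when it thinks it has spotted the AI's paper (all names here are illustrative, not a real implementation):

```python
def ai_reward(ai_answer, human_answer, human_solved, judge):
    """Reward for the answering AI in one round of the game (sketch)."""
    if not human_solved:
        # If the human fails to solve the problem, the AI "wins" automatically,
        # so the round puts no pressure on its answer.
        return 1.0
    # Otherwise the judge reads both answers and tries to pick out the AI's.
    # The AI is rewarded only when the judge guesses wrong, i.e. its answer
    # is indistinguishable from the human's correct one.
    judge_picked_ai = judge(ai_answer, human_answer)
    return 0.0 if judge_picked_ai else 1.0
```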
You may not even need to use real humans or a real AI judge at all; just assume this is all true as a counterfactual. Alternatively, you may need to do this a lot, both to get training data and to show the AI that the test is real.
The problem is that the human will know their answer, and could communicate it later if they're let out of the box. Maybe we could get online users to submit answers to the question instead. The AI then selects the best answer from the submissions, and if the AI is not turned on, a random answer is selected.
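One way to picture that selection scheme, with `crowd_answers` and `ai_pick_best` as hypothetical stand-ins for the user submissions and the AI's choice function:

```python
import random

def published_answer(crowd_answers, ai_is_on, ai_pick_best=None):
    """Which submitted answer actually gets released (sketch)."""
    if ai_is_on and ai_pick_best is not None:
        # The AI only chooses among human-written submissions,
        # so it never authors the released text itself.
        return ai_pick_best(crowd_answers)
    # Counterfactual baseline: if the AI is off, a random submission is released.
    return random.choice(crowd_answers)
```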
In my version, the AI only cares about what the AI judge thinks of its answer; it doesn't care about anything that happens afterwards. But because it wants to look as human as possible, it can't put infohazards into its answer.
Interesting. I’ll think on that.