Possibly a third adversarial AI? Have an AI that generates the questions based on P; it is rewarded when the second AI evaluates their probability as close to 50%, when the first AI can get them right based on P’, and when the human gets them wrong.
That’s probably not quite right; we want the AI to generate hard but not impossible questions. Possibly some sort of term based on the AIs predicting whether the human will get a question right?
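To make this concrete, here’s a minimal sketch of what the generator’s reward might look like. Everything here is an assumption for illustration: the function name, the weights, and the exact functional form are hypothetical, not taken from the notes above.

```python
# Hypothetical sketch of the question generator's reward. The names, weights,
# and functional form are assumptions for illustration only.

def generator_reward(
    p_eval: float,         # second AI's evaluated probability that the answer is correct
    solver_correct: bool,  # did the first AI get it right using P'?
    human_correct: bool,   # did the human get it right?
    w_uncertain: float = 1.0,
    w_solver: float = 1.0,
    w_human: float = 1.0,
) -> float:
    # Term 1: peaks at 1 when the evaluator's probability is exactly 50%,
    # falls to 0 at the extremes.
    uncertainty = 1.0 - 2.0 * abs(p_eval - 0.5)
    # Term 2: the first AI should still be able to answer from P'.
    solvable = 1.0 if solver_correct else 0.0
    # Term 3: the human should get it wrong.
    human_fails = 0.0 if human_correct else 1.0
    return w_uncertain * uncertainty + w_solver * solvable + w_human * human_fails
```

The last term could also be replaced by the AIs’ predicted probability that the human answers correctly, rewarding questions where that prediction is low but nonzero; that might get closer to “hard but not impossible” than rewarding outright human failure.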