I know, but my point is that such a model might be very perverse, such as "Humans do not expect to find out that you presented misleading information" rather than "Humans do not expect you to present misleading information."
You're right. This issue can arise under the framing of "predicting human behaviour", if the AI is sneaky enough. It wouldn't arise under "compare human models of the world to reality". So there are subtle nuances there to dig into...