It’s obvious to us that the prompts are lying; how do you know it isn’t also obvious to the AI? (To the degree it even makes sense to talk about the AI having “revealed preferences.”)