I generally find experiments where frontier models are lied to kind of uncomfortable. We possibly don’t want to set up precedents where AIs question what they are told by humans, and it’s also possible that we are actually “wronging the models” (whatever that means) by lying to them. Its plausible that one slightly violates commitments to be truthful by lying to frontier LLMs.
I’m not saying we shouldn’t do any amount of this kind of experimentation, I’m saying we should be mindful of the downsides.
I generally find experiments where frontier models are lied to kind of uncomfortable. We possibly don’t want to set up precedents where AIs question what they are told by humans, and it’s also possible that we are actually “wronging the models” (whatever that means) by lying to them. Its plausible that one slightly violates commitments to be truthful by lying to frontier LLMs.
I’m not saying we shouldn’t do any amount of this kind of experimentation, I’m saying we should be mindful of the downsides.