this failure mode is dangerous because of scheming AI and I say it’s dangerous because the policy is OOD
I would say that it is dangerous in the case where is is both OOD enough that the AI can discriminate and the AI is scheming.
Neither alone would present a serious (i.e. catastrophic) risk in the imitation context we discuss.
[resolved]
Thanks, I improved the wording.
I would say that it is dangerous in the case where is is both OOD enough that the AI can discriminate and the AI is scheming.
Neither alone would present a serious (i.e. catastrophic) risk in the imitation context we discuss.
[resolved]
Thanks, I improved the wording.