What notion of “deceptive alignment” is narrower than that?
Any definition that refers to the specific structure or internals of the model.