Note that “LLMs are evidence against this hypothesis” isn’t my main point here. The main claim is that the positive arguments for deceptive alignment are flimsy, and thus the prior is very low.