Good post. I think it’s important to distinguish (some version of) these two concepts, i.e. Strategic Deception (SD) vs. Deceptive Alignment (DA).
> When an AI has Misaligned goals and uses Strategic Deception to achieve them.
This statement doesn’t seem to capture exactly what you mean by DA in the rest of the post. In particular, a misaligned AI may use SD to achieve its goals without being deceptive about its alignment or goals. DA, as you discuss it later in the post, seems specifically to be deception about one’s alignment / goals.
Nathan’s suggestion is that adding noise to a sandbagging model’s weights might actually increase its performance, whereas for a non-sandbagging model noise would usually decrease it. It’s an interesting idea!
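
To make the idea concrete, here is a minimal sketch of how such a check could look, assuming a PyTorch model and a hypothetical `eval_accuracy(model, dataset)` helper that returns benchmark accuracy; the specific noise scales are placeholders, not values from the post:

```python
# A minimal sketch of the noise-injection check, assuming a torch.nn.Module model
# and a hypothetical eval_accuracy(model, dataset) helper (not from the post).
import copy
import torch

def accuracy_under_noise(model, dataset, eval_accuracy,
                         noise_scales=(0.0, 0.001, 0.003, 0.01)):
    """Evaluate the model after adding Gaussian noise of various scales to its weights.

    For a non-sandbagging model, accuracy should degrade as the noise scale grows.
    If accuracy *improves* at small noise scales, that would be (per Nathan's
    suggestion) weak evidence the model was deliberately underperforming.
    """
    results = {}
    for scale in noise_scales:
        noisy = copy.deepcopy(model)  # keep the original weights untouched
        with torch.no_grad():
            for p in noisy.parameters():
                p.add_(torch.randn_like(p) * scale)
        results[scale] = eval_accuracy(noisy, dataset)
    return results
```

In practice one would presumably average over several noise seeds per scale, since a single draw could move accuracy in either direction by chance.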