A system with a shutdown timer, in my sense, has no terms in its reward function which depend on what happens after the timer expires. (This is discussed in more detail in my previous post.) So there is no reason to persuade humans or do anything else to circumvent the timer, unless there is an inner alignment failure (maybe that’s what you mean by “deception instance”). Indeed, it is the formal verification that prevents inner alignment failures.
A system with a shutdown timer, in my sense, has no terms in its reward function which depend on what happens after the timer expires. (This is discussed in more detail in my previous post.) So there is no reason to persuade humans or do anything else to circumvent the timer, unless there is an inner alignment failure (maybe that’s what you mean by “deception instance”). Indeed, it is the formal verification that prevents inner alignment failures.