Yeah, I can think of two general ways to interpret this:
1. In a variant of CEV, the AI uses our utterances as evidence for what we would have told it if we had thought more quickly, etc. No single utterance carries much risk, because the AI will collect lots of evidence, which will likely correct any misleading effects.
2. We translate the quoted instruction directly into formal code. Even granting that the translation succeeds, it adds another possible point of failure.