I previously thought the argument for measurement tampering being more tractable then general ELK was mostly about the structural / causal properties of multiple independent measurements, but I think I’m more swayed by the argument that measurement tampering will just be more obvious (both easier to see using interpretability and more anomalous in general) then e.g. sycophancy. This is a flimsier argument though, and is less likely to hold when tampering is more subtle.
I previously thought the argument for measurement tampering being more tractable then general ELK was mostly about the structural / causal properties of multiple independent measurements, but I think I’m more swayed by the argument that measurement tampering will just be more obvious (both easier to see using interpretability and more anomalous in general) then e.g. sycophancy. This is a flimsier argument though, and is less likely to hold when tampering is more subtle.