And yes, even if AIs behave predictably in ordinary situations, they might behave erratically in unusual situations, and act deceptively when they can get away with it. But the same applies to humans, which is why we test in unusual situations, especially for deception, and monitor more closely when the context changes rapidly.
“But the same applies to humans” doesn’t seem like an adequate response when the AI system is superintelligent or past the “sharp left turn” capabilities threshold. Solutions that work against unaligned, deceptive humans won’t save us from a sufficiently intelligent and capable unaligned, deceptive entity.