Intuitively, this involves two components: the ability to robustly steer high-level structures like objectives, and something good to target them at.
I agree.
But if we solve these two problems then I think you could go further and say we don’t really need to care about deceptiveness at all. Our AI will just be aligned.
I agree, but one idea behind deep deception is that it's an easy-to-miss failure mode. Specifically, after a talk on high-level interpretability, someone came up to tell me the agenda didn't solve deep deception, and, well, I disagreed. But I don't frame the agenda in terms of deceptiveness, and the talk glosses over a few inferential steps relating to deception that are easy to stumble over, so the claim wasn't without merit, especially since I think many other agendas miss that insight.
P.S.
This made me laugh.