We think that strategic AI deception – where a model outwardly seems aligned but is in fact misaligned – is a crucial step in many major catastrophic AI risk scenarios and that detecting deception in real-world models is the most important and tractable step to addressing this problem.
Thanks for posting the announcement.
Can you elaborate on why the team believes it’s the most important and most tractable step?