When you talk about “other sources of risk from misalignment,” these sound like milder, easier-to-tackle versions of the assumptions you’ve listed? Your assumptions sound like they focus on the worst-case scenario. If you can solve the harder version, then I would imagine the easier version would also be solved, no?
Yeah, if you handle scheming, you solve all my safety problems, but not the final bullet point, the “models fail to live up to their potential” problems.
(Though the “fail to live up to potential” problems are probably mostly indirect; see here.)