If your hypothesis smears probability over a wider range of outcomes than mine, while I can predict events more sharply using my theory of how alignment works, then that constitutes a Bayes-update towards my theory and away from yours.
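To make this concrete with assumed numbers (the likelihoods 0.5 and 0.1 below are purely illustrative, not from any actual forecast): suppose my hypothesis $H_1$ assigns probability 0.5 to the event $E$ we just observed, while yours, $H_2$, spreads its mass thinly and assigns it only 0.1. Then the posterior odds shift by the likelihood ratio:

$$\frac{P(H_1 \mid E)}{P(H_2 \mid E)} = \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)} = \frac{0.5}{0.1} \cdot \frac{P(H_1)}{P(H_2)},$$

a factor-of-5 update towards $H_1$, whatever the priors were.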
There is a reference class judgement in this. If I have a theory of good moves in Go (and occasionally dabble in chess), while you have a great theory of chess, looking at some move in chess shouldn't lead to a Bayes-update against the ability of my theory to reason about Go. The scope of classical alignment worries is typically the post-AGI situation. If such a theory happens to say something uninformed about the pre-AGI situation, that is outside its natural scope, and shouldn't be meaningful evidence either way.
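In the same illustrative notation as above (again my formalization, not anything from the original argument): for an observation $E_{\text{pre}}$ from the pre-AGI regime that neither theory claims to predict well, both hypotheses assign it roughly the same probability, so

$$\frac{P(E_{\text{pre}} \mid H_1)}{P(E_{\text{pre}} \mid H_2)} \approx 1 \quad\Rightarrow\quad \frac{P(H_1 \mid E_{\text{pre}})}{P(H_2 \mid E_{\text{pre}})} \approx \frac{P(H_1)}{P(H_2)},$$

i.e. the posterior odds stay essentially at the priors, and the observation carries negligible evidence either way.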
I think the correct way of defeating classical alignment worries (about the post-AGI situation) is on priors, by examining the arguments themselves, not on observations where the theory doesn't expect to have clear or good predictions (and empirically doesn't). If the arguments appear weak, there is no recourse without observing the post-AGI world; the case remains weak at least until then. Even if the theory happened to make good predictions about the current situation, that shouldn't count in its favor.