We do have empirical evidence that nonrobust aligned intelligence can be not OK, like this or this. Why are you not more worried about superintelligent versions of these (i.e. with access to galaxies worth of resources)?
What do you mean by “nonrobust” aligned intelligence? Is “robust” being used in the “robust grading” sense, or in the “robust values” sense (of e.g. agents caring about lots of things, only some of which are human-aligned), or some other sense?
Anyway, responding to the vibe of your comment: I feel quite worried about that? Is there something I wrote which gave the impression otherwise? Maybe the vibe of the post is "alignment admits way more degrees of freedom than you may have thought," which can suggest I believe "alignment is easy with high probability"?