In retrospect, I’ve become more optimistic about alignment. There is now quite a bit more evidence that the low-stakes version of the alignment problem is fairly easy to solve for AIs. While I don’t expect anything close to a formal proof (because of edge cases), I now think the techniques that have been developed are enough that we can declare victory on the low-stakes alignment problem, and as a consequence I think we can declare the outer alignment problem mostly solved at this point (given Paul’s definition of low-stakes alignment).
This is corroborated by the fact that, compared to earlier framings, a lot of AI risk scenarios now rely more heavily on deceptively aligned AI changing the world in an abrupt and irreversible way.