For what it’s worth, I’ve also tried to specify formally what we mean by AI alignment. From my perspective these capture interesting, distinct kinds of alignment, but all of them still suffer from insufficient precision. For example, what exactly is a “terminal value,” and whence does it arise? I think nailing these things down will prove important as we make further progress toward engineering aligned AI and discover the need to ground out some of our ideas in the implementation.