Agreed that for a post-intelligence-explosion AI, alignment is effectively binary. I do agree with the sharp left turn etc. positions, and don’t expect patches and cobbled-together solutions to hold up to the stratosphere.
Weakly aligned—Guided towards the kinds of things we want, in ways which don’t have strong guarantees. A central example is InstructGPT, but this also includes most interpretability (unless it becomes dramatically more effective than the current generation) and what I understand to be Paul’s main approaches.
Weakly superintelligent—Superintelligent in some domains, but has not yet undergone recursive self-improvement.
These are probably non-standard terms; I’m very happy to be pointed at existing literature with different ones, which I can adopt.
I am confident Eliezer would roll his eyes; I have read a great deal of his work and recent debates. I respectfully disagree with his claim that you can’t get useful cognitive work on alignment out of systems which have not yet FOOMed and taken a sharp left turn, based on my understanding of intelligence as babble and prune. I don’t expect us to get enough cognitive work out of these systems in time, but it seems like a path with non-zero hope.
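For concreteness, here is a minimal sketch of the babble-and-prune framing I have in mind: a possibly unreliable generator proposes many candidate ideas, and a much cheaper verification step prunes them. All of the names here (babble, prune, the toy scoring function) are hypothetical illustration, not any existing system’s API; the point is only the structure, where generation can be weak or untrusted so long as pruning is cheap and trustworthy.

```python
import random

def babble(generate, n_candidates):
    """Babble step: sample many candidate ideas from a (possibly weak) generator."""
    return [generate() for _ in range(n_candidates)]

def prune(candidates, score, threshold):
    """Prune step: keep only candidates a cheaper verifier scores above a threshold."""
    return [c for c in candidates if score(c) >= threshold]

# Toy illustration: the "generator" emits random numbers standing in for ideas,
# and the "verifier" keeps only the ones above a bar.
ideas = babble(lambda: random.random(), n_candidates=1000)
kept = prune(ideas, score=lambda x: x, threshold=0.95)
print(f"kept {len(kept)} of {len(ideas)} candidates")
```

On this framing, the hope is that the prune step can stay with humans (or with weaker, better-understood systems) even as the babble step gets stronger.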
It is plausible that AIs unavoidably FOOM before the point where they can contribute, but this seems less and less likely as capabilities advance and we notice we’re not dead.
I don’t come close to agreeing with either of those, and FOOM basically requires physics violations, like breaking Landauer’s Principle or needing arbitrarily small processors. I’m being frank because I suspect that a lot of the doom position requires hard takeoff, and based on the physics and the history of what happens as AI improves, only the first improvement is a discontinuity; the rest become far smoother and slower. So that’s a big crux I have here.
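To pin down the bound being referenced (this is the standard statement of Landauer’s Principle; the room-temperature figure is my own back-of-the-envelope, not from the comment above): erasing one bit of information costs at least

$$E_{\min} = k_B T \ln 2 \approx (1.38 \times 10^{-23}\,\mathrm{J/K})(300\,\mathrm{K})(\ln 2) \approx 2.9 \times 10^{-21}\,\mathrm{J}$$

at room temperature, which is the kind of floor (along with minimum feature sizes) that indefinite recursive self-improvement would eventually run into.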