I found this very persuasive, provided that success looks only like a comprehensive theory of alignment.
In worlds where a non-rigorous “craft” of alignment is “good enough” for capabilities within a certain range, I don’t think this model (or your conclusions) necessarily holds.
And the dynamics of takeoff will also determine whether having a craft of alignment that’s good enough for par-human AI but not superhuman AI counts as success (if takeoff is slow enough, you can use your aligned HLAI to help you figure out alignment for superhuman AI).
You can generalise this further to a notion of “alignment escape velocity”: you don’t yet have a comprehensive theory of alignment that’s robust to arbitrary capability amplification, but you do have a craft of alignment that’s robust to current/near-term capabilities, and you can use currently aligned AI to help develop your alignment craft for more capable agents before those agents are developed. (This lead of alignment craft over capabilities needs to hold for alignment escape velocity; in practice it probably looks like coordination to not develop more powerful AI than we know how to align.)
I expect much slower takeoff than EY (at least a few years between a village idiot and John von Neumann, and even longer to transition from John von Neumann to strongly superhuman intelligence [I expect marginal returns to cognitive investment to diminish considerably as you advance along the capabilities frontier]), so I’m very sympathetic to alignment escape velocity.
(All of the above said, I’m still young [24], so I’ll probably devote my career [at least the first decade of it] to the “figure out a comprehensive theory of alignment” thing. It optimises for personal impact on a brighter future [which is very important to me], and it sounds like a really fun problem [I like abstract thinking].)