C. Solving prosaic alignment on the first critical try is not as difficult, nor as dangerous, and does not take as much extra time, as Yudkowsky predicts; whatever effort the leading coalition puts forth works within their lead time.
This holds the majority of my probability mass, in the 60-90% range, in that I believe alignment is far easier than most LWers think.
Specifically, I believe we have a pretty straightforward path to alignment, if somewhat tedious and slightly difficult.
I also believe that two problems of embedded agency, Goodhart's law and subsystem alignment, are actually fairly resolvable. In particular, I think embeddedness matters far less for AIs than for humans, primarily because deep learning showed that large-scale offline learning from datasets works, and embedded agency concerns largely go away in an offline learning setting.