“One reason that goes overlooked is that most human beings are not utilitarians”
I think this point is just straightforwardly wrong. Even from a purely selfish perspective, it’s reasonable to want to stop AI.
The main reason humanity is not going to stop seems to be coordination problems, or something close to learned helplessness in the face of these kinds of competitive dynamics.
I’m not sure that’s true. It’s true if you adopt the dominant local perspective that “alignment is very hard and we need more time to do it”. But there are other perspectives: see “AI is easy to control” by Pope & Belrose, which argues that the success of RLHF means there’s a less than 1% risk of extinction from AI. I think this perspective is both subtly wrong and deeply confused in conflating alignment with total x-risk, but the core argument isn’t obviously wrong. So reasonable people can and do argue for full speed ahead on AGI.
I agree with pretty much all of the counterarguments made by Steve Byrnes in his Thoughts on “AI is easy to control” by Pope & Belrose. But not all reasonable people will. And those who are also non-utilitarians (most of humanity) will be pursuing AGI ASAP for rational (if ultimately subtly wrong) reasons.
I think we need to understand and take this position seriously to do a good job of avoiding extinction as best we can.
Basically, I think whether one believes alignment is hard is much more of the crux than whether or not they’re utilitarian.
Personally, I don’t find Pope & Belrose very convincing, although I do commend them for the reasonable effort. But if I did believe that AI is likely to go well, I’d probably also be all for it. I just don’t see how this is related to utilitarianism (except maybe for a very small subset of people in EA).