It sounds like both sides of the debate implicitly agree that capabilities research is bad. I don’t think that claim necessarily follows from the debate premise that “we both believe they are risking either extinction or takeover by an ~eternal alien dictatorship”. I suspect that disagreement over whether capabilities research is bad, rather than the arguments made in this debate, is the actual crux for some or many people who believe in AI risk yet still work for capabilities companies.
The reason it doesn’t necessarily follow is that there’s a spectrum of possibilities for how entangled safety and capabilities research are. On one end of the spectrum, they are completely orthogonal: we can fully solve AI safety without ever making any progress on AI capabilities. On the other end, they are completely aligned: figuring out how to make an AI that successfully follows, say, humanity’s coherent extrapolated volition is the same problem as figuring out how to make an AI that successfully performs any other arbitrarily complicated real-world task.
It seems likely to me that the truth is somewhere in the middle of the spectrum, but it’s not entirely obvious where it lies. I think it’s highly unlikely that pure theoretical work is sufficient to guarantee safe AI, because without advances in capabilities, we are very unlikely to have an accurate enough model of what AGI actually is to do any useful safety work on it. On the flip side, I doubt AI safety is as simple as making a sufficiently smart AI and just telling it “please be safe, you know what I mean by that, right?”.
The degree of entanglement dictates the degree to which capabilities research is intrinsically bad. If we live in a world close to the orthogonal end of the spectrum, it’s relatively easy to distinguish “good research” from “bad research”, and if you do bad research, you are a bad person, shame on you, let’s create social/legal incentives to stop it.
Conversely, if we live in a world closer to the convergent end of the spectrum, where making strides on AI safety problems is closely tied to advances in capabilities, then some capabilities research will be necessary, and demonizing it will lead to the perverse outcome that it’s mostly done by people who don’t care about safety. We’d prefer a world where the leading AI labs are staffed with brilliant, highly safety-conscious safety/capabilities researchers who push the frontier in both directions.
It’s outside the scope of this comment to weigh in on which end of the spectrum we live on, but I think that’s the real crux of this debate, and why smart people can in good conscience fling themselves into capabilities research: their world model is different from that of AI safety researchers coming from a strongly orthogonal perspective (my understanding of MIRI’s original philosophy is that a tremendous amount of safety research can be done in the absence of capabilities advances).