It seems to me that this argument only makes sense if we assume that “more capabilities research now” translates into “more gradual development of AGI”. That’s the real crux for me.
If that assumption is false, then accelerating capabilities is basically equivalent to having all the AI alignment and strategy researchers hibernate for some number of years N, and then wake up and get back to work. And that, in turn, is strictly worse than having all the AI alignment and strategy researchers do what they can during the next N years, and also continue doing work after those N years have elapsed. I do agree that there is important alignment-related work that we can only do in the future, when AGI is closer. I don’t agree that there is nothing useful being done right now.
On the other hand, if that assumption is true (i.e. the assumption that “more capabilities research now” translates into “more gradual development of AGI”), then there’s at least a chance that more capabilities research now would be net positive.
However, I don’t think the assumption is true—or at least, not to any appreciable extent. It would only be true if you thought that there was a different bottleneck to AGI besides capabilities research. You mention faster hardware, but my best guess is that we already have a massive hardware overhang—once we figure out AGI-capable algorithms, I believe we already have the hardware that would support superhuman-level AGI with quite modest amounts of money and chips. (Not everyone agrees with me.) You mention “neuroscience understanding”, but I would say that insofar as neuroscience understanding helps people invent AGI-capable learning algorithms, neuroscience understanding = capabilities research! (I actually think some types of neuroscience are mainly helpful for capabilities and other types are mainly helpful for safety, see here.) I can imagine small bottlenecks that would add a few months today but only a few weeks in a decade, e.g. better future CUDA compilers. But I don’t see any big bottlenecks—things that add years or decades—other than AGI capabilities research itself.
Even if the assumption is significantly true, I would still be surprised if more capabilities research now were a good trade, because (1) I do think there’s a lot of very useful alignment work we can do right now (not to mention outreach, developing pedagogy, etc.), and (2) the most valuable alignment work is work that informs differential technological development, i.e. work that tells us which AGI capabilities work should be done at all, namely R&D that moves us down a path to maximally alignable AGI. But that work is only valuable to the extent that we figure things out before the wrong kind of capabilities research has already been completed. See Section 1.7 here.
I’m not sure how this desire works, but I don’t think you could train GPT to have it. It looks like some sort of theory of mind is involved in how the goal is defined.
I do think that would be valuable to know, and am very interested in that question myself, but I think that figuring it out is mostly a different type of research than AGI capabilities research—loosely speaking, what you’re talking about looks like “designing the right RL reward function”, whereas capabilities research mostly looks like “designing a good RL algorithm”—or so I claim, for reasons here and here.
In some sense I would think it’s almost tautologically true that faster capabilities research shortens the timeline in which alignment and strategy researchers do their own work.