I kind of want you to get quantitative here? Like pretty much every action we take has some effect on AI timelines, but I think effect-on-AI-timelines is often swamped by other considerations (like effects on attitudes around those who will be developing AI).
Of course it’s prima facie more plausible that the most important effect of AI research is the effect on timelines, but I’m actually still kind of sceptical. On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it-reaches-strong-superintelligence. Each doubling of that length of time feels to me like it could be worth on the order of 0.5-1% of the future. Keeping implemented-systems close to the technological-frontier-of-what’s-possible could help with this, and may be more affectable than the
Note that I don’t think this really factors into an argument in terms of “advancing alignment” vs “advancing capabilities” (I agree that if “alignment” is understood abstractly the work usually doesn’t add too much to that). It’s more like a differential technological development (DTD) argument about different types of advancing capabilities.
I think it’s unfortunate if that strategy looks actively bad on your worldview. But if you want to persuade people not to do it, I think you need either to persuade them of the whole case for your worldview (for which I’ve appreciated your discussion of the sharp left turn), or to explain not just that you think this is bad, but also how big a deal you think it is. Is this something your model cares about enough to trade for in some kind of grand inter-worldview bargaining? I’m not sure. I kind of think it shouldn’t be (that relative to the size of the ask, you’d get a much bigger benefit from someone starting to work on things you cared about than from them stopping this type of capabilities research), but I think it’s pretty likely I couldn’t pass your ITT here.
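For concreteness, here is a minimal sketch of the doubling heuristic above. The one-year baseline window and the 0.75%-per-doubling midpoint are placeholder assumptions for illustration only, not figures claimed anywhere in this thread:

```python
import math

# Toy illustration of the "each doubling of the window is worth ~0.5-1% of the future" heuristic.
# The 1-year baseline and the 0.75%-per-doubling midpoint are placeholder assumptions.
def value_of_lengthening(baseline_years: float, new_years: float,
                         value_per_doubling: float = 0.0075) -> float:
    """Fraction of the future's value gained by stretching the
    understanding-to-superintelligence window from baseline_years to new_years."""
    doublings = math.log2(new_years / baseline_years)
    return doublings * value_per_doubling

# Stretching a 1-year window to 4 years is two doublings: roughly 1-2% of the future.
print(value_of_lengthening(1.0, 4.0))  # 0.015, i.e. ~1.5%
```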
On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it-reaches-strong-superintelligence. Each doubling of that length of time feels to me like it could be worth on the order of 0.5-1% of the future.
Amusingly, I expect that each doubling of that time is negative EV. Because that time is very likely negative.
Question: are you assuming time travel or acausality will be a thing in the next 20-30 years due to FTL work? Because that’s the only way the time from AGI understanding to superintelligence could be negative at all.
No, I expect (absent agent foundations advances) people will build superintelligence before they understand the basic shape of things of which that AGI will consist. An illustrative example (though I don’t think this exact thing will happen): if the first superintelligence popped out of a genetic algorithm, then people would probably have no idea what pieces went into the thing by the time it exists.
On my picture, I think a key variable is the length of time between when-we-understand-the-basic-shape-of-things-that-will-get-to-AGI and when-it-reaches-strong-superintelligence.
I don’t understand why you think the sort of capabilities research done by alignment-conscious people contributes to lengthening this time. In particular, what reason do you have to think they’re not advancing the second time point as much as the first? Could you spell that out more explicitly?
Don’t we already understand the basic shape of things that will get to AGI? Seems plausible that AGI will consist of massive transformer models within some RLHF approach. But maybe you’re looking for something more specific than that.