I think you might have implicitly assumed that my main crux here is whether or not take-off will be fast. I actually feel this is less decision-relevant for me than the other cruxes I listed, such as time-to-AGI or “sharp left turns.” If take-off is fast, AI alignment/control does seem much harder and I’m honestly not sure what research is most effective; maybe attempts at reflectively stable or provable single-shot alignment seem crucial, or maybe we should just do the same stuff faster? I’m curious: what current AI safety research do you consider most impactful in fast take-off worlds?
To me, agent foundations research seems most useful in worlds where:
1. There is an AGI winter and we have time to do highly reliable agent design; or
2. We build alignment MVPs, institute a moratorium on superintelligence, and task the AIs with solving superintelligence alignment (quickly), possibly building on existing agent foundations work. In this world, existing agent foundations work helps human overseers ground and evaluate AI output.
Ah, didn’t mean to attribute the takeoff speed crux to you, that’s my own opinion.
I’m not sure what’s best in fast takeoff worlds. My message is mainly just that getting weak AGI to solve alignment for you doesn’t work in a fast takeoff.
“AGI winter” and “overseeing alignment work done by AI” do both strike me as scenarios where agent foundations work is more useful than in the scenario I thought you were picturing. I think #1 still has a problem, but #2 is probably the argument for agent foundations work I currently find most persuasive.
In the moratorium case we suddenly get much more time than we thought we had, which enables plans with longer payback times. Seems like we should hold off on those longer-payback plans until we know we have that time, rather than pursuing them while it still seems likely that the decisive period is soon.
Having more human agent foundations expertise to better oversee agent foundations work done by AI seems good. How good it is depends on a few things. How much of the work that needs to be done is conceptual breakthroughs (tall) vs schlep with existing concepts (wide)? How quickly does our ability to oversee fall off for concepts more advanced than what we’ve developed so far? These seem to me like the main ones, and like very hard questions to get certainty on—I think that uncertainty makes me hesitant to bet on this value prop, but again, it’s the one I think is best.
Cheers!