Nice post, glad you wrote up your thinking here.
I’m a bit skeptical of the “these are options that pay off if alignment is harder than my median” story. The way I currently see things going is:
a slow takeoff makes alignment MUCH, MUCH easier [edit: if we get one; I’m uncertain, and I think the correct position given the current state of evidence is uncertainty]
as a result, all prominent approaches lean very hard on slow takeoff
there is uncertainty about takeoff speed, but folks have mostly given up on reducing this uncertainty
I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we’re probably fine; if it’s fast, we die.
Ways that could fail to happen:
Work of the form “here are ways we could notice we are in a fast takeoff world before actually getting slammed” produces evidence compelling enough to pause, or to cause leading labs to discard plans that rely on slow takeoff
agent foundations research aiming to do alignment in faster takeoff worlds finds a method so good it works better than current slow-takeoff-tailored methods even in the slow takeoff case, and labs pivot to this method
Both strike me as pretty unlikely. TBC, this doesn’t mean those types of work are bad; I’m saying low probability, not necessarily low margins.
Reminder that you have a moral obligation, every single time you’re communicating an overall justification of alignment work premised on slow takeoff, in a context where you can spare two sentences without unreasonable cost, to say out loud something to the effect of “Oh and by the way, just so you know, the causal reason I’m talking about this work is that it seems tractable, and the causal reason is not that this work matters.” If you don’t, you’re spraying your [slipping sideways out of reality] on everyone else.
I’m on board with communicating the premises of the path to impact of your research when you can. I think more people doing that would’ve saved me a lot of confusion. I think your particular phrasing is a bit unfair to the slow takeoff camp, but clearly you didn’t mean it to read neutrally, which is a choice you’re allowed to make.
I wouldn’t describe my intention in this comment as communicating a justification of alignment work based on slow takeoff? I’m currently very uncertain about takeoff speeds, and my own work is in the weird limbo of not being premised on either fast or slow scenarios.
I didn’t take you to be doing so—it’s a reminder for the future.
o7
I think James was implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose. I agree that he could have made it clearer, but I think he’s made it clear enough given the following line:
I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we’re probably fine; if it’s fast, we die.
And as for your last sentence:
If you don’t, you’re spraying your [slipping sideways out of reality] on everyone else.
It depends on the intended audience of your communication. James here very likely implicitly modeled his audience as people who’d comprehend what he was pointing at without having to explicitly say the caveats you list.
I’d prefer you ask why people think the way they do instead of ranting to them about ‘moral obligations’ and insinuating that they are ‘slipping sideways out of reality’.
IDK how to understand your comment as referring to mine. To clarify the “slipping sideways” thing, I’m alluding to “stepping sideways” described in Q2 here: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy#Q2___I_have_a_clever_scheme_for_saving_the_world___I_should_act_as_if_I_believe_it_will_work_and_save_everyone__right__even_if_there_s_arguments_that_it_s_almost_certainly_misguided_and_doomed___Because_if_those_arguments_are_correct_and_my_scheme_can_t_work__we_re_all_dead_anyways__right_
and from
https://www.lesswrong.com/posts/m6dLwGbAGtAYMHsda/epistemic-slipperiness-1#Subtly_Bad_Jokes_and_Slipping_Sideways
I’m familiar with how Eliezer uses the term. I was more pointing to the move of saying something like “You are [slipping sideways out of reality], and this is bad! Stop it!” I don’t think this usually results in the person, especially confused people, reflecting and trying to be more skilled at epistemology and communication.
In fact, there’s a loopy thing here where you expect someone who is ‘slipping sideways out of reality’ to caveat their communications with an explicit disclaimer that admits that they are doing so. It seems very unlikely to me that we’ll see such behavior. Either the person has confusion and uncertainty and is usually trying to honestly communicate their uncertainty (which is different from ‘slipping sideways’), or the person would disagree that they are ‘slipping sideways’ and claim (implicitly and explicitly) that what they are doing is tractable / matters.
Excuse me, none of that is in my comment.
I’m not sure exactly what mesa is saying here, but insofar as “implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose” means “intending to communicate from a position of uncertainty about takeoff speeds” I think he has me right.
I do think mesa is familiar enough with how I talk that the fact he found this unclear suggests it was my mistake. Good to know for the future.
Cheers!
I think you might have implicitly assumed that my main crux here is whether or not take-off will be fast. I actually feel this is less decision-relevant for me than the other cruxes I listed, such as time-to-AGI or “sharp left turns.” If take-off is fast, AI alignment/control does seem much harder and I’m honestly not sure what research is most effective; maybe attempts at reflectively stable or provable single-shot alignment seem crucial, or maybe we should just do the same stuff faster? I’m curious: what current AI safety research do you consider most impactful in fast take-off worlds?
To me, agent foundations research seems most useful in worlds where:
There is an AGI winter and we have time to do highly reliable agent design; or
We build alignment MVPs, institute a moratorium on superintelligence, and task the AIs with solving superintelligence alignment (quickly), possibly building on existing agent foundations work. In this world, existing agent foundations work helps human overseers ground and evaluate AI output.
Ah, didn’t mean to attribute the takeoff speed crux to you; that’s my own opinion.
I’m not sure what’s best in fast takeoff worlds. My message is mainly just that getting weak AGI to solve alignment for you doesn’t work in a fast takeoff.
“AGI winter” and “overseeing alignment work done by AI” do both strike me as scenarios where agent foundations work is more useful than in the scenario I thought you were picturing. I think #1 still has a problem, but #2 is probably the argument for agent foundations work I currently find most persuasive.
In the moratorium case we suddenly get much more time than we thought we had, which enables longer-payback-time plans. Seems like we should hold off on working on those longer-payback-time plans until we know we have that time, not while it still seems likely that the decisive period is soon.
Having more human agent foundations expertise to better oversee agent foundations work done by AI seems good. How good it is depends on a few things. How much of the work that needs to be done is conceptual breakthroughs (tall) vs. schlep with existing concepts (wide)? How quickly does our ability to oversee fall off for concepts more advanced than what we’ve developed so far? These seem to me like the main ones, and like very hard questions to get certainty on. That uncertainty makes me hesitant to bet on this value prop, but again, it’s the one I think is best.