My interpretation of MIRI is that they ARE looking for alternatives and so far have not found any that don’t also seem doomed. E.g. they thought about trying to coordinate to ban or slow down AI capabilities research and concluded that it’s practically impossible, since we can’t even ban gain-of-function research and that should be a lot easier. My interpretation of MIRI is that their recent public doomsaying is NOT aimed at getting people to just keep thinking harder about doomed AI alignment research agendas; rather, it is aimed at getting people to think outside the box and hopefully come up with a new plan that might actually work. (The recent April Fool’s post also served another function of warning against some common failure modes, e.g. the “slip sideways into fantasy world” failure mode, where you start gambling on an assumption holding true, then do this repeatedly and lose track of how unlikely the world you are planning for has become.)
If that is the case, then I would very much like them to publicize the details of why they think other approaches are doomed. When Yudkowsky has talked about it in the past, it has tended to be in the form of single-sentence statements pointing towards past writing on general cognitive fallacies. For him I’m sure that would be enough of a hint to see clearly why strategy X fits that fallacy and will therefore fail, but as a reader it doesn’t give me much insight into why such a project is doomed, rather than just potentially flawed.
(Sorry if this doesn’t make sense btw, I’m really tired and am not sure I’m thinking straight atm)
One rationale for not spelling things out in more detail is an expectation that anyone capable of solving a significant chunk of the problem will need to be able to notice such drawbacks themselves. If getting to a partial solution will require a researcher to notice over a hundred failure modes along the path, then Eliezer’s spelling out the first ten may not help; it may deny them the opportunity to reason things through themselves and learn. (I imagine that in reality time constraints are playing a significant role too.)
I do think there’s something in this, but it strikes me that there are likely more efficient approaches worth looking for (particularly when it comes to people who want/need to understand the alignment research landscape, but aren’t themselves planning to work in technical alignment).
Quite a lot depends on how hard we think it is to navigate through potential-partial-alignment-solution space. If it were a low-dimensional space, or workable solutions were dense, one could imagine finding solutions by throwing a few thousand ants at a promising part of the space and letting them find the sugar.
Since the dimensionality is high and solutions are not dense, I think there’s a reasonable case that the bar on individual navigation skill is much higher (hence the emphasis on rationality).
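To make the ant-and-sugar metaphor concrete, here is a toy Monte Carlo sketch (not anyone's actual model of solution space, just an illustration of the curse of dimensionality): the "sugar" is a fixed-radius region in a unit cube, and we measure how often uniform random search lands in it as the number of dimensions grows.

```python
import random

def hit_rate(dim, radius=0.25, trials=100_000, seed=0):
    """Fraction of uniform random points in [0,1]^dim that land
    within `radius` of the cube's centre (a stand-in for the
    region of workable solutions)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # squared distance from the centre (0.5, ..., 0.5)
        d2 = sum((rng.random() - 0.5) ** 2 for _ in range(dim))
        if d2 <= radius ** 2:
            hits += 1
    return hits / trials

for dim in (2, 5, 10, 50):
    print(dim, hit_rate(dim))
```

In 2 dimensions the ants stumble onto the sugar roughly a fifth of the time (the disc covers about π·0.25² ≈ 0.196 of the square), but the hit rate collapses super-exponentially with dimension: by 10 dimensions random sampling almost never succeeds, and by 50 it effectively never does. That is the gap between "throw a few thousand ants at it" and "each searcher needs real navigation skill."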
My interpretation of MIRI is that they ARE looking for alternatives and so far have not found any that don’t also seem doomed.
I mean, I’m also assuming something like this is true, probably, but it’s mostly based on “it seems like something they should do, and I ascribe a lot of competence to them”.
we can’t even ban gain-of-function research and that should be a lot easier
How much effort have we as a community put into banning gain-of-function research vs. solving alignment? Given this: if, say, banning AGI research is half as hard as alignment (which would make it a great approach) and a gain-of-function ban is a tenth as hard as banning AGI, would we have succeeded at a gain-of-function ban by now? I doubt it.
My interpretation of MIRI is that their recent public doomsaying is NOT aimed at getting people to just keep thinking harder about doomed AI alignment research agendas; rather, it is aimed at getting people to think outside the box and hopefully come up with a new plan that might actually work.
Idk, I skimmed the April Fool’s post again before submitting this, and I did not get that impression.
If that is the case, then I would very much like them to publicize the details of why they think other approaches are doomed.
I think this would probably be helpful.