I think this would probably be helpful.
One rationale for not spelling things out in more detail is an expectation that anyone capable of solving a significant chunk of the problem will need to be able to notice such drawbacks themselves. If getting to a partial solution will require a researcher to notice over a hundred failure modes along the path, then Eliezer’s spelling out the first ten may not help; it may deny them the opportunity to reason things through themselves and learn. (I imagine that in reality time constraints are playing a significant role too.)
I do think there’s something in this, but it strikes me that there are likely more efficient approaches worth looking for (particularly when it comes to people who want/need to understand the alignment research landscape, but aren’t themselves planning to work in technical alignment).
Quite a lot depends on how hard we think it is to navigate through potential-partial-alignment-solution space. If it were a low-dimensional space, or if workable solutions were dense, one could imagine finding solutions by throwing a few thousand ants at a promising part of the space and letting them find the sugar.
Since the dimensionality is high and solutions are not dense, I think there’s a reasonable case that the bar on individual navigation skill is much higher (hence the emphasis on rationality).