If that is the case, then I would very much like them to publicize the details for why they think other approaches are doomed. When Yudkowsky has talked about it in the past, it tends to be in the form of single-sentence statements pointing towards past writing on general cognitive fallacies. For him I’m sure that would be enough of a hint to clearly see why strategy x fits that fallacy and will therefore fail, but as a reader, it doesn’t give me much insight as to why such a project is doomed, rather than just potentially flawed. (Sorry if this doesn’t make sense btw, I’m really tired and am not sure I’m thinking straight atm)
I think this would probably be helpful.
One rationale for not spelling things out in more detail is an expectation that anyone capable of solving a significant chunk of the problem will need to be able to notice such drawbacks themselves. If getting to a partial solution will require a researcher to notice over a hundred failure modes along the path, then Eliezer’s spelling out the first ten may not help; it may even deny them the opportunity to reason things through themselves and learn. (I imagine that in reality time constraints are playing a significant role too.)
I do think there’s something in this, but it strikes me that there are likely more efficient approaches worth looking for (particularly when it comes to people who want/need to understand the alignment research landscape, but aren’t themselves planning to work in technical alignment).
Quite a lot depends on how hard we think it is to navigate through potential-partial-alignment-solution space. If it were a low-dimensional space, or if workable solutions were dense, one could imagine finding solutions by throwing a few thousand ants at a promising part of the space and letting them find the sugar.
Since the dimensionality is high and solutions are not dense, I think there’s a reasonable case that the bar on individual navigation skill is much higher (hence the emphasis on rationality).