Yeah, to be clear, acceleration of capabilities is a major reason why I expect public funding would be net negative, rather than just much closer to zero impact than naive multiplication would suggest.
Ignoring the capabilities issue, I think there’s lots of room for uncertainty about whether a big injection of “blind funding” would be net positive, for the reasons explained above. I think we should be pretty confident that the results would be an OOM or more less positive than the naive multiplication suggests, but that’s still not the same as “net negative”; whether it would be net positive or net negative seems much more uncertain to me (ignoring capabilities impact).
Accounting for capabilities impact, I think the net impact would be pretty robustly negative.
That might be true for theoretical AI alignment research, but I’d imagine it’s less of a problem for types of AI alignment research with decent feedback loops, like interpretability research and other kinds of empirical work such as experiments on RL agents.
The parts where the bad feedback loops are, are exactly the places where the things-which-might-actually-kill-us are. Things we can see coming are exactly the things which don’t particularly need research to stop, and the fact that we can see them is exactly what makes the feedback loops good. It is not an accident that the feedback loop problem is unusually severe for the field of alignment in particular.
(Which is not to say that e.g. interpretability research isn’t useful—we can often get great feedback loops on things which provide a useful foundation for the hard parts later on. The point is that, if the field as a whole streetlights on things with good feedback loops, it will end up ignoring the most dangerous things.)