I don’t buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs
I don’t think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good ones, for publishing less to be a positive thing.
That said, if someone hasn’t thought at all about concepts like “differentially advancing safety” or “capabilities externalities,” then reading this post would probably be helpful, and I’d endorse thinking about those issues.
That’s a lot of what I intend to do with this post, yes. I think a lot of people do not think much about the impact of publishing and just blurt out/publish things as a default action, and I would like them to think about their actions more.
I don’t think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good ones, for publishing less to be a positive thing.
There currently seem to be >10x as many people directly trying to build AGI/improve capabilities as there are trying to improve safety.
Suppose that the safety people have ideas and research ability just as good as the capabilities people’s. (As a simplifying assumption.)
Then, if all the safety people switched to working full time on maximally advancing capabilities, they would grow the capabilities workforce by less than 10%, and so would advance capabilities by less than 10%.
If, on the other hand, they stopped publicly publishing safety work and this slowed their research down by 50%, then all safety work would slow down by 50%.
Naively, it seems very hard for publishing less to make sense if the number of safety researchers is much smaller than the number of capabilities researchers and safety researchers aren’t much better at capabilities than capabilities researchers.
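A minimal numeric sketch of the comparison above, in Python. The headcounts, the assumption of linear returns to researcher-time, and the 50% figure are all illustrative assumptions taken from the comment, not measurements:

```python
# Back-of-the-envelope comparison from the argument above.
# All numbers are illustrative assumptions, not measurements.

capability_researchers = 1000  # assumed headcount directly advancing capabilities
safety_researchers = 100       # assumed headcount, >10x fewer per the comment

# Scenario A: every safety researcher switches to full-time capabilities work.
# Under linear returns to researcher-time, capabilities progress speeds up by:
capabilities_speedup = safety_researchers / capability_researchers
print(f"Capabilities speedup if all safety researchers switched: {capabilities_speedup:.0%}")

# Scenario B: safety researchers stop publishing publicly, and this slows
# their own research by an assumed 50%.
safety_slowdown = 0.50
print(f"Safety slowdown from not publishing: {safety_slowdown:.0%}")

# Naive takeaway: publishing less costs safety far more (50%) than the
# worst-case capabilities contribution it could be withholding (<10%).
```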
But safety research can actually disproportionately help capabilities; e.g., the development of RLHF allowed OpenAI to turn their weird text predictors into a very generally useful product.
I’m skeptical of the RLHF example (see also this post by Paul on the topic). That said, I agree that if safety researchers indeed produce (highly counterfactual) research advances that are much more effective at increasing the profitability and capability of AIs than the research advances done by people directly optimizing for profitability and capability, then safety researchers could substantially speed up timelines. (In other words, if safety-targeted research is better at advancing profit and capabilities than research directly targeted at those aims.)
I dispute this being true.
(I do think it’s plausible that safety-interested people have historically substantially advanced timelines (and might continue to do so to some extent now), but not via research targeted at improving safety; rather, by directly doing capabilities research for various reasons.)
Ryan, this is kind of a side note, but I notice that you have a very Paul-like approach to arguments and replies on LW.
Two things come to mind:
You have a tendency to reply to certain posts or comments with “I don’t quite understand what is being said here, and I disagree with it,” or “It doesn’t track with my views,” or equivalent replies that don’t seem very useful for understanding your object-level arguments. (Although I notice that in the recent comments I see, you usually follow this with some elaboration on your model.)
In the comment I’m replying to, you use a strategy of black-box-like abstraction modeling of a situation to argue for a conclusion, one that usually involves numbers such as multipliers or percentages. (I have the impression that Paul uses this a lot; one concrete example that comes to mind is the takeoff speeds essay.) I usually consider such arguments invalid when they seem to throw away information we already have, or to use a set of abstractions that don’t feel appropriate to the information I believe we have.
I just found this interesting and plausible enough to highlight to you. It would be a moderate investment of my time to dig up examples from your comment history illustrating all these instances, but writing this comment still seemed valuable.
Note that I agree with your sentiment here, although my concrete argument is basically what LawrenceC wrote as a reply to this post.