I think the status quo around publishing safety research is mostly fine (though being a bit more careful seems good); more confidently, I think going as far as the vibe of this post suggests would be bad.
Some possible cruxes, or reasons the post basically didn’t move my view on that:
Most importantly, I think the research published by people working on x-risk tends to overall help safety/alignment more than capabilities.
I suspect the main disagreement might be what kind of research is needed to make AI go well, and whether the research currently happening helps.
Probably less importantly, I disagree a bit about how helpful that research likely is for advancing capabilities. In particular, I don’t buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs (part of this is that my impression is capabilities aren’t mainly bottlenecked by ideas, though of course sufficiently good ideas would help).
It’s getting harder to draw the boundary since people use “safety” or “alignment” for a lot of things now. So, to be clear, I’m talking about research published by people who think there are catastrophic risks from AI and care a lot about preventing them, since that seems to be your target audience.
Secondarily, longer timelines are only helpful if useful things are happening, and I think if everyone working on x-risk stopped publishing their research, way fewer useful things would happen on the research side. Maybe the plan is to mostly use the additional time for policy interventions? I think that’s also complicated though (so far, visibly advancing capabilities have been one of the main things making policy progress feasible). Overall, I think more time would help, but it’s not clear how much and I’m not even totally sure about the sign (taking into account worries from hardware overhang).
I think there are more structural downsides to not publishing anything. E.g., that makes it much harder to get academia on board (and getting academia on board has been pretty important for policy as far as I can tell, and I think getting them even more on board would be pretty good). Not sure this is an actual crux though: if I thought the research that’s happening wasn’t helpful enough, this point would also be weaker.
I think most of these are pretty long-standing disagreements, and I don’t think the post really tries to argue its side of them, so my guess is it’s not going to convince the main people it would need to convince (who are currently publishing prosaic safety/alignment research). That said, if someone hasn’t thought at all about concepts like “differentially advancing safety” or “capabilities externalities,” then reading this post would probably be helpful, and I’d endorse thinking about those issues. And I agree that some of the “But …” objections you list are pretty weak.
I don’t buy the argument that safety researchers have unusually good ideas/research compared to capability researchers at top labs
I don’t think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good ones, for publishing less to be a positive thing.
That said, if someone hasn’t thought at all about concepts like “differentially advancing safety” or “capabilities externalities,” then reading this post would probably be helpful, and I’d endorse thinking about those issues.
That’s a lot of what I intend to do with this post, yes. I think a lot of people do not think very much about the impact of publishing and just blurt out/publish things as a default action, and I would like them to think about their actions more.
I don’t think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good ones, for publishing less to be a positive thing.
There currently seem to be >10x as many people directly trying to build AGI/improve capabilities as trying to improve safety.
Suppose, as a simplifying assumption, that the safety people have ideas and research ability as good as the capabilities people’s.
Then, even if all the safety people switched to working full time on maximally advancing capabilities, this would advance capabilities by less than 10%.
If, on the other hand, they stopped publicly publishing safety work and this resulted in a 50% slowdown, all safety work would slow down by 50%.
Naively, it seems very hard for publishing less to make sense if the number of safety researchers is much smaller than the number of capabilities researchers and safety researchers aren’t much better at capabilities than capabilities researchers.
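A minimal back-of-the-envelope sketch of that comparison in Python: the 10:1 ratio and the 50% slowdown come from the lines above, while the absolute headcounts are made up purely for illustration.

```python
# Back-of-the-envelope version of the argument above.
# The 10:1 ratio and the 50% slowdown are taken from the comment;
# the absolute headcounts are made up purely for illustration.

capabilities_researchers = 1000
safety_researchers = 100  # >10x fewer, per the comment's estimate

# Worst case for publishing: even if every safety researcher worked full time
# on maximally advancing capabilities, total capabilities effort grows by at
# most safety/capabilities = 10%.
max_capabilities_speedup = safety_researchers / capabilities_researchers

# Assumed cost of not publishing: safety work slows down by 50%.
safety_slowdown = 0.50

print(f"capabilities speedup avoided (at most): {max_capabilities_speedup:.0%}")  # 10%
print(f"safety slowdown incurred:               {safety_slowdown:.0%}")           # 50%
```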
But safety research can actually disproportionately help capabilities; e.g., the development of RLHF allowed OAI to turn their weird text predictors into a very generally useful product.
I’m skeptical of the RLHF example (see also this post by Paul on the topic).
That said, I agree that if indeed safety researchers produce (highly counterfactual) research advances that are much more effective at increasing the profitability and capability of AIs than the research advances done by people directly optimizing for profitability and capability, then safety researchers could substantially speed up timelines. (In other words, if safety-targeted research is better at advancing profit and capabilities than research directly targeted at those aims.)
I dispute that this is true.
(I do think it’s plausible that safety-interested people have historically substantially advanced timelines (and might continue to do so to some extent now), but not by doing research targeted at improving safety; rather, by directly doing capabilities research for various reasons.)
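To make the condition above concrete under the same toy numbers, here is a hypothetical break-even calculation: it asks how many times more capabilities-relevant (per researcher) published safety work would have to be for the naive 10%-vs-50% comparison to flip. The multiplier k, and the assumption that a point of capabilities speedup trades off one-for-one against a point of safety slowdown, are my illustrative assumptions, not claims from the thread.

```python
# Hypothetical extension of the sketch above (same illustrative headcounts).
# Let k = how many times more capabilities-relevant one safety researcher's
# published work is, compared to an average capabilities researcher's work.

capabilities_researchers = 1000
safety_researchers = 100
safety_slowdown = 0.50  # assumed cost of not publishing, as above

def capabilities_speedup(k: float) -> float:
    """Fractional capabilities speedup attributable to published safety work."""
    return k * safety_researchers / capabilities_researchers

# Break-even multiplier, treating a point of capabilities speedup as exactly
# offsetting a point of safety slowdown (a strong simplification):
break_even_k = safety_slowdown * capabilities_researchers / safety_researchers

print(capabilities_speedup(1.0))  # 0.1 -> 10% if safety work is merely as good
print(break_even_k)               # 5.0 -> safety work would need ~5x the capabilities impact
```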
Ryan, this is kind of a side note, but I notice that you have a very Paul-like approach to arguments and replies on LW.
Two things that come to mind:
You have a tendency to reply to certain posts or comments with “I don’t quite understand what is being said here, and I disagree with it,” or “It doesn’t track with my views,” or equivalent replies that seem not very useful for understanding your object-level arguments. (Although I notice that in the recent comments I see, you usually follow this with some elaboration on your model.)
In the comment I’m replying to, you use a strategy of black-box-like abstract modeling of the situation to try to argue for a conclusion, one that usually involves numbers such as multipliers or percentages. (I have the impression that Paul uses this a lot; one concrete example that comes to mind is the takeoff speeds essay. I usually consider such arguments invalid when they seem to throw away information we already have, or to use a set of abstractions that don’t feel appropriate to the information I believe we have.)
I just found this interesting and plausible enough to highlight to you. It’s a moderate investment of my time to find examples from your comment history to highlight all these instances, but writing this comment still seemed valuable.
Note that I agree with your sentiment here, although my concrete argument is basically what LawrenceC wrote as a reply to this post.
It may be producing green nodes faster, but it seems on track to produce a red node before a yellow node.