My overall take is:

1. I generally think that people interested in existential safety should want the deployment of AI to be slower and smoother, and that accelerating AI progress is generally bad.
2. The main interesting version of this tradeoff is when people specifically work on advancing AI in order to make it safer, either so that safety-conscious people have more influence or so that the rollout is more gradual. I think that’s a way bigger deal than capability externalities from safety work, and in my view it’s also closer to the actual margin of what’s net-positive vs net-negative.
3. I think that OpenAI provides altruistic justifications for accelerating AI development, but is probably net bad for existential safety (and also would likely be dispreferred by existing humans, though it depends a lot on how you elicit their preferences and integrate them with my empirical views). That said, I don’t think this case is totally obvious or a slam dunk, and I think that people interested in existential safety should be laying out their arguments (both to labs and to the broader world) and engaging productively rather than restating the conclusion louder and with stronger moral language.
4. I think that Anthropic’s work also accelerates AI arrival, but it is much easier for it to come out ahead on a cost-benefit: they have significantly smaller effects on acceleration, and a more credible case that they will be safer than alternative AI developers. I have significant unease about this kind of plan, partly for the kinds of reasons you list and also a broader set of moral intuitions. As a result it’s not something I would do personally. But I’ve spent some time thinking it through as best I can and it does seem like the expected impact is good. To the extent that people disagree I again think they should be actually making the argument and trying to understand the disagreement.
I think that Anthropic’s work also accelerates AI arrival, but it is much easier for it to come out ahead on a cost-benefit: they have significantly smaller effects on acceleration, and a more credible case that they will be safer than alternative AI developers. I have significant unease about this kind of plan, partly for the kinds of reasons you list and also a broader set of moral intuitions. As a result it’s not something I would do personally.
From the outside perspective of someone quite new to the AI safety field and with no contact with the Bay Area scene, the reasoning behind this plan is completely illegible to me. All that is visible is that they’re working on ChatGPT-like systems and capabilities, along with some empirical work on evaluations and interpretability. The only system more powerful than ChatGPT I’ve seen so far is the unnamed one behind Bing, and I’ve personally heard rumours that both Anthropic and OpenAI are already working on systems beyond the ChatGPT/GPT-3.5 level.
1. Fully agree and we appreciate you stating that.
2. While we are concerned about capability externalities from safety work (that’s why we have an infohazard policy), what we are most concerned about, and what we cover in this post, is deliberate capabilities acceleration justified as being helpful to alignment. Or, to put it in reverse: using the notion that working on systems closer to being dangerous might be more fruitful for safety work as a justification for actively pushing the capabilities frontier, and thus accelerating the arrival of the dangers themselves.
3. We fully agree that engaging with arguments is good; this is why we’re writing this and other work, and we would love all relevant players to do so more. For example, we would love to hear a more detailed, more concrete story from OpenAI about why they believe accelerating AI development has an altruistic justification. We do appreciate that OpenAI and Jan Leike have published their own approach to AI alignment, even though we disagree with some of its contents, and we would strongly support all other players in the field doing the same.
4.
I think that Anthropic’s work also accelerates AI arrival, but it is much easier for it to come out ahead on a cost-benefit: they have significantly smaller effects on acceleration, and a more credible case that they will be safer than alternative AI developers. I have significant unease about this kind of plan, partly for the kinds of reasons you list and also a broader set of moral intuitions. As a result it’s not something I would do personally.
But I’ve spent some time thinking it through as best I can and it does seem like the expected impact is good.
We share your significant unease with such plans. But given what you say here, why would you not pursue this plan yourself, while at the same time saying that its expected impact seems good to you?
From our point of view, an unease-generating, AI-arrival-accelerating plan seems pretty bad unless proven otherwise. It would be great for the field to hear the reasons why, despite these red flags, this is nevertheless a good plan.
And of course, it would be best to hear the reasoning about the plan directly from those who are pursuing it.
I think the best argument for “accelerating capabilities is good” is that it forces you to touch reality instead of theorizing; given that iteration works well in other fields, we need to ask why we think AI safety is resistant to iterative solutions.
And this, in a nutshell, is why ML/AGI people can rationally increase capabilities: LW has a non-trivial chance of having broken epistemics, and AGI people do tend towards more selfish utility functions.