1. Fully agree and we appreciate you stating that.
2. While we are concerned about capability externalities from safety work (that’s why we have an infohazard policy), what we are most concerned about, and what we cover in this post, is deliberate capabilities acceleration justified as being helpful to alignment. Or, to put it in reverse: using the notion that working on systems closer to being dangerous might be more fruitful for safety work in order to justify actively pushing the capabilities frontier, and thus accelerating the arrival of the dangers themselves.
3. We fully agree that engaging with arguments is good; this is why we’re writing this and other work, and we would love all relevant players to do so more. For example, we would love to hear a more detailed, more concrete story from OpenAI about why they believe accelerating AI development has an altruistic justification. We do appreciate that OpenAI and Jan Leike have published their own approach to AI alignment, even though we disagree with some of its contents, and we would strongly support all other players in the field doing the same.
4.
I think that Anthropic’s work also accelerates AI arrival, but it is much easier for it to come out ahead on a cost-benefit: they have significantly smaller effects on acceleration, and a more credible case that they will be safer than alternative AI developers. I have significant unease about this kind of plan, partly for the kinds of reasons you list and also a broader set of moral intuitions. As a result it’s not something I would do personally.
But I’ve spent some time thinking it through as best I can and it does seem like the expected impact is good.
We share your significant unease with such plans. But given what you say here, why is it that you wouldn’t pursue this plan yourself, yet it still seems to you that its expected impact is good?
From our point of view, an unease-generating, AI arrival-accelerating plan seems pretty bad unless proven otherwise. It would be great for the field to hear the reasons why, despite these red flags, this is nevertheless a good plan.
And of course, it would be best to hear the reasoning about the plan directly from those who are pursuing it.
I think the best argument for “accelerating capabilities is good” is that it forces you to touch reality instead of theorizing, and given that iteration works well in other fields, we need to ask why we think AI safety is resistant to iterative solutions.
And this, in a nutshell, is why ML/AGI people can rationally increase capabilities: LW has a non-trivial chance of having broken epistemics, and AGI people do tend toward more selfish utility functions.