Thanks a lot for this great answer!

First, I should have stated this explicitly, but my baseline (or my counterfactual) is a world where OpenAI doesn't exist but the people working there still do. This might be an improvement if you think that pushing the scaling hypothesis is dangerous and that most of the safety team would find money to keep working, or a problem if you think someone else, probably less aligned, would have pushed the scaling hypothesis anyway, and that the structure OpenAI gives its safety team is really special and important.
As for your obstacles, I agree that they pose a problem; that's why I don't expect a full answer to this question. On the other hand, as you show yourself at the end of your post, I still believe we can have a fruitful discussion and debate on some of the issues. This might result in a different stance toward OpenAI, or arguments for defending it, or something completely different. But I do think there is something to be gained by having this discussion.
On resources, imagine that there’s Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don’t give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]
This gets at one criticism of OpenAI I often see: the amount of resources they put into capability research. Your other arguments (particularly osmosis) might influence this, but there's an intuitive reason why you might want to give resources only to the Dr. Lights out there.
On the other hand, your counterfactual world hints that redirecting Dr. Wily, or putting him in an environment where safety issues are mentioned a lot, might help steer his research in a positive direction.
On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don’t really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.
Here too, I can see this part pointing either to a positive or to a negative impact for OpenAI. On the positive side, the constraints on model releases, and the very fact of having a safety team and discussing safety, might push new researchers to go into safety or to consider safety-related issues more in their work. But on the negative side, GPT-3 (as an example) is really cool. If you're a young student, it might convince you to go work on AI capabilities without much thought about safety.
On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their ‘political’ opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.
This is probably the most clearly positive point for OpenAI. Still, I'm curious how big a role safety plays in OpenAI's culture. For example, are all researchers and engineers made aware of safety issues? If so, the culture would seem to lessen the risks significantly.
This might result in a different stance toward OpenAI
But part of the problem here is that the question "what's the impact of our stance on OpenAI on existential risks?" is potentially very different from "is OpenAI's current direction increasing or decreasing existential risks?". People outside of OpenAI have much more control over their stance than they do over OpenAI's current direction, so the first question is much more actionable. And so we run into the standard question-substitution problems, where we might be pretending to talk about a probabilistic assessment of an org's impact while actually targeting the question of "how do I think people should relate to OpenAI?".
[That said, I see the desire to have clear discussion of the current direction, and that’s why I wrote as much as I did, but I think it has prerequisites that aren’t quite achieved yet.]