[Speaking solely for myself in this comment; I know some people at OpenAI, but don’t have much in the way of special info. I also previously worked at MIRI, but am not currently.]
I think “increasing” requires some baseline, and I don’t think it’s obvious what baseline to pick here.
For example, consider instead the question “is MIRI decreasing the existential risks related to AI?”. Well, are we comparing to the world where everyone currently employed at MIRI vanishes? Or are we comparing to the world where MIRI as an organization implodes, but the employees are still around, and find jobs somewhere else? Or are we comparing to the world where MIRI as an organization gets absorbed by some other entity? Or are we comparing to the world where MIRI still exists, the same employees still work there, but the mission is somehow changed to be the null mission?
Or perhaps we’re interested in the effects on the margins—if MIRI had more dollars to spend, or less dollars, how would the existential risks change? Even the answers to those last two questions could easily be quite different—perhaps firing any current MIRI employee would make things worse, but there are no additional people that could be hired by MIRI to make things better. [Prove me wrong!]
---
With that preamble out of the way, I think there are three main obstacles to discussing this in public, a la Benquo’s earlier post.
The main one is something like “appeals to consequences.” Talking in public has two main functions: coordinating and information-processing, and it’s quite difficult to separate the two functions. [See this post and the related posts at the bottom.] Suppose I think OpenAI makes humanity less safe, and I want humanity to be more safe; I might try to figure out which strategy will be most persuasive (while still correcting me if I’m the mistaken one!) and pursue that strategy, instead of employing a strategy that more quickly ‘settles the question’ at the cost of making it harder to shift OpenAI’s beliefs. More generally, the people with the most information will be people closest to OpenAI, which probably makes them more careful about what they will or won’t say. There also seem to be significant asymmetries here, as it might be very easy to say “here are three OpenAI researchers I think are making existential risk lower” but very difficult to say “here are three OpenAI researchers I think are making existential risk higher.” [Setting aside the social costs, there’s their personal safety to consider.]
The second one is something like “prediction is hard.” One of my favorite math stories is the history of the Markov chain; in the version I heard, Markov’s rival said a thing, Markov thought to himself “that’s not true!” and then formalized the counterexample in a way that dramatically improved that field. Suppose Benquo’s story of how OpenAI came about is true, that OpenAI will succeed at making beneficial AI, and that (counterfactually) DeepMind wouldn’t have succeeded. In that hypothetical world, the direct effect of DeepMind on existential AI risk would have been negative, but the indirect effect would be positive (as otherwise OpenAI, which succeeded, wouldn’t have existed). While we often think we have a good sense of the direct effect of things, in complicated systems it becomes very non-obvious what the total effects are.
The third one is something like “heterogeneity.” Rather than passing judgment on the org as a whole, it would make more sense to make my judgments narrower; “widespread access to AI seems like it makes things worse instead of better,” for example, is a view OpenAI itself seems to have already shifted on, now focusing on widespread benefits rather than widespread access.
---
With those obstacles out of the way, here are some limited thoughts:
I think OpenAI has changed for the better in several important ways over time; for example, the ‘Open’ part of the name is not really appropriate anymore, but this seems good instead of bad on my models of how to avoid existential risks from AI. I think their fraction of technical staff devoted to reasoning about and mitigating risks is higher than DeepMind’s, although lower than MIRI’s (though MIRI’s fraction is a very high bar); I don’t have a good sense of whether that fraction is high enough.
I think the main effects of OpenAI are the impacts they have on the people they hire (and the impacts they don’t have on the people they don’t hire). There are three main effects to consider here: resources, direction-shifting, and osmosis.
On resources, imagine that there’s Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don’t give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]
On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don’t really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.
On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their ‘political’ opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.
I don’t have a great sense of how those factors aggregate into an overall sense of “OpenAI: increasing or decreasing risks?”, but I think people who take safety seriously should consider working at OpenAI, especially on teams clearly related to decreasing existential risks. [I think people who don’t take safety seriously should consider taking safety seriously.]
Thanks a lot for this great answer!
First, I should have stated this in the question, but my baseline (or my counterfactual) is a world where OpenAI doesn’t exist but the people working there still do. This might be an improvement if you think that pushing the scaling hypothesis is dangerous and that most of the safety team would find funding to keep working, or a problem if you think someone else, probably less aligned, would have pushed the scaling hypothesis anyway, and that the structure OpenAI gives its safety team is really special and important.
As for your obstacles, I agree that they pose problems; that’s the reason I don’t expect a full answer to this question. On the other hand, as you show yourself at the end of your post, I still believe we can have a fruitful discussion and debate on some of the issues. This might result in a different stance toward OpenAI, or arguments for defending it, or something completely different. But I don’t think there is nothing to be gained by having this discussion.
> On resources, imagine that there’s Dr. Light, whose research interests point in a positive direction, and Dr. Wily, whose research interests point in a negative direction, and the more money you give to Dr. Light the better things get, and the more money you give to Dr. Wily, the worse things get. [But actually what we care about is counterfactuals; if you don’t give Dr. Wily access to any of your compute, he might go elsewhere and get similar amounts of compute, or possibly even more.]
This aims at one criticism of OpenAI I often see: the amount of resources they give to capabilities research. Your other arguments (particularly osmosis) might influence this, but there’s an intuitive reason why you might want to give resources only to the Dr. Lights out there.
On the other hand, your counterfactual world hints that redirecting Dr. Wily, or putting him in an environment where safety issues are mentioned a lot, might help steer his research in a positive direction.
> On direction-shifting, imagine someone has a good idea for how to make machine learning better, and they don’t really care what the underlying problem is. You might be able to dramatically change their impact by pointing them at cancer-detection instead of missile guidance, for example. Similarly, they might have a default preference for releasing models, but not actually care much if management says the release should be delayed.
Here too, I can see this part cutting both ways for OpenAI’s impact. On the positive side, the constraints on model releases, and the fact of even having a safety team and discussing safety, might push new researchers to go into safety or to consider more safety-related issues in their work. But on the negative side, GPT-3 (as an example) is really cool. If you’re a young student, you might be convinced by it to go work on AI capabilities, without much thought about safety.
> On osmosis, imagine there are lots of machine learning researchers who are mostly focused on technical problems, and mostly get their ‘political’ opinions for social reasons instead of philosophical reasons. Then the main determinant of whether they think that, say, the benefits of AI should be dispersed or concentrated might be whether they hang out at lunch with people who think the former or the latter.
This is probably the most clearly positive point for OpenAI. Still, I’m curious how much of a role safety plays in OpenAI’s culture. For example, are all researchers and engineers made aware of safety issues? If so, then the culture would seem to lessen the risks significantly.
> This might result in a different stance toward OpenAI
But part of the problem here is that the question “what’s the impact of our stance on OpenAI on existential risks?” is potentially very different from “is OpenAI’s current direction increasing or decreasing existential risks?”, and as people outside of OpenAI have much more control over their stance than they do over OpenAI’s current direction, the first question is much more actionable. And so we run into the standard question substitution problems, where we might be pretending to talk about a probabilistic assessment of an org’s impact while actually targeting the question of “how do I think people should relate to OpenAI?”.
[That said, I see the desire to have clear discussion of the current direction, and that’s why I wrote as much as I did, but I think it has prerequisites that aren’t quite achieved yet.]
I would reemphasize that “does OpenAI increase risks” is a counterfactual question. That means we need to be clearer about what we are asking, as a matter of predicting what the counterfactuals are, and consider strategy options for going forward. This is a major set of questions, and increasing or decreasing risks as a single metric isn’t enough to capture much of interest.
For a taste of what we’d want to consider, what about the following:
Are we asking OpenAI to pick a different, “safer” strategy?
Perhaps they should focus more on hiring people to work on safety and strategy, and hire fewer capabilities researchers. That brings us to the Dr. Wily/Dr. Light question—Perhaps Dr. Capabilities B. Wily shouldn’t be hired, and Dr. Safety R. Light should be, instead. That means Wily does capabilities research elsewhere, perhaps with more resources, and Light does safety research at OpenAI. But the counterfactual is that Light would do (perhaps slightly less well funded) research on safety anyways, and Wily would work on (approximately as useful) capabilities research at OpenAI—advantaging OpenAI in any capabilities races in the future.
Are we asking OpenAI to be larger, and, if needed, should we find them funding?
Perhaps they should hire both, along with all of Dr. Light’s and Dr. Wily’s research teams. Fast growth will dilute OpenAI’s culture, but give them an additional marginal advantage over other groups. Perhaps bringing them in would help OpenAI in race dynamics, but make it more likely that they’d engage in such races.
How much funding would this need? Perhaps none—they have cash, they just need to do this. Or perhaps tons, and we need them to be profitable, and focus on that strategy, with all of the implications of that. Or perhaps a moderate amount, and we just need OpenPhil to give them another billion dollars, and then we need to ask about the counterfactual impact of that money.
Or should OpenAI focus on redirecting their capabilities staff to work on safety, accepting a harder time hiring the best people who want to work on capabilities? Or should OpenAI be smaller and more focused, and reserve cash?
These are all important questions, but they need much more time than I, or I suspect most of the readers here, have available, and they are probably already being discussed more usefully by both OpenAI and their advisors.
Post OpenAI exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark, and potentially others from OpenAI make you change your opinion?
Also, apparently Megaman is less popular than I thought, so I added links to the names.
Fwiw I recently listened to the excellent song ‘The Good Doctor’ which has me quite delighted to get random megaman references.
Oh. Right. I should have gotten the reference, but wasn’t thinking about it.
Just so you know, I got the reference. ;)