I think it’s fairly self-evident that you should have exceedingly high standards for projects intending to build AGI (OpenAI, DeepMind, others). It’s really hard to reduce existential risk from AI, and I think much thought around this has been naive and misguided.
(Two examples of this outside of OpenAI include: senior AI researchers talking about military use of AI instead of misalignment, and senior AI researchers responding to the problems of specification gaming by saying “objectives can be changed quickly when issues surface” and “existential threats to humanity have to be explicitly designed as such”.)
An obvious reason to think OpenAI’s impact will be net negative is that they seem to be trying to reach AGI as fast as possible, and trying a route different from DeepMind and other competitors, so in some worlds they are shortening the timeline until AI. (I’m aware that there are arguments about why a shorter timeline is better, but I’m not sold on them right now.)
There are also more detailed conversations, about alignment, what the core of the problem actually is, and other strategic questions. I expect (and take from occasional things I hear) I have substantial disagreements with OpenAI decision-makers, which I think alone is sufficient reason for me to feel doomy about humanity’s prospects.
That said, I’m quite impressed with their actions around release practices and also their work in becoming a profit-capped entity. I felt like they were a live player with these acts and were clearly acting against their short-term self-interest in favour of humanity’s broader good, with some relatively sane models around these specific aspects of what’s important. Those were both substantial updates for me, and make me feel pretty cooperative with them.
And of course I’m very happy indeed about a bunch of the safety work they do and support. The org gives lots of support and engineers to people like Paul Christiano, Chris Olah, etc., which I think is better than what those people would probably get counterfactually, and I’m very grateful that the organisation provides this.
Overall I don’t feel my opinion is very robust, and it could easily change. Here are some examples of things that I think could substantially change my opinion:
How senior decision-making happens at OpenAI
What technical models of AGI senior researchers at OpenAI have
Broader trends that would have happened to the field of AI (and the field of AI alignment) in the counterfactual world where they were not founded
Thanks for your answer! Trying to make your examples of what might change your opinion substantially more concrete, I got these:
Does senior decision-making at OpenAI always consider safety issues before greenlighting new capability research?
Do senior researchers at OpenAI believe that their current research directly leads to AGI in the short term?
Would the Scaling Hypothesis (and thus GPT-N) have been vindicated as soon in a world without OpenAI?
Do you agree with these? Do you have other ideas of concrete questions?
The first one feels a bit too optimistic. It’s something more like: Are they able to be direct in their disagreement with one another? What level of internal politicking is there? How much ability do some of the leadership have to make unilateral decisions? Etc.
The second one is more about alignment, takeoff dynamics, and timelines. All the details, like the likelihood of mesa-optimisers. What are their thoughts on this, and how much do they think about it?
The third one is good. Also things about how differently things would’ve gone at DeepMind, and also how good/bad the world would be if Musk hadn’t shifted the Overton window so much (which I think is counterfactually linked up with OpenAI existing: you get both or neither).
Post-OpenAI-exodus update: does the exit of Dario Amodei, Chris Olah, Jack Clark and potentially others from OpenAI make you change your opinion?