One comment in this thread compares the OP to Philip Morris’ claims to be working toward a “smoke-free future.” I think this analogy is overstated, in that I expect Philip Morris is being more intentionally deceptive than Jacob Hilton here. But I quite liked the comment anyway, because I share the sense that (regardless of Jacob’s intention) the OP has an effect much like safetywashing, and I think the exaggerated satire helps make that easier to see.
The OP is framed as addressing common misconceptions about OpenAI, of which it lists five:
1. OpenAI is not working on scalable alignment.
2. Most people who were working on alignment at OpenAI left for Anthropic.
3. OpenAI is a purely for-profit organization.
4. OpenAI is not aware of the risks of race dynamics.
5. OpenAI leadership is dismissive of existential risk from AI.
Of these, I think 1, 3, and 4 address positions that are held by basically no one. So by “debunking” much dumber versions of the claims people actually make, the post gives the impression of engaging with criticism without actually meaningfully doing so. 2 at least addresses a real argument but, as I understand it, is quite misleading—while technically true, it seriously underplays the degree to which there was an exodus of key safety-conscious staff, who left because they felt OpenAI leadership was too reckless. So only 5 strikes me as responding non-misleadingly to a real criticism people actually regularly make.
In response to the Philip Morris analogy, Jacob advised caution:
“rhetoric like this seems like an excellent way to discourage OpenAI employees from ever engaging with the alignment community.”
For many years, the criticism I heard of OpenAI in private was dramatically more vociferous than what I heard in public. I think much of this was because many people shared Jacob’s concern—if we say what we actually think about their strategy, maybe they’ll write us off as enemies, and not listen later when it really counts?
But I think this is starting to change. I’ve seen a lot more public criticism lately, which I think is probably at least in part because it’s become so obvious that the strategy of mincing our words hasn’t worked. If they mostly ignore all but the very most optimistic alignment researchers now, why should we expect that will change later, as long as we keep being careful to avoid stating any of our offensive-sounding beliefs?
From talking with early employees and others, my impression is that OpenAI’s founding was incredibly reckless, in the sense that they rushed to deploy their org before first taking much time to figure out how to ensure that went well. The founders’ early comments about accident risk mostly strike me as so naive and unwise that I find it hard to imagine they thought much at all about the existing alignment literature before deciding to charge ahead and create a new lab. Their initial plan—the one still baked into their name—would have been terribly dangerous if implemented, for reasons I’d think should have been immediately obvious to them had they stopped to think hard about accident risk at all.
And I think their actions since then have mostly been similarly reckless. When they got the scaling laws result, they published a paper about it, thereby popularizing the notion that “just making the black box bigger” might be a viable path to AGI. When they demoed this strategy with products like GPT-3, DALL-E, and CLIP, they described much of the architecture publicly, inspiring others to pursue similar research directions.
So in effect, as far as I can tell, they created a very productive “creating the x-risk” department, alongside a smaller “mitigating that risk” department—the presence of which I take the OP to describe as reassuring—staffed by a few of the most notably optimistic alignment researchers, many of whom left because even they felt too worried about OpenAI’s recklessness.
After all of that, why would we expect they’ll suddenly start being prudent and cautious when it comes time to deploy transformative tech? I don’t think we should.
My strong bet is that OpenAI leadership are good people, in the standard deontological sense, and I think that’s overwhelmingly the sense that should govern interpersonal interactions. I think they’re very likely trying hard, from their perspective, to make this go well, and I urge you, dear reader, not to be an asshole to them. Figuring out what makes sense is hard; doing things is hard; attempts to achieve goals often somehow accidentally end up causing the opposite thing to happen; nobody will want to work with you if small strategic updates might cause you to suddenly treat them totally differently.
But I think we are well past the point where it plausibly makes sense for pessimistic folks to refrain from stating their true views about OpenAI (or any other lab) just to be polite. They didn’t listen the first times alignment researchers screamed in horror, and they probably won’t listen the next times either. So you might as well just say what you actually think—at least that way, anyone who does listen will find a message worth hearing.
Another bit of evidence about OpenAI that I think is worth mentioning in this context: OPP recommended a grant of $30M to OpenAI in a deal that involved OPP’s then-CEO becoming a board member of OpenAI. OPP hoped that this would allow them to get OpenAI to improve their approach to safety and governance. Later, OpenAI appointed both the CEO’s fiancée and the fiancée’s sibling to VP positions.
Both of whom then left for Anthropic with the split, right?
Yes. To be clear, the point here is that OpenAI’s behavior in that situation seems similar to how for-profit companies sometimes appear to try to capture regulators by paying their family members. (See 30 seconds from this John Oliver monologue as evidence that such tactics are not rare in the for-profit world.)
Makes sense; it wouldn’t surprise me if that’s what’s happening. I think this perhaps understates the degree to which the attempts at capture were mutual—a theory of change where OPP gives money to OpenAI in exchange for a board seat and the elevation of safety-conscious employees at OpenAI seems like a pretty good way to have an effect. [This still leaves the question of how OPP assesses safety-consciousness.]
I should also note that I find the ‘nondisparagement agreements’ people have signed with OpenAI somewhat troubling, because they mean many people with high context will not be writing comments like Adam Scholl’s above even if they wanted to, and so the absence of evidence is not as much evidence of absence as one would hope.
Does everyone who works at OpenAI sign a non-disparagement agreement? (Including those who work on governance/policy?)
Sooo this was such an intriguing idea that I did some research—but reality appears to be more boring:
In a recent informal discussion, I believe said OPP CEO remarked that he had to give up the OpenAI board seat because his fiancée joining Anthropic created a conflict of interest. Naively, this is much more likely, and I think it is much better supported by the timeline.
According to LinkedIn, the fiancée in question had already joined as a VP in 2018 and was promoted to a probably more senior position in 2020, and her sibling was promoted to VP in 2019.
The Anthropic split occurred in June 2021.
A new board member (who is arguably very aligned with OPP) was appointed in September 2021, probably in place of the OPP CEO.
It is unclear exactly when the OPP CEO left the board, but I would guess sometime in 2021. This seems better explained by a conflict of interest with his fiancée joining and co-founding Anthropic; besides, OpenAI putting another OPP-aligned board member in his place wouldn’t make for very productive scheming.
The “conflict of interest” explanation also matches my understanding of the situation better.
(I work at OpenAI.) Is the main thing you think has the effect of safetywashing here the claim that the misconceptions are common? Like, if the post were “some misconceptions I’ve encountered about OpenAI”, would it mostly not have that effect? (Point 2 was edited to clarify that it wasn’t a full account of the Anthropic split.)
“the presence of which I take the OP to describe as reassuring”
I get the sense from this, and from the rest of your comment here, that you think we should in fact not find this even mildly reassuring. I’m not going to argue with such a claim, because I don’t think such an effort on my part would be very useful to anyone. However, if I’m not completely off base or overstating your position (which I totally could be), could you go into some more detail as to why you think we shouldn’t find their presence reassuring at all?
Suppose you’re in middle school, and one day you learn that your teachers are planning a mandatory field trip, during which the entire grade will jump off of a skyscraper without a parachute. You approach a school administrator to talk to them about how dangerous that would be, and they say, “Don’t worry! We’ll all be wearing hard hats the entire time.”
Hearing that probably does not reassure you even a little bit, because hard hats alone would not nudge the probability of death below ~100%. It might actually make you more worried, because the fact that they have a prepared response means the school administrators were aware of potential issues and then decided the hard hat solution was appropriate. It’s generally harder to argue someone out of believing in an incorrect solution to a problem than into believing the problem exists in the first place.
This analogy overstates the obviousness of (and my personal confidence in) the risk, but to a lot of alignment researchers it’s an essentially accurate metaphor for how ineffective they think OpenAI’s current precautions will turn out to be in practice, even if making a doomsday AI feels like a more “understandable” mistake.
Thank you! I think I understand this position a good deal more now.