On the OpenPhil / OpenAI Partnership
Epistemic Note: The implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil.
(Both title and this note have been edited, cheers to Ben Pace for very constructive feedback.)
Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
Premise 2: This was the default outcome.
Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Edit: To clarify, you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).
Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion: Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI Safety decision making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.
To quote OpenPhil: “OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”
From that page:
We expect the primary benefits of this grant to stem from our partnership with OpenAI, rather than simply from contributing funding toward OpenAI’s work. While we would also expect general support for OpenAI to be likely beneficial on its own, the case for this grant hinges on the benefits we anticipate from our partnership, particularly the opportunity to help play a role in OpenAI’s approach to safety and governance issues.
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
Why do you think the grant was bad? E.g. I don’t think “OAI is bad” would suffice to establish that the grant was bad.
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous.
Rather, the grant was bad for numerous reasons, including but not limited to:
It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam).
It enabled OpenAI to “safety-wash” their product (although how important this has been is unclear to me.)
From what I’ve seen at conferences and job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has led people to work at OpenAI.
Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious why this is a terrible idea even if you’re only concerned with human misuse and not misalignment.
Finally, it’s giving money directly to an organisation with the stated goal of producing an AGI. There is substantial negative EV if the grant sped up timelines.
This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI’s value at the time the grant was given. However, Wikipedia mentions that “In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone.” This certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.
Keep in mind, the grant needs to have generated $30 million in EV just to break even. I’m now going to suggest some other uses for the money; these are just rough estimates and I haven’t adjusted for inflation. I’m not claiming these are the best uses of $30 million.
The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI’s 2017 fundraiser; using 2020 numbers gives an estimate of ~4 years; see the rough arithmetic below).
Imagine the shift in public awareness if there had been an AI safety Super Bowl ad for 3-5 years.
Or it could have saved the lives of ~1300 children.
This analysis is obviously much worse if in fact the grant was negative EV.
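As a rough back-of-envelope on the funding-years estimate above (the annual budgets here are simply back-derived from the stated ~10-year and ~4-year figures, not taken from MIRI’s actual financials):
\[
\frac{\$30\,\mathrm{M}}{\sim\$3\,\mathrm{M}/\mathrm{yr}} \approx 10\ \text{years (2017-scale budget)},
\qquad
\frac{\$30\,\mathrm{M}}{\sim\$7.5\,\mathrm{M}/\mathrm{yr}} \approx 4\ \text{years (2020-scale budget)}.
\]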
In your initial post, it sounded like you were trying to say:
This grant was obviously ex ante bad. In fact, it’s so obvious that it was ex ante bad that we should strongly update against everyone involved in making it.
I think that this argument is in principle reasonable. But to establish it, you have to demonstrate that the grant was extremely obviously ex ante bad. I don’t think your arguments here come close to persuading me of this.
For example, re governance impact, when the board fired sama, markets thought it was plausible he would stay gone. If that had happened, I don’t think you’d assess the governance impact as “underwhelming”. So I think that (if you’re in favor of sama being fired in that situation, which you probably are) you shouldn’t consider the governance impact of this grant to be obviously ex ante ineffective.
I think that arguing about the impact of grants requires much more thoroughness than you’re using here. I think your post has a bad “ratio of heat to light”: you’re making a provocative claim but not really spelling out why you believe the premises.
“This grant was obviously ex ante bad. In fact, it’s so obvious that it was ex ante bad that we should strongly update against everyone involved in making it.”
This is an accurate summary.
“arguing about the impact of grants requires much more thoroughness than you’re using here”
We might not agree on the level of effort required for a quick take. I do not currently have the time available to expand this into a full write up on the EA forum but am still interested in discussing this with the community.
“you’re making a provocative claim but not really spelling out why you believe the premises.”
I think this is a fair criticism and something I hope I can improve on.
I feel frustrated that your initial comment (which is now the top reply) implies I either hadn’t read the 1700 word grant justification that is at the core of my argument, or was intentionally misrepresenting it to make my point. This seems to be an extremely uncharitable interpretation of my initial post. (Edit: I am retracting this statement and now understand Buck’s comment was meaningful context. Apologies to Buck and see commentary by Ryan Greenblatt below)
Your reply has been quite meta, which makes it difficult to convince you on specific points.
Your argument on betting markets has updated me slightly towards your position, but I am not particularly convinced. My understanding is that Open Phil and OpenAI had a close relationship, and hence Open Phil had substantially more information to work with than the average Manifold punter.
I feel frustrated that your initial comment (which is now the top reply) implies I either hadn’t read the 1700 word grant justification that is at the core of my argument, or was intentionally misrepresenting it to make my point.
I think this comment is extremely important for bystanders to understand the context of the grant and it isn’t mentioned in your original short form post.
So, regardless of whether you understand the situation, it’s important that other people understand the intention of the grant (and this intention isn’t obvious from your original comment). Thus, this comment from Buck is valuable.
I also think that the main interpretation from bystanders of your original shortform would be something like:
OpenPhil made a grant to OpenAI
OpenAI is bad (and this was ex-ante obvious)
Therefore this grant is bad and the people who made this grant are bad.
Fair enough if this wasn’t your intention, but I think it will be how bystanders interact with this.
Thank you, this explains my error. I’ve retracted that part of my response.
Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious why this is a terrible idea even if you’re only concerned with human misuse and not misalignment.
Hmmm, can you point to where you think the grant shows this? I think the following paragraph from the grant seems to indicate otherwise:
When OpenAI launched, it characterized the nature of the risks – and the most appropriate strategies for reducing them – in a way that we disagreed with. In particular, it emphasized the importance of distributing AI broadly;1 our current view is that this may turn out to be a promising strategy for reducing potential risks, but that the opposite may also turn out to be true (for example, if it ends up being important for institutions to keep some major breakthroughs secure to prevent misuse and/or to prevent accidents). Since then, OpenAI has put out more recent content consistent with the latter view,2 and we are no longer aware of any clear disagreements. However, it does seem that our starting assumptions and biases on this topic are likely to be different from those of OpenAI’s leadership, and we won’t be surprised if there are disagreements in the future.
“In particular, it emphasized the importance of distributing AI broadly;1 our current view is that this may turn out to be a promising strategy for reducing potential risks”
Yes, I’m interpreting the phrase “may turn out” to be treating the idea with more seriousness than it deserves.
Rereading the paragraph, it seems reasonable to interpret it as politely downplaying it, in which case my statement about Open Phil taking the idea seriously is incorrect.
“we would also expect general support for OpenAI to be likely beneficial on its own” seems to imply that they did think it was good to make OAI go faster/better, unless that statement was a lie to avoid badmouthing a grantee.
I just realized that Paul Christiano and Dario Amodei both probably have signed non-disclosure + non-disparagement contracts since they both left OpenAI.
That impacts how I’d interpret Paul’s (and Dario’s) claims and opinions (or the lack thereof) that relate to OpenAI or alignment proposals entangled with what OpenAI is doing. If Paul has systematically silenced himself, and a large amount of OpenPhil and SFF money has been mis-allocated because of systematically skewed beliefs that these organizations have had due to Paul’s opinions or lack thereof, well. I don’t think this is the case though—I expect Paul, Dario, and Holden have all converged on similar beliefs (whether they track reality or not) and have taken actions consistent with those beliefs.
Can anybody confirm whether Paul is likely systematically silenced re OpenAI?
I mean, if Paul doesn’t confirm that he is not under any non-disparagement obligations to OpenAI like Cullen O’Keefe did, we have our answer.
In fact, given this asymmetry of information situation, it makes sense to assume that Paul is under such an obligation until he claims otherwise.
I don’t know the answer, but it would be fun to have a Twitter comment with a zillion likes asking Sam Altman this question. Maybe someone should make one?
https://x.com/panickssery/status/1792586407623393435
Mhhh, that seems very bad for someone in an AISI in general. I’d guess Jade Leung might sadly be under the same obligations…
That seems like a huge deal to me with disastrous consequences, thanks a lot for flagging.
I mostly agree with premises 1, 2, and 3, but I don’t see how the conclusion follows.
It is possible for things to be hard to influence and yet still worth it to try to influence them.
(Note that the $30 million grant was not an endorsement and was instead a partnership (e.g. it came with a board seat), see Buck’s comment.)
(Ex-post, I think this endeavour was probably net negative, though I’m pretty unsure and ex-ante I currently think it seems great.)
I think there’s a solid case for anyone who supported funding OpenAI being considered at best well intentioned but very naive. I think the idea that we should align and develop superintelligence but, like, good, has always been a blind spot in this community—an obviously flawed but attractive goal, because it dodged the painful choice between extinction risk and abandoning hopes of personally witnessing the singularity or at least a post scarcity world. This is also a case where people’s politics probably affected them, because plenty of others would be instinctively distrustful of corporation driven solutions to anything—it’s something of a Godzilla Strategy after all, aligning corporations is also an unsolved problem—but those with an above average level of trust in free markets weren’t so averse.
Such people don’t necessarily have conflicts of interest (though some may, and that’s another story) but they at least need to drop the fantasy land stuff and accept harsh reality on this before being of any use.
It’s also notable that the topic of OpenAI nondisparagement agreements was brought to Holden Karnofsky’s attention in 2022, and he replied with “I don’t know whether OpenAI uses nondisparagement agreements; I haven’t signed one.” (He could have asked his contacts inside OAI about it, or asked the EA board member to investigate. Or even set himself up earlier as someone OpenAI employees could whistleblow to on such issues.)
If the point was to buy a ticket to play the inside game, then it was played terribly and negative credit should be assigned on that basis, and for misleading people about how prosocial OpenAI was likely to be (due to having an EA board member).
I don’t know whether OpenAI uses nondisparagement agreements; I haven’t signed one.
This can also be glomarizing. “I haven’t signed one.” is a fact, intended for the reader to use it as anecdotal evidence. “I don’t know whether OpenAI uses nondisparagement agreements” can mean that he doesn’t know for sure, and will not try to find out.
Obviously, the context of the conversation and the events surrounding Holden stating this matters for interpreting this statement, but I’m not interested in looking further into this, so I’m just going to highlight the glomarization possibility.
On a meta note, IF proposition 2 is true, THEN the best way to tell this would be if people had been saying so AT THE TIME. If instead, actually everyone at the time disagreed with proposition 2, then it’s not clear that there’s someone “we” know to hand over decision making power to instead. Personally, I was pretty new to the area, and as a Yudkowskyite I’d probably have reflexively decried giving money to any sort of non-X-risk-pilled non-alignment-differential capabilities research. But more to the point, as a newcomer, I wouldn’t have tried hard to have independent opinions about stuff that wasn’t in my technical focus area, or to express those opinions with much conviction, maybe because it seemed like Many Highly Respected Community Members With Substantially Greater Decision Making Experience would know far better, and would not have the time or the non-status to let me in on the secret subtle reasons for doing counterintuitive things. Now I think everyone’s dumb and everyone should say their opinions a lot so that later they can say that they’ve been saying this all along. I’ve become extremely disagreeable in the last few years, I’m still not disagreeable enough, and approximately no one I know personally is disagreeable enough.
Why focus on the $30 million grant?
What about large numbers of people working at OpenAI directly on capabilities for many years? (Which is surely worth far more than $30 million.)
Separately, this grant seems to have been done to influence the governance at OpenAI, not make OpenAI go faster. (Directly working on capabilities seems modestly more accelerating and risky than granting money in exchange for a partnership.)
(ETA: TBC, there is a relationship between the grant and people working at OpenAI on capabilities: the grant was associated with a general vague endorsement of trying to play inside game at OpenAI.)
Epistemic Note: Many highly respected community members with substantially greater decision making experience (and Lesswrong karma) presumably disagree strongly with my conclusion.
FYI I wish to register my weak disapproval of this opening. A la Scott Alexander’s “Against Bravery Debates”, I think it is actively distracting and a little mind-killing to open by making a claim about status and popularity of a position even if it’s accurate.
I think in this case it would be reasonable to say something like “the implications of this argument being true involve substantial reallocation of status and power, so please be conscious of that and let’s all try to assess the evidence accurately and avoid overheating”. This is different from something like “I know lots of people will disagree with me on this but I’m going to say it”.
I’m not saying this was an easy post to write, but I think the standard to aim for is not having openings like this.
Honestly, maybe further controversial opinion, but this [30 million for a board seat at what would become the lead co. for AGI, with a novel structure for nonprofit control that could work?] still doesn’t feel like necessarily as bad a decision now as others are making it out to be?
The thing that killed all value of this deal was losing the board seat(s?), and I at least haven’t seen much discussion of this as a mistake.
I’m just surprised so little prioritization was given to keeping this board seat, it was probably one of the most important assets of the “AI safety community and allies”, and there didn’t seem to be any real fight with Sam Altman’s camp for it.
So Holden has the board seat, but has to leave because of COI, and endorses Toner to replace: “… Karnofsky cited a potential conflict of interest because his wife, Daniela Amodei, a former OpenAI employee, helped to launch the AI company Anthropic.
Given that Toner previously worked as a senior research analyst at Open Philanthropy, Loeber speculates that Karnofsky might’ve endorsed her as his replacement.”
Like, maybe it was doomed if they only had one board seat (Open Phil) vs whoever else is on the board, and there’s a lot of shuffling about as Musk and Hoffman also leave for COIs, but start of 2023 it seems like there is an “AI Safety” half to the board, and a year later there are now none. Maybe it was further doomed if Sam Altman has the “take the whole company elsewhere” card, but idk… was this really inevitable? Was there really not a better way to, idk, maintain some degree of control and supervision of this vital board over the years since OP gave the grant?
COI == conflict of interest.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
I like a lot of this post, but the sentence above seems very out of touch to me. Who are these third parties who are completely objective? Why is objective the adjective here, instead of “good judgement” or “predicted this problem at the time”?
That’s a good point. You have pushed me towards thinking that this is an unreasonable statement and “predicted this problem at the time” is better.
I downvoted this comment because it felt uncomfortably scapegoat-y to me. If you think the OpenAI grant was a big mistake, it’s important to have a detailed investigation of what went wrong, and that sort of detailed investigation is most likely to succeed if you have cooperation from people who are involved. I’ve been reading a fair amount about what it takes to instill a culture of safety in an organization, and nothing I’ve seen suggests that scapegoating is a good approach.
Writing a postmortem is not punishment—it is a learning opportunity for the entire company.
...
Blameless postmortems are a tenet of SRE culture. For a postmortem to be truly blameless, it must focus on identifying the contributing causes of the incident without indicting any individual or team for bad or inappropriate behavior. A blamelessly written postmortem assumes that everyone involved in an incident had good intentions and did the right thing with the information they had. If a culture of finger pointing and shaming individuals or teams for doing the “wrong” thing prevails, people will not bring issues to light for fear of punishment.
Blameless culture originated in the healthcare and avionics industries where mistakes can be fatal. These industries nurture an environment where every “mistake” is seen as an opportunity to strengthen the system. When postmortems shift from allocating blame to investigating the systematic reasons why an individual or team had incomplete or incorrect information, effective prevention plans can be put in place. You can’t “fix” people, but you can fix systems and processes to better support people making the right choices when designing and maintaining complex systems.
...
Removing blame from a postmortem gives people the confidence to escalate issues without fear. It is also important not to stigmatize frequent production of postmortems by a person or team. An atmosphere of blame risks creating a culture in which incidents and issues are swept under the rug, leading to greater risk for the organization [Boy13].
...
We can say with confidence that thanks to our continuous investment in cultivating a postmortem culture, Google weathers fewer outages and fosters a better user experience.
If you start with the assumption that there was a moral failing on the part of the grantmakers, and you are wrong, there’s a good chance you’ll never learn that.
If you start with the assumption that there was a moral failing on the part of the grantmakers, and you are wrong, there’s a good chance you’ll never learn that.
I think you are misinterpreting the grandparent comment. I do not read any mention of a ‘moral failing’ in that comment. You seem worried because of the commenter’s clear description of what they think would be a sensible step for us to take given what they believe are egregious flaws in the decision-making processes of the people involved. I don’t think there’s anything wrong with such claims.
Again: You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about. You can be empathetic to people having flawed decision making and care about them, while also wanting to keep them away from certain decision-making positions.
If you think the OpenAI grant was a big mistake, it’s important to have a detailed investigation of what went wrong, and that sort of detailed investigation is most likely to succeed if you have cooperation from people who are involved.
Oh, interesting. Who exactly do you think influential people like Holden Karnofsky and Paul Christiano are accountable to, exactly? This “detailed investigation” you speak of, and this notion of a “blameless culture”, makes a lot of sense when you are the head of an organization and you are conducting an investigation as to the systematic mistakes made by people who work for you, and who you are responsible for. I don’t think this situation is similar enough that you can use these intuitions blindly without thinking through the actual causal factors involved in this situation.
Note that I don’t necessarily endorse the grandparent comment’s claims. This is a complex situation and I’d want to spend more time analyzing it and what occurred.
Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality.
I read the Ben Hoffman post you linked. I’m not finding it very clear, but the gist seems to be something like: Statements about others often import some sort of good/bad moral valence; trying to avoid this valence can decrease the accuracy of your statements.
If OP was optimizing purely for descriptive accuracy, disregarding everyone’s feelings, that would be one thing. But the discussion of “repercussions” before there’s been an investigation goes into pure-scapegoating territory if you ask me.
I do not read any mention of a ‘moral failing’ in that comment.
If OP wants to clarify that he doesn’t think there was a moral failing, I expect that to be helpful for a post-mortem. I expect some other people besides me also saw that subtext, even if it’s not explicit.
You can be empathetic to people having flawed decision making and care about them, while also wanting to keep them away from certain decision-making positions.
“Keep people away” sounds like moral talk to me. If you think someone’s decisionmaking is actively bad, i.e. you’d be better off reversing any advice from them, then maybe you should keep them around so you can do that! But more realistically, someone who’s fucked up in a big way will probably have learned from that, and functional cultures don’t throw away hard-won knowledge.
Imagine a world where AI is just an inherently treacherous domain, and we throw out the leadership whenever they make a mistake. So we get a continuous churn of inexperienced leaders in an inherently treacherous domain—doesn’t sound like a recipe for success!
Oh, interesting. Who exactly do you think influential people like Holden Karnofsky and Paul Christiano are accountable to, exactly? This “detailed investigation” you speak of, and this notion of a “blameless culture”, makes a lot of sense when you are the head of an organization and you are conducting an investigation as to the systematic mistakes made by people who work for you, and who you are responsible for. I don’t think this situation is similar enough that you can use these intuitions blindly without thinking through the actual causal factors involved in this situation.
I agree that changes things. I’d be much more sympathetic to the OP if they were demanding an investigation or an apology.
But the discussion of “repercussions” before there’s been an investigation goes into pure-scapegoating territory if you ask me.
Just to be clear, OP themselves seem to think that what they are saying will have little effect on the status quo. They literally called it “Very Spicy Take”. Their intention was to allow them to express how they felt about the situation. I’m not sure why you find this threatening, because again, the people they think ideally wouldn’t continue to have influence over AI safety related decisions are incredibly influential and will very likely continue to have the influence they currently possess. Almost everyone else in this thread implicitly models this fact as they are discussing things related to the OP comment.
There is not going to be any scapegoating that will occur. I imagine that everything I say is something I would say in person to the people involved, or to third parties, and not expect any sort of coordinated action to reduce their influence—they are that irreplaceable to the community and to the ecosystem.
So basically, I think it is a bad idea and you think we can’t do it anyway. In that case let’s stop calling for it, and call for something more compassionate and realistic like a public apology.
I’ll bet an apology would be a more effective way to pressure OpenAI to clean up its act anyways. Which is a better headline—“OpenAI cofounder apologizes for their role in creating OpenAI”, or some sort of internal EA movement drama? If we can generate a steady stream of negative headlines about OpenAI, there’s a chance that Sam is declared too much of a PR and regulatory liability. I don’t think it’s a particularly good plan, but I haven’t heard a better one.
Can you not be close friends with someone while also expecting them to be bad at self-control when it comes to alcohol? Or perhaps they are great at technical stuff like research but pretty bad at negotiation, especially when dealing with experienced adversarial situations such as when talking to VCs?
If you think someone’s decisionmaking is actively bad, i.e. you’d be better off reversing any advice from them, then maybe you should keep them around so you can do that!
It is not that people’s decision-making skill is optimized such that you can consistently reverse someone’s opinion to get something that accurately tracks reality. If that were the case then they would already be implicitly tracking reality very well. Reversed stupidity is not intelligence.
But more realistically, someone who’s fucked up in a big way will probably have learned from that, and functional cultures don’t throw away hard-won knowledge.
Again you seem to not be trying to track the context of our discussion here. This advice again is usually said when it comes to junior people embedded in an institution, because the ability to blame someone and / or hold them responsible is a power that senior / executive people hold. This attitude you describe makes a lot of sense when it comes to people who are learning things, yes. I don’t know if you can plainly bring it into this domain, and you even acknowledge this in the next few lines.
Imagine a world where AI is just an inherently treacherous domain, and we throw out the leadership whenever they make a mistake.
I think it is incredibly unlikely that the rationalist community has an ability to ‘throw out’ the ‘leadership’ involved here. I find this notion incredibly silly, given the amount of influence OpenPhil has over the alignment community, especially through their funding (including the pipeline, such as MATS).
It is not that people’s decision-making skill is optimized such that you can consistently reverse someone’s opinion to get something that accurately tracks reality. If that were the case then they would already be implicitly tracking reality very well. Reversed stupidity is not intelligence.
Sure, I think this helps tease out the moral valence point I was trying to make. “Don’t allow them near” implies their advice is actively harmful, which in turn suggests that reversing it could be a good idea. But as you say, this is implausible. A more plausible statement is that their advice is basically noise—you shouldn’t pay too much attention to it. I expect OP would’ve said something like that if they were focused on descriptive accuracy rather than scapegoating.
Another way to illuminate the moral dimension of this conversation: If we’re talking about poor decision-making, perhaps MIRI and FHI should also be discussed? They did a lot to create interest in AGI, and MIRI failed to create good alignment researchers by its own lights. Now after doing advocacy off and on for years, and creating this situation, they’re pivoting to 100% advocacy.
Could MIRI be made up of good people who are “great at technical stuff”, yet apt to shoot themselves in the foot when it comes to communicating with the public? It’s hard for me to imagine an upvoted post on this forum saying “MIRI shouldn’t be allowed anywhere near AI safety communications”.
Agreed that it reflects badly on the people involved, although less on Paul since he was only a “technical advisor” and arguably less responsible for thinking through / due diligence on the social aspects. It’s frustrating to see the EA community (on EAF and Twitter at least) and those directly involved all ignoring this.
(“shouldn’t be allowed anywhere near AI Safety decision making in the future” may be going too far though.)
In 2019, OpenAI restructured to ensure that the company could raise capital in pursuit of this mission, while preserving the nonprofit’s mission, governance, and oversight. The majority of the board is independent, and the independent directors do not hold equity in OpenAI.
A serious effective altruism movement would clean house. Everyone who pushed the ‘work with AI capabilities company’ line should retire or be forced to retire. There is no need to blame anyone for mistakes; the decision makers had reasons. But they chose wrong and should not continue to be leaders.
Do you think that whenever anyone makes a decision that ends up being bad ex-post they should be forced to retire?
Doesn’t this strongly disincentivize making positive EV bets which are likely to fail?
Edit: I interpreted this comment as a generic claim about how the EA community should relate to things which went poorly ex-post, I now think this comment was intended to be less generic.
Not OP, but I take the claim to be “endorsing getting into bed with companies on-track to make billions of dollars profiting from risking the extinction of humanity in order to nudge them a bit, is in retrospect an obviously doomed strategy, and yet many self-identified effective altruists trusted their leadership to have secret good reasons for doing so and followed them in supporting the companies (e.g. working there for years including in capabilities roles and also helping advertise the company jobs). now that a new consensus is forming that it indeed was obviously a bad strategy, it is also time to have evaluated the leadership’s decision as bad at the time of making the decision and impose costs on them accordingly, including loss of respect and power”.
So no, not disincentivizing making positive EV bets, but updating about the quality of decision-making that has happened in the past.
So no, not disincentivizing making positive EV bets, but updating about the quality of decision-making that has happened in the past.
I think there’s a decent case that such updating will indeed disincentivize making positive EV bets (in some cases, at least).
In principle we’d want to update on the quality of all past decision-making. That would include both [made an explicit bet by taking some action] and [made an implicit bet through inaction]. With such an approach, decision-makers could be punished/rewarded with the symmetry required to avoid undesirable incentives (mostly). Even here it’s hard, since there’d always need to be a [gain more influence] mechanism to balance the possibility of losing your influence.
In practice, most of the implicit bets made through inaction go unnoticed—even where they’re high-stakes (arguably especially when they’re high-stakes: most counterfactual value lies in the actions that won’t get done by someone else; you won’t be punished for being late to the party when the party never happens). That leaves the explicit bets. To look like a good decision-maker the incentive is then to make low-variance explicit positive EV bets, and rely on the fact that most of the high-variance, high-EV opportunities you’re not taking will go unnoticed.
From my by-no-means-fully-informed perspective, the failure mode at OpenPhil in recent years seems not to be [too many explicit bets that don’t turn out well], but rather [too many failures to make unclear bets, so that most EV is left on the table]. I don’t see support for hits-based research. I don’t see serious attempts to shape the incentive landscape to encourage sufficient exploration. It’s not clear that things are structurally set up so anyone at OP has time to do such things well (my impression is that they don’t have time, and that thinking about such things is no-one’s job (?? am I wrong ??)).
It’s not obvious to me whether the OpenAI grant was a bad idea ex-ante. (though probably not something I’d have done)
However, I think that another incentive towards middle-of-the-road, risk-averse grant-making is the last thing OP needs.
That said, I suppose much of the downside might be mitigated by making a distinction between [you wasted a lot of money in ways you can’t legibly justify] and [you funded a process with (clear, ex-ante) high negative impact]. If anyone’s proposing punishing the latter, I’d want it made very clear that this doesn’t imply punishing the former. I expect that the best policies do involve wasting a bunch of money in ways that can’t be legibly justified on the individual-funding-decision level.
I interpreted the comment as being more general than this. (As in, if someone does something that works out very badly, they should be forced to resign.)
Upon rereading the comment, it reads as less generic than my original interpretation. I’m not sure if I just misread the comment or if it was edited. (Would be nice to see the original version if actually edited.)
(Edit: Also, you shouldn’t interpret my comment as an endorsement or agreement with the rest of the content of Ben’s comment.)
endorsing getting into bed with companies on-track to make billions of dollars profiting from risking the extinction of humanity in order to nudge them a bit
By “positive EV bets” I meant positive EV with respect to shared values, not with respect to personal gain.
Edit: Maybe your view is that leaders should take this bets anyway even though they know they are likely to result in a forced retirement. (E.g. ignoring the disincentive.) I was actually thinking of the disincentive effect as: you are actually a good leader, so you remaining in power would be good, therefore you should avoid actions that result in you losing power for unjustified reasons. Therefore you should avoid making positive EV bets (as making these bets is now overall negative EV as it will result in a forced leadership transition which is bad). More minimally, you strongly select for leaders which don’t make such bets.
“ETA” commonly is short for “estimated time of arrival”. I understand you are using it to mean “edited” but I don’t quite know what it is short for, and also it seems like using this is just confusing for people in general.
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
OK
This was the default outcome.
OK
Without repercussions for terrible decisions, decision makers have no skin in the game.
It’s an article of faith for some people that that makes a difference, but I’ve never seen why.
I mean, many of the “decision makers” on these particular issues already believe that their actual, personal, biological skins are at stake, along with those of everybody else they know. And yet...
Anyone and everyone involved with Open Phil recommending a grant of $30 million be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI Safety decision making in the future.
Thinking “seven years from now, a significant number of independent players in a relatively large and diverse field might somehow band together to exclude me” seems very distant from the way I’ve seen actual humans make decisions.
Perhaps, but “seven years from now my reputation in my industry will drop markedly on the basis of this decision” seems to me like a normal human thing that happens all the time.
Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
OpenAI wasn’t a private company (ie for-profit) at the time of the OP grant though.
I don’t think this is true. Nonprofits can aim to amass large amounts of wealth, they just aren’t allowed to distribute that wealth to its shareholders. A good chunk of obviously very wealthy and powerful companies are nonprofits.
I’m not sure if those are precisely the terms of the charter, but that’s beside the point. It is still “private” in the sense that there is a small group of private citizens who own the thing and decide what it should do with no political accountability to anyone else. As for the “non-profit” part, we’ve seen what happens to that as soon as it’s in the way.
I’m not trying to say “it’s bad to give large sums of money to any group because humans have a tendency to to seek power.”
I’m saying “you should be exceptionally cautious about giving large sums of money to a group of humans with the stated goal of constructing an AGI.”
You need to weight any reassurances they give you against two observations:
The commonly observed pattern of individual humans or organisations seeking power (and/or wealth) at the expense of the wider community.
The strong likelihood that there will be an opportunity for organisations pushing ahead with AI research to obtain incredible wealth or power.
So, it isn’t “humans seek power therefore giving any group of humans money is bad”. It’s “humans seek power” and, in the specific case of AI companies, there may be incredibly strong rewards for groups that behave in a self-interested way.
The general idea I’m working off is that you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).
That seems like a valuable argument. It might be worth updating the wording under premise 2 to clarifying this? To me it reads as saying that the configuration, rather than the aim, of OpenAI was the major red flag.
I would be happy to defend roughly the position above (I don’t agree with all of it, but agree with roughly something like “the strategy of trying to play the inside game at labs was really bad, failed in predictable ways, and has deeply eroded trust in community leadership due to the adversarial dynamics present in such a strategy and many people involved should be let go”).
I do think most people who disagree with me here are under substantial confidentiality obligations and de-facto non-disparagement obligations (such as really not wanting to imply anything bad about Anthropic or wanting to maintain a cultivated image for policy purposes) so that it will be hard to find a good public debate partner, but it isn’t impossible.
I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I’d be happy to debate if you’d find it productive (although I’m not sure whether I’m disagreeable enough to be a good choice).
For me, the key question in situations when leaders made a decision with really bad consequences is, “How did they engage with criticism and opposing views?”
If they did well on this front, then I don’t think it’s at all mandatory to push for leadership changes (though certainly, the worse someone’s track record gets, the more that speaks against them).
By contrast, if leaders tried to make the opposition look stupid or if they otherwise used their influence to dampen the reach of opposing views, then being wrong later is unacceptable.
Basically, I want to allow for a situation where someone was like, “this is a tough call and I can see reasons why others wouldn’t agree with me, but I think we should do this,” and then ends up being wrong, but I don’t want to allow situations where someone is wrong after having expressed something more like, “listen to me, I know better than you, go away.”
In the first situation, it might still be warranted to push for leadership changes (esp. if there’s actually a better alternative), but I don’t see it as mandatory.
The author of the original short form says we need to hold leaders accountable for bad decisions because otherwise the incentives are wrong. I agree with that, but I think it’s being too crude to tie incentives to whether a decision looks right or wrong in hindsight. We can do better and evaluate how someone went about making a decision and how they handled opposing views. (Basically, if opposing views aren’t loud enough that you’d have to actively squish them using your influence illegitimately, then the mistake isn’t just yours as the leader; it’s also that the situation wasn’t significantly obvious to others around you.) I expect that everyone who has strong opinions on things and is ambitious and agenty in a leadership position is going to make some costly mistakes. The incentives shouldn’t be such that leaders shy away from consequential interventions.
I have indeed been publicly advocating against the inside game strategy at labs for many years (going all the way back to 2018), predicting it would fail due to incentive issues and have large negative externalities due to conflict of interest issues. I could dig up my comments, but I am confident almost anyone who I’ve interfaced with at the labs, or who I’ve talked to about any adjacent topic in leadership would be happy to confirm.
Are you just referring to the profit incentive conflicting with the need for safety, or something else?
I’m struggling to see how we get aligned AI without “inside game at labs” in some way, shape, or form.
My sense is that evaporative cooling is the biggest thing which went wrong at OpenAI. So I feel OK about e.g. Anthropic if it’s not showing signs of evaporative cooling.
I’d like to know what Holden did while serving on the board, and what OpenAI would have done if he hadn’t joined. That’s crucial for assessing the grant’s impact.
But since board meetings are private, this will remain unknown for a long time. Unfortunately, the best we can do is speculate.
On the OpenPhil / OpenAI Partnership
Epistemic Note:
The implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil.
(Both title and this note have been edited, cheers to Ben Pace for very constructive feedback.)
Premise 1:
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.
Premise 2:
This was the default outcome.
Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
Edit: To clarify, you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).
Premise 3:
Without repercussions for terrible decisions, decision makers have no skin in the game.
Conclusion:
Anyone and everyone involved with Open Phil recommending a grant of $30 million dollars be given to OpenAI in 2017 shouldn’t be allowed anywhere near AI Safety decision making in the future.
To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties.
This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved.
To quote OpenPhil:
”OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela.”
From that page:
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
Why do you think the grant was bad? E.g. I don’t think “OAI is bad” would suffice to establish that the grant was bad.
So the case for the grant wasn’t “we think it’s good to make OAI go faster/better”.
I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous
Rather, the grant was bad for numerous reasons, including but not limited to:
It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam).
It enabled OpenAI to “safety-wash” their product (although how important this has been is unclear to me.)
From what I’ve seen at conferences and job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has lead people to work at OpenAI.
Less important, but the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but it seems trivially obvious why this is a terrible idea even if you’re only concerned with human misuse and not misalignment.
Finally, it’s giving money directly to an organisation with the stated goal of producing an AGI. There is substantial negative -EV if the grant sped up timelines.
This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI’s value at the time the grant was given. However, wikipedia mentions that “In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone.” This certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.
Keep in mind, the grant needs to have generated 30 million in EV just to break even. I’m now going to suggest some other uses for the money, but keep in mind these are just rough estimates and I haven’t adjusted for inflation. I’m not claiming these are the best uses of 30 million dollars.
The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI’s 2017 fundraiser, using 2020 numbers gives an estimate of ~4 years).
Imagine the shift in public awareness if there had been an AI safety Superbowl ad for 3-5 years.
Or it could have saved the lives of ~1300 children.
This analysis is obviously much worse if in fact the grant was negative EV.
In your initial post, it sounded like you were trying to say:
I think that this argument is in principle reasonable. But to establish it, you have to demonstrate that the grant was extremely obviously ex ante bad. I don’t think your arguments here come close to persuading me of this.
For example, re governance impact, when the board fired sama, markets thought it was plausible he would stay gone. If that had happened, I don’t think you’d assess the governance impact as “underwhelming”. So I think that (if you’re in favor of sama being fired in that situation, which you probably are) you shouldn’t consider the governance impact of this grant to be obviously ex ante ineffective.
I think that arguing about the impact of grants requires much more thoroughness than you’re using here. I think your post has a bad “ratio of heat to light”: you’re making a provocative claim but not really spelling out why you believe the premises.
“This grant was obviously ex ante bad. In fact, it’s so obvious that it was ex ante bad that we should strongly update against everyone involved in making it.”
This is an accurate summary.
“arguing about the impact of grants requires much more thoroughness than you’re using here”
We might not agree on the level of effort required for a quick take. I do not currently have the time available to expand this into a full write up on the EA forum but am still interested in discussing this with the community.
“you’re making a provocative claim but not really spelling out why you believe the premises.”
I think this is a fair criticism and something I hope I can improve on.
I feel frustrated that your initial comment (which is now the top reply) implies I either hadn’t read the 1700 word grant justification that is at the core of my argument, or was intentionally misrepresenting it to make my point. This seems to be an extremely uncharitable interpretation of my initial post. (Edit: I am retracting this statement and now understand Buck’s comment was meaningful context. Apologies to Buck and see commentary by Ryan Greenblat below)Your reply has been quite meta, which makes it difficult to convince you on specific points.
Your argument on betting markets has updated me slightly towards your position, but I am not particularly convinced. My understanding is that Open Phil and OpenAI had a close relationship, and hence Open Phil had substantially more information to work with than the average manifold punter.
I think this comment is extremely important for bystanders to understand the context of the grant and it isn’t mentioned in your original short form post.
So, regardless of whether you understand the situation, it’s important that other people understand the intention of the grant (and this intention isn’t obvious from your original comment). Thus, this comment from Buck is valuable.
I also think that the main interpretation from bystanders of your original shortform would be something like:
OpenPhil made a grant to OpenAI
OpenAI is bad (and this was ex-ante obvious)
Therefore this grant is bad and the people who made this grant are bad.
Fair enough if this wasn’t your intention, but I think it will be how bystanders interact with this.
Thank you, this explains my error. I’ve retracted that part of my response.
Hmmm, can you point to where you think the grant shows this? I think the following paragraph from the grant seems to indicate otherwise:
“In particular, it emphasized the importance of distributing AI broadly;1 our current view is that this may turn out to be a promising strategy for reducing potential risks”
Yes, I’m interpreting the phrase “may turn out” to be treating the idea with more seriousness than it deserves.
Rereading the paragraph, it seems reasonable to interpret it as politely downplaying it, in which case my statement about Open Phil taking the idea seriously is incorrect.
“we would also expect general support for OpenAI to be likely beneficial on its own” seems to imply that they did think it was good to make OAI go faster/better, unless that statement was a lie to avoid badmouthing a grantee.
I just realized that Paul Christiano and Dario Amodei both probably have signed non-disclosure + non-disparagement contracts since they both left OpenAI.
That impacts how I’d interpret Paul’s (and Dario’s) claims and opinions (or the lack thereof), that relates to OpenAI or alignment proposals entangled with what OpenAI is doing. If Paul has systematically silenced himself, and a large amount of OpenPhil and SFF money has been mis-allocated because of systematically skewed beliefs that these organizations have had due to Paul’s opinions or lack thereof, well. I don’t think this is the case though—I expect Paul, Dario, and Holden all seem to have converged on similar beliefs (whether they track reality or not) and have taken actions consistent with those beliefs.
Can anybody confirm whether Paul is likely systematically silenced re OpenAI?
I mean, if Paul doesn’t confirm that he is not under any non-disparagement obligations to OpenAI like Cullen O’ Keefe did, we have our answer.
In fact, given this asymmetry of information situation, it makes sense to assume that Paul is under such an obligation until he claims otherwise.
I don’t know the answer, but it would be fun to have a twitter comment with a zillion likes asking Sam Altman this question. Maybe someone should make one?
https://x.com/panickssery/status/1792586407623393435
Mhhh, that seems very bad for someone in an AISI in general. I’d guess Jade Leung might sadly be under the same obligations…
That seems like a huge deal to me with disastrous consequences, thanks a lot for flagging.
I mostly agree with premises 1, 2, and 3, but I don’t see how the conclusion follows.
It is possible for things to be hard to influence and yet still worth it to try to influence them.
(Note that the $30 million grant was not an endorsement and was instead a partnership (e.g. it came with a board seat), see Buck’s comment.)
(Ex-post, I think this endeavour was probably net negative, though I’m pretty unsure and ex-ante I currently think it seems great.)
I think there’s a solid case for anyone who supported funding OpenAI being considered at best well intentioned but very naive. I think the idea that we should align and develop superintelligence but, like, good, has always been a blind spot in this community—an obviously flawed but attractive goal, because it dodged the painful choice between extinction risk and abandoning hopes of personally witnessing the singularity or at least a post scarcity world. This is also a case where people’s politics probably affected them, because plenty of others would be instinctively distrustful of corporation driven solutions to anything—it’s something of a Godzilla Strategy after all, aligning corporations is also an unsolved problem—but those with an above average level of trust in free markets weren’t so averse.
Such people don’t necessarily have conflicts of interest (though some may, and that’s another story) but they at least need to drop the fantasy land stuff and accept harsh reality on this before being of any use.
It’s also notable that the topic of OpenAI nondisparagement agreements was brought to Holden Karnofsky’s attention in 2022, and he replied with “I don’t know whether OpenAI uses nondisparagement agreements; I haven’t signed one.” (He could have asked his contacts inside OAI about it, or asked the EA board member to investigate. Or even set himself up earlier as someone OpenAI employees could whistleblow to on such issues.)
If the point was to buy a ticket to play the inside game, then it was played terribly and negative credit should be assigned on that basis, and for misleading people about how prosocial OpenAI was likely to be (due to having an EA board member).
This can also be glomarizing. “I haven’t signed one.” is a fact, intended for the reader to use it as anecdotal evidence. “I don’t know whether OpenAI uses nondisparagement agreements” can mean that he doesn’t know for sure, and will not try to find out.
Obviously, the context of the conversation and the events surrounding Holden stating this matters for interpreting this statement, but I’m not interested in looking further into this, so I’m just going to highlight the glomarization possibility.
On a meta note, IF proposition 2 is true, THEN the best way to tell this would be if people had been saying so AT THE TIME. If instead, actually everyone at the time disagreed with proposition 2, then it’s not clear that there’s someone “we” know to hand over decision making power to instead. Personally, I was pretty new to the area, and as a Yudkowskyite I’d probably have reflexively decried giving money to any sort of non-X-risk-pilled non-alignment-differential capabilities research. But more to the point, as a newcomer, I wouldn’t have tried hard to have independent opinions about stuff that wasn’t in my technical focus area, or to express those opinions with much conviction, maybe because it seemed like Many Highly Respected Community Members With Substantially Greater Decision Making Experience would know far better, and would not have the time or the non-status to let me in on the secret subtle reasons for doing counterintuitive things. Now I think everyone’s dumb and everyone should say their opinions a lot so that later they can say that they’ve been saying this all along. I’ve become extremely disagreeable in the last few years, I’m still not disagreeable enough, and approximately no one I know personally is disagreeable enough.
Why focus on the $30 million grant?
What about large numbers of people working at OpenAI directly on capabilities for many years? (Which is surely worth far more than $30 million.)
Separately, this grant seems to have been made to influence the governance of OpenAI, not to make OpenAI go faster. (Directly working on capabilities seems modestly more accelerating and risky than granting money in exchange for a partnership.)
(ETA: TBC, there is a relationship between the grant and people working at OpenAI on capabilities: the grant was associated with a general vague endorsement of trying to play inside game at OpenAI.)
FYI I wish to register my weak disapproval of this opening. A la Scott Alexander’s “Against Bravery Debates”, I think it is actively distracting and a little mind-killing to open by making a claim about status and popularity of a position even if it’s accurate.
I think in this case it would be reasonable to say something like “the implications of this argument being true involve substantial reallocation of status and power, so please be conscious of that and let’s all try to assess the evidence accurately and avoid overheating”. This is different from something like “I know lots of people will disagree with me on this but I’m going to say it”.
I’m not saying this was an easy post to write, but I think the standard to aim for is not having openings like this.
Honestly, maybe a further controversial opinion, but this [$30 million for a board seat at what would become the lead company for AGI, with a novel structure for nonprofit control that could work?] still doesn't necessarily feel like as bad a decision now as others are making it out to be?
The thing that killed all value of this deal was losing the board seat(s?), and I at least haven’t seen much discussion of this as a mistake.
I'm just surprised so little prioritization was given to keeping this board seat; it was probably one of the most important assets of the "AI safety community and allies", and there didn't seem to be any real fight with Sam Altman's camp for it.
So Holden has the board seat, but has to leave because of a COI, and endorses Toner to replace him: "… Karnofsky cited a potential conflict of interest because his wife, Daniela Amodei, a former OpenAI employee, helped to launch the AI company Anthropic.
Given that Toner previously worked as a senior research analyst at Open Philanthropy, Loeber speculates that Karnofsky might’ve endorsed her as his replacement.”
Like, maybe it was doomed if they only had one board seat (Open Phil) vs. whoever else was on the board, and there was a lot of shuffling about as Musk and Hoffman also left over COIs, but at the start of 2023 it seems like there is an "AI Safety" half of the board, and a year later there are now none. Maybe it was further doomed if Sam Altman holds the "take the whole company elsewhere" card, but idk… was this really inevitable? Was there really not a better way to, idk, maintain some degree of control and supervision of this vital board over the years since OP gave the grant?
COI == conflict of interest.
I like a lot of this post, but the sentence above seems very out of touch to me. Who are these third parties who are completely objective? Why is objective the adjective here, instead of “good judgement” or “predicted this problem at the time”?
That’s a good point. You have pushed me towards thinking that this is an unreasonable statement and “predicted this problem at the time” is better.
I downvoted this comment because it felt uncomfortably scapegoat-y to me. If you think the OpenAI grant was a big mistake, it’s important to have a detailed investigation of what went wrong, and that sort of detailed investigation is most likely to succeed if you have cooperation from people who are involved. I’ve been reading a fair amount about what it takes to instill a culture of safety in an organization, and nothing I’ve seen suggests that scapegoating is a good approach.
https://sre.google/sre-book/postmortem-culture/
If you start with the assumption that there was a moral failing on the part of the grantmakers, and you are wrong, there’s a good chance you’ll never learn that.
Enforcing social norms to prevent scapegoating also destroys information that is valuable for accurate credit assignment and causally modelling reality.
I think you are misinterpreting the grandparent comment. I do not read any mention of a ‘moral failing’ in that comment. You seem worried because of the commenter’s clear description of what they think would be a sensible step for us to take given what they believe are egregious flaws in the decision-making processes of the people involved. I don’t think there’s anything wrong with such claims.
Again: You can care about people while also seeing their flaws and noticing how they are hurting you and others you care about. You can be empathetic to people having flawed decision making and care about them, while also wanting to keep them away from certain decision-making positions.
Oh, interesting. Who exactly do you think influential people like Holden Karnofsky and Paul Christiano are accountable to? This "detailed investigation" you speak of, and this notion of a "blameless culture", make a lot of sense when you are the head of an organization and you are conducting an investigation into the systematic mistakes made by people who work for you, and who you are responsible for. I don't think this situation is similar enough that you can apply these intuitions blindly without thinking through the actual causal factors involved in this situation.
Note that I don't necessarily endorse the grandparent comment's claims. This is a complex situation, and I'd need to spend more time analyzing it and what occurred.
I read the Ben Hoffman post you linked. I’m not finding it very clear, but the gist seems to be something like: Statements about others often import some sort of good/bad moral valence; trying to avoid this valence can decrease the accuracy of your statements.
If OP was optimizing purely for descriptive accuracy, disregarding everyone’s feelings, that would be one thing. But the discussion of “repercussions” before there’s been an investigation goes into pure-scapegoating territory if you ask me.
If OP wants to clarify that he doesn’t think there was a moral failing, I expect that to be helpful for a post-mortem. I expect some other people besides me also saw that subtext, even if it’s not explicit.
"Keep people away" sounds like moral talk to me. If you think someone's decision-making is actively bad, i.e. you'd be better off reversing any advice from them, then maybe you should keep them around so you can do that! But more realistically, someone who's fucked up in a big way will probably have learned from that, and functional cultures don't throw away hard-won knowledge.
Imagine a world where AI is just an inherently treacherous domain, and we throw out the leadership whenever they make a mistake. So we get a continuous churn of inexperienced leaders in an inherently treacherous domain—doesn’t sound like a recipe for success!
I agree that changes things. I’d be much more sympathetic to the OP if they were demanding an investigation or an apology.
Just to be clear, OP themselves seem to think that what they are saying will have little effect on the status quo. They literally called it “Very Spicy Take”. Their intention was to allow them to express how they felt about the situation. I’m not sure why you find this threatening, because again, the people they think ideally wouldn’t continue to have influence over AI safety related decisions are incredibly influential and will very likely continue to have the influence they currently possess. Almost everyone else in this thread implicitly models this fact as they are discussing things related to the OP comment.
There isn't going to be any scapegoating. I imagine that everything I say is something I would say in person to the people involved, or to third parties, and not expect any sort of coordinated action to reduce their influence—they are that irreplaceable to the community and to the ecosystem.
So basically, I think it is a bad idea and you think we can’t do it anyway. In that case let’s stop calling for it, and call for something more compassionate and realistic like a public apology.
I'll bet an apology would be a more effective way to pressure OpenAI to clean up its act anyway. Which is a better headline—"OpenAI cofounder apologizes for their role in creating OpenAI", or some sort of internal EA movement drama? If we can generate a steady stream of negative headlines about OpenAI, there's a chance that Sam is declared too much of a PR and regulatory liability. I don't think it's a particularly good plan, but I haven't heard a better one.
Can you not be close friends with someone while also expecting them to be bad at self-control when it comes to alcohol? Or perhaps they are great at technical stuff like research but pretty bad at negotiation, especially in adversarial situations with experienced counterparties, such as talking to VCs?
It is not that people's decision-making is such that you can consistently reverse their opinions and get something that accurately tracks reality. If that were the case, they would already be implicitly tracking reality very well. Reversed stupidity is not intelligence.
Again you seem to not be trying to track the context of our discussion here. This advice again is usually said when it comes to junior people embedded in an institution, because the ability to blame someone and / or hold them responsible is a power that senior / executive people hold. This attitude you describe makes a lot of sense when it comes to people who are learning things, yes. I don’t know if you can plainly bring it into this domain, and you even acknowledge this in the next few lines.
I think it is incredibly unlikely that the rationalist community has an ability to ‘throw out’ the ‘leadership’ involved here. I find this notion incredibly silly, given the amount of influence OpenPhil has over the alignment community, especially through their funding (including the pipeline, such as MATS).
Sure, I think this helps tease out the moral valence point I was trying to make. “Don’t allow them near” implies their advice is actively harmful, which in turn suggests that reversing it could be a good idea. But as you say, this is implausible. A more plausible statement is that their advice is basically noise—you shouldn’t pay too much attention to it. I expect OP would’ve said something like that if they were focused on descriptive accuracy rather than scapegoating.
Another way to illuminate the moral dimension of this conversation: If we’re talking about poor decision-making, perhaps MIRI and FHI should also be discussed? They did a lot to create interest in AGI, and MIRI failed to create good alignment researchers by its own lights. Now after doing advocacy off and on for years, and creating this situation, they’re pivoting to 100% advocacy.
Could MIRI be made up of good people who are “great at technical stuff”, yet apt to shoot themselves in the foot when it comes to communicating with the public? It’s hard for me to imagine an upvoted post on this forum saying “MIRI shouldn’t be allowed anywhere near AI safety communications”.
Agreed that it reflects badly on the people involved, although less on Paul since he was only a "technical advisor" and arguably less responsible for thinking through / doing due diligence on the social aspects. It's frustrating to see the EA community (on EAF and Twitter at least) and those directly involved all ignoring this.
(“shouldn’t be allowed anywhere near AI Safety decision making in the future” may be going too far though.)
Did OpenAI have the for-profit element at that time?
No. E.g. see here
A serious effective altruism movement would clean house. Everyone who pushed the 'work with AI capabilities company' line should retire or be forced to retire. There is no need to blame anyone for mistakes; the decision makers had their reasons. But they chose wrong and should not continue to be leaders.
Do you think that whenever anyone makes a decision that ends up being bad ex-post they should be forced to retire?
Doesn’t this strongly disincentivize making positive EV bets which are likely to fail?
Edit: I interpreted this comment as a generic claim about how the EA community should relate to things which went poorly ex-post, I now think this comment was intended to be less generic.
Not OP, but I take the claim to be “endorsing getting into bed with companies on-track to make billions of dollars profiting from risking the extinction of humanity in order to nudge them a bit, is in retrospect an obviously doomed strategy, and yet many self-identified effective altruists trusted their leadership to have secret good reasons for doing so and followed them in supporting the companies (e.g. working there for years including in capabilities roles and also helping advertise the company jobs). now that a new consensus is forming that it indeed was obviously a bad strategy, it is also time to have evaluated the leadership’s decision as bad at the time of making the decision and impose costs on them accordingly, including loss of respect and power”.
So no, not disincentivizing making positive EV bets, but updating about the quality of decision-making that has happened in the past.
I think there’s a decent case that such updating will indeed disincentivize making positive EV bets (in some cases, at least).
In principle we’d want to update on the quality of all past decision-making. That would include both [made an explicit bet by taking some action] and [made an implicit bet through inaction]. With such an approach, decision-makers could be punished/rewarded with the symmetry required to avoid undesirable incentives (mostly).
Even here it’s hard, since there’d always need to be a [gain more influence] mechanism to balance the possibility of losing your influence.
In practice, most of the implicit bets made through inaction go unnoticed—even where they’re high-stakes (arguably especially when they’re high-stakes: most counterfactual value lies in the actions that won’t get done by someone else; you won’t be punished for being late to the party when the party never happens).
That leaves the explicit bets. To look like a good decision-maker the incentive is then to make low-variance explicit positive EV bets, and rely on the fact that most of the high-variance, high-EV opportunities you’re not taking will go unnoticed.
From my by-no-means-fully-informed perspective, the failure mode at OpenPhil in recent years seems not to be [too many explicit bets that don’t turn out well], but rather [too many failures to make unclear bets, so that most EV is left on the table]. I don’t see support for hits-based research. I don’t see serious attempts to shape the incentive landscape to encourage sufficient exploration. It’s not clear that things are structurally set up so anyone at OP has time to do such things well (my impression is that they don’t have time, and that thinking about such things is no-one’s job (?? am I wrong ??)).
It’s not obvious to me whether the OpenAI grant was a bad idea ex-ante. (though probably not something I’d have done)
However, I think that another incentive towards middle-of-the-road, risk-averse grant-making is the last thing OP needs.
That said, I suppose much of the downside might be mitigated by making a distinction between [you wasted a lot of money in ways you can’t legibly justify] and [you funded a process with (clear, ex-ante) high negative impact].
If anyone’s proposing punishing the latter, I’d want it made very clear that this doesn’t imply punishing the former. I expect that the best policies do involve wasting a bunch of money in ways that can’t be legibly justified on the individual-funding-decision level.
I interpreted the comment as being more general than this. (As in, if someone does something that works out very badly, they should be forced to resign.)
Upon rereading the comment, it reads as less generic than my original interpretation. I’m not sure if I just misread the comment or if it was edited. (Would be nice to see the original version if actually edited.)
(Edit: Also, you shouldn't interpret my comment as an endorsement of or agreement with the rest of the content of Ben's comment.)
Wasn’t edited, based on my memory.
Wasn’t OpenAI a nonprofit at the time?
Leadership is supposed to be about service not personal gain.
I don’t see how this is relevant to my comment.
By “positive EV bets” I meant positive EV with respect to shared values, not with respect to personal gain.
Edit: Maybe your view is that leaders should take these bets anyway even though they know they are likely to result in a forced retirement (i.e., ignoring the disincentive). I was actually thinking of the disincentive effect as: you are actually a good leader, so you remaining in power would be good; therefore you should avoid actions that result in you losing power for unjustified reasons; therefore you should avoid making positive EV bets (as making these bets is now overall negative EV, since they may result in a forced leadership transition, which is bad). More minimally, you strongly select for leaders who don't make such bets.
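As a toy illustration of that adjustment (all numbers here are hypothetical): suppose a bet has a 0.6 chance of a good outcome worth +10 to the mission and a 0.4 chance of a bad outcome worth −5, so

\[
\mathrm{EV}_{\text{mission}} = 0.6(+10) + 0.4(-5) = +4 .
\]

If the bad outcome also triggers a forced leadership transition that costs the mission a further 15 (losing a leader whose continued judgment is worth that much), the bet the leader actually faces is

\[
\mathrm{EV}_{\text{adjusted}} = 0.6(+10) + 0.4(-5 - 15) = 6 - 8 = -2 ,
\]

so a bet that is positive EV for shared values becomes negative EV once the retirement penalty is priced in.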
"ETA" is commonly short for "estimated time of arrival". I understand you are using it to mean "edited", but I don't quite know what it is short for here, and using it seems confusing for people in general.
ETA = edit time addition
I should probably not use this term, I think I picked up this habit from some other people on LW.
Oh, weird. I always thought “ETA” means “Edited To Add”.
I didn’t know it meant either.
The Internet seems to agree with you. I wonder why I remember “edit time addition”.
OK
It’s an article of faith for some people that that makes a difference, but I’ve never seen why.
I mean, many of the “decision makers” on these particular issues already believe that their actual, personal, biological skins are at stake, along with those of everybody else they know. And yet...
Thinking “seven years from now, a significant number of independent players in a relatively large and diverse field might somehow band together to exclude me” seems very distant from the way I’ve seen actual humans make decisions.
Perhaps, but “seven years from now my reputation in my industry will drop markedly on the basis of this decision” seems to me like a normal human thing that happens all the time.
OpenAI wasn't a private company (i.e., for-profit) at the time of the OP grant though.
Aren’t these different things? Private yes, for profit no. It was private because it’s not like it was run by the US government.
As a non-profit it is obligated to not take opportunities to profit, unless those opportunities are part of it satisfying its altruistic mission.
I don't think this is true. Nonprofits can aim to amass large amounts of wealth; they just aren't allowed to distribute that wealth to shareholders. A good chunk of obviously very wealthy and powerful companies are nonprofits.
I'm not sure if those are precisely the terms of the charter, but that's beside the point. It is still "private" in the sense that there is a small group of private citizens who own the thing and decide what it should do, with no political accountability to anyone else. As for the "non-profit" part, we've seen what happens to that as soon as it's in the way.
So the argument is that Open Phil should only give large sums of money to (democratic) governments? That seems too overpowered for the OpenAI case.
I was more focused on the ‘company’ part. To my knowledge there is no such thing as a non-profit company?
This does not feel super cruxy, as the power incentive still remains.
In that case OP’s argument would be saying that donors shouldn’t give large sums of money to any sort of group of people, which is a much bolder claim
(I’m the OP)
I'm not trying to say "it's bad to give large sums of money to any group because humans have a tendency to seek power."
I’m saying “you should be exceptionally cautious about giving large sums of money to a group of humans with the stated goal of constructing an AGI.”
You need to weigh any reassurances they give you against two observations:
The commonly observed pattern of individual humans or organisations seeking power (and/or wealth) at the expense of the wider community.
The strong likelihood that there will be an opportunity for organisations pushing ahead with AI research to obtain incredible wealth or power.
So, it isn’t “humans seek power therefore giving any group of humans money is bad”. It’s “humans seek power” and, in the specific case of AI companies, there may be incredibly strong rewards for groups that behave in a self-interested way.
The general idea I’m working off is that you need to be skeptical of seemingly altruistic statements and commitments made by humans when there are exceptionally lucrative incentives to break these commitments at a later point in time (and limited ways to enforce the original commitment).
That seems like a valuable argument. It might be worth updating the wording under premise 2 to clarify this? To me it reads as saying that the configuration, rather than the aim, of OpenAI was the major red flag.
I’d like to see people who are more informed than I am have a conversation about this. Maybe at Less.online?
https://www.lesswrong.com/posts/zAqqeXcau9y2yiJdi/can-we-build-a-better-public-doublecrux
I would be happy to defend roughly the position above (I don’t agree with all of it, but agree with roughly something like “the strategy of trying to play the inside game at labs was really bad, failed in predictable ways, and has deeply eroded trust in community leadership due to the adversarial dynamics present in such a strategy and many people involved should be let go”).
I do think most people who disagree with me here are under substantial confidentiality obligations and de-facto non-disparagement obligations (such as really not wanting to imply anything bad about Anthropic or wanting to maintain a cultivated image for policy purposes) so that it will be hard to find a good public debate partner, but it isn’t impossible.
I largely disagree (even now I think having tried to play the inside game at labs looks pretty good, although I have sometimes disagreed with particular decisions in that direction because of opportunity costs). I’d be happy to debate if you’d find it productive (although I’m not sure whether I’m disagreeable enough to be a good choice).
For me, the key question in situations when leaders made a decision with really bad consequences is, “How did they engage with criticism and opposing views?”
If they did well on this front, then I don't think it's at all mandatory to push for leadership changes (though certainly, the worse someone's track record gets, the more that speaks against them).
By contrast, if leaders tried to make the opposition look stupid or if they otherwise used their influence to dampen the reach of opposing views, then being wrong later is unacceptable.
Basically, I want to allow for a situation where someone was like, “this is a tough call and I can see reasons why others wouldn’t agree with me, but I think we should do this,” and then ends up being wrong, but I don’t want to allow situations where someone is wrong after having expressed something more like, “listen to me, I know better than you, go away.”
In the first situation, it might still be warranted to push for leadership changes (esp. if there’s actually a better alternative), but I don’t see it as mandatory.
The author of the original short form says we need to hold leaders accountable for bad decisions because otherwise the incentives are wrong. I agree with that, but I think it’s being too crude to tie incentives to whether a decision looks right or wrong in hindsight. We can do better and evaluate how someone went about making a decision and how they handled opposing views. (Basically, if opposing views aren’t loud enough that you’d have to actively squish them using your influence illegitimately, then the mistake isn’t just yours as the leader; it’s also that the situation wasn’t significantly obvious to others around you.) I expect that everyone who has strong opinions on things and is ambitious and agenty in a leadership position is going to make some costly mistakes. The incentives shouldn’t be such that leaders shy away from consequential interventions.
If the strategy failed in predictable ways, shouldn’t we expect to find “pre-registered” predictions that it would fail?
I have indeed been publicly advocating against the inside game strategy at labs for many years (going all the way back to 2018), predicting it would fail due to incentive issues and have large negative externalities due to conflict of interest issues. I could dig up my comments, but I am confident almost anyone who I’ve interfaced with at the labs, or who I’ve talked to about any adjacent topic in leadership would be happy to confirm.
Are you just referring to the profit incentive conflicting with the need for safety, or something else?
I’m struggling to see how we get aligned AI without “inside game at labs” in some way, shape, or form.
My sense is that evaporative cooling is the biggest thing which went wrong at OpenAI. So I feel OK about e.g. Anthropic if it’s not showing signs of evaporative cooling.
I’d like to know what Holden did while serving on the board, and what OpenAI would have done if he hadn’t joined. That’s crucial for assessing the grant’s impact.
But since board meetings are private, this will remain unknown for a long time. Unfortunately, the best we can do is speculate.