Adding my two cents as someone who has a pretty different lens from Habryka but has still been fairly disappointed with OpenPhil, especially in the policy domain.
Relative to Habryka, I am generally more OK with people “playing politics”. I think it’s probably good for AI safety folks to exhibit socially-common levels of “playing the game”– networking, finding common ground, avoiding offending other people, etc. I think some people in the rationalist sphere have a very strong aversion to some things in this genre, and labels like “power-seeking” and “deceptive” get thrown around too liberally. I also think I’m pretty OK with OpenPhil deciding it doesn’t want to fund certain parts of the rationalist ecosystem (and I’m probably less bothered than Habryka about how their comms around this weren’t direct/clear).
In that sense, I don’t penalize OP much for trying to “play politics” or for breaking deontological norms. Nonetheless, I still feel pretty disappointed with them, particularly for their impact on comms/policy. Some thoughts here:
I agree with Habryka that it is quite bad that OP is not willing to fund right-coded things. Even many of the “bipartisan” things funded by OP are quite left-coded. (As a useful heuristic, whenever you hear of someone launching a bipartisan initiative, I think one should ask “what % of the staff of this organization is Republican?” Obviously just a heuristic– there are some cases in which a 90%-Dem staff can actually truly engage in “real” bipartisan efforts. But in some cases, you will have a 90%-Dem staff claiming to be interested in bipartisan work without any real interest in Republican ideas, few if any Republican contacts, and only a cursory understanding of Republican stances.)
I also agree with Habryka that OP seems overly focused on PR risks and not doing things that are weird/controversial. “To be a longtermist grantee these days you have to be the kind of person that OP thinks is not and will not be a PR risk, i.e., will not say weird or controversial stuff” sounds pretty accurate to me. OP cannot publicly admit this because this would be bad for its reputation– instead, it operates more subtly.
Separately, I have seen OpenPhil attempt to block or restrain multiple efforts in which people were trying to straightforwardly explain AI risks to policymakers. My understanding is that OpenPhil would say that they believed the messengers weren’t the right people (e.g., too inexperienced), and they thought the downside risks were too high. In practice, there are some real tradeoffs here: there are often people who seem to have strong models of AGI risk but little/no policy experience, and sometimes people who have extensive policy experience but only recently started engaging with AI/AGI issues. With that in mind, I think OpenPhil has systematically made poor tradeoffs here and failed to invest in (or in some cases, actively blocked) people who were willing to be explicit about AGI risks, loss of control risks, capability progress, and the need for regulation. (I also think the “actively blocking” thing has gotten less severe over time, perhaps in part because OpenPhil changed its mind a bit on the value of direct advocacy, or perhaps because OpenPhil just decided to focus its efforts on things like research while advocacy projects found funding elsewhere.)
I think OpenPhil has an intellectual monoculture and puts explicit/implicit cultural pressure on people in the OP orbit to “stay in line.” There is a lot of talk about valuing people who can think for themselves, but I think the groupthink problems are pretty real. There is a strong emphasis on “checking in” with people before saying/doing things, and the OP bubble is generally much more willing to criticize action than inaction. I suspect that something like the CAIS statement or even a lot of the early Bengio comms would not have occurred if Dan Hendrycks or Yoshua were deeply ingrained in the OP orbit. It is both the case that they would’ve been forced to write 10+ page Google Docs defending their theories of change and the case that the intellectual culture simply wouldn’t have fostered this kind of thinking.
I think the focus on evals/RSPs can largely be explained by a bias toward trusting labs. OpenPhil steered a lot of talent toward the evals/RSPs theory of change (specifically, if I recall correctly, OpenPhil leadership on AI was especially influential in steering a lot of the ecosystem to support and invest in the evals/RSPs theory of change.) I expect that when we look back in a few years, there will be a pretty strong feeling that this was the wrong call & that this should’ve been more apparent even without the benefit of hindsight.
I would be more sympathetic to OpenPhil in a world where their aversion to weirdness/PR risks resulted in them having a strong reputation, a lot of political capital, and real-world influence that matched the financial resources they possess. Sadly, I think we’re in a “lose-lose” world: OpenPhil’s reputation tends to be poor in many policy/journalism circles even while OpenPhil pursues a strategy that seems to be largely focused on avoiding PR risks. I think some of this is unjustified (e.g., a result of propaganda campaigns designed to paint anyone who cares about AI risk as awful). But then some of it actually is kind of reasonable (e.g., impartial observers viewing OpenPhil as kind of shady, not direct in its communications, not very willing to engage directly or openly with policymakers or journalists, having lots of conflicts of interest, trying to underplay the extent to which its funding priorities are influenced/constrained by a single billionaire, being pretty left-coded, etc.)
To defend OpenPhil a bit, I do think it’s quite hard to navigate trade-offs and I think sometimes people don’t seem to recognize these tradeoffs. In AI policy, I think the biggest tradeoff is something like “lots of people who have engaged with technical AGI arguments and AGI threat models don’t have policy experience, and lots of people who have policy experience don’t have technical expertise or experience engaging with AGI threat models” (this is a bit of an oversimplification– there are some shining stars who have both.)
I also think OpenPhil folks probably tend to have a different probability distribution over threat models (compared to me and probably also Habryka). For instance, it seems likely to me that OpenPhil employees operate in more of a “there are a lot of ways AGI could play out and a lot of uncertainty– we just need smart people thinking seriously about the problem. And who really knows how hard alignment will be, maybe Anthropic will just figure it out” lens and less of an “ASI is coming and our priority needs to be making sure humanity understands the dangers associated with a reckless race toward ASI, and there’s a substantial chance that we are seriously not on track to solve the necessary safety and security challenges unless we fundamentally reorient our whole approach” lens.
And finally, I think that despite these criticisms, OpenPhil is also responsible for some important wins (e.g., building the field, raising awareness about AGI risk on university campuses, funding some people early on before AI safety was a “big deal”, jumpstarting the careers of some leaders in the policy space [especially in the UK]). It’s also plausible to me that there are some cases in which OpenPhil gatekeeping was actually quite useful in preventing people from causing harm, even though I probably disagree with OpenPhil about the # and magnitude of these cases.