Towards more cooperative AI safety strategies
This post is written in a spirit of constructive criticism. It’s phrased fairly abstractly, in part because it’s a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them. Disclaimer: I work at OpenAI, although this is a personal post that was not reviewed by OpenAI.
Claim 1: The AI safety community is structurally power-seeking.
By “structurally power-seeking” I mean: tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry; or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it’s difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking.
Some prominent examples of structural power-seeking include:
Trying to raise a lot of money.
Trying to gain influence within governments, corporations, etc.
Trying to control the ways in which AI values are shaped.
Favoring people who are concerned about AI risk for jobs and grants.
Trying to ensure non-release of information (e.g. research, model weights, etc).
Trying to recruit (high school and college) students.
To be clear, you can’t get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities, such as most other advocacy groups. Some reasons for this disparity include:
The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one’s desired consequences (but can be aversive to deontologists or virtue ethicists).
The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won’t take action until it’s too late; and that it’s necessary to have a centralized plan.
The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it’s newer than (e.g.) the environmentalist movement; in part it’s because the risks involved are more abstract; in part it’s a founder effect.
Again, these are intended as descriptions rather than judgments. Traits like urgency, consequentialism, etc, are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point:
Claim 2: The world has strong defense mechanisms against (structural) power-seeking.
In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included:
Strong public criticism of not releasing models publicly.
Strong public criticism of centralized funding (e.g. billionaire philanthropy).
Various journalism campaigns taking a “conspiratorial” angle on AI safety.
Strong criticism from the AI ethics community about “whose values” AIs will be aligned to.
The development of an accelerationist movement focused on open-source AI.
These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when judging it. This is a useful strategy in a world where arguments are often post-hoc justifications for power-seeking behavior.
To be clear, it’s not necessary to avoid these defense mechanisms at all costs. It’s easy to overrate the effect of negative publicity; and attempts to avoid that publicity are often more costly than the publicity itself. But reputational costs do accumulate over time, and also contribute to a tribalist mindset of “us vs them” (as seen most notably in the open-source debate) which makes truth-seeking harder.
Note that most big companies (especially AGI companies) are strongly structurally power-seeking too, and this is a big reason why society at large is so skeptical of and hostile to them. I focused on AI safety in this post both because companies being power-seeking is an idea that’s mostly “priced in”, and because I think that these ideas are still useful even when dealing with other power-seeking actors.
Claim 3: The variance of (structurally) power-seeking strategies will continue to increase.
Those who currently take AGI and ASI seriously have opportunities to make investments (of money, time, social capital, etc) which will lead to much more power in the future if AI continues to become a much, much bigger deal.
But increasing attention to AI will also lead to increasingly high-stakes power struggles over who gets to control it. So far, we’ve seen relatively few such power struggles because people don’t believe that control over AI is an important type of power. That will change. To some extent this has already happened (with AI safety advocates being involved in the foundation of three leading AGI labs) but as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.
How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy. Work that focuses on informing the public, or creating mechanisms to ensure that power doesn’t become too concentrated even in the face of AGI, is much less likely to be perceived as power-seeking.
Secondly, prioritizing competence. Ultimately, humanity is mostly in the same boat: we’re the incumbents who face displacement by AGI. Right now, many people are making predictable mistakes because they don’t yet take AGI very seriously. We should expect this effect to decrease over time, as AGI capabilities and risks become less speculative. This consideration makes it less important that decision-makers are currently concerned about AI risk, and more important that they’re broadly competent, and capable of responding sensibly to confusing and stressful situations, which will become increasingly common as the AI revolution speeds up.
EDIT: A third thing, which may be the most important takeaway in practice: the mindset that it’s your job to “ensure” that things go well, or come up with a plan that’s “sufficient” for things to go well, inherently biases you towards trying to control other people—because otherwise they might be unreasonable enough to screw up your plan. But trying to control others will very likely backfire for all the reasons laid out above. Worse, it might get you stuck in a self-reinforcing negative loop: the more things backfire, the more worried you are, and so the more control you try to gain, causing further backfiring… So you shouldn’t be in that mindset unless you’re literally the US President (and maybe not even then). Instead, your job is to make contributions such that, if the wider world cooperates with you, then things are more likely to go well. AI safety is in the fortunate position that, as AI capabilities steadily grow, more and more people will become worried enough to join our coalition. Let’s not screw that up.
Meta: I think these kinds of posts should include some sort of disclaimer acknowledging that you are an OpenAI employee & also mentioning whether or not the post was reviewed by OpenAI staff, OpenAI comms, etc.
I imagine you didn’t do this because many people who read this forum are aware of this fact (and it’s on your profile; it’s not like you’re trying to hide it), but I suspect this information could be useful for newcomers who are engaging with this kind of material.
Yeah, this omission felt pretty glaring to me. OpenAI is explicitly aiming to build “the most powerful technology humanity has yet invented.” Obviously that doesn’t mean Richard is wrong that the AI safety community is too power-seeking, but I would sure have appreciated him acknowledging/grappling with the fact that the company he works for is seeking to obtain more power than any group of people in history by a gigantic margin.
An elephant in the room (IMO) is that moving forward, OpenAI probably benefits from a world in which the AI safety community does not have much influence.
There’s a fine line between “play nice with others and be more cooperative” and “don’t actually advocate for policies that you think would help the world, and only do things that the Big Companies and Their Allies are comfortable with.”
Again, I don’t think Richard sat in his room and thought “how do I spread a meme that is good for my company.” I think he’s genuinely saying what he believes and giving advice that he thinks will be useful to the AI safety community and improve society’s future.
But I also think that one of the reasons why Richard still works at OpenAI is because he’s the kind of agent who genuinely believes things that tend to be pretty aligned with OpenAI’s interests, and I suspect his perspective is informed by having lots of friends/colleagues at OpenAI.
Someone who works for a tobacco company can still have genuinely useful advice for the community of people concerned about the health effects of smoking. But I still think it’s an important epistemic norm that they add (at least) a brief disclaimer acknowledging that they work for a tobacco company.
(And the case becomes even stronger in the event that they have to get approval from the tobacco company comms team, or they filter out any ideas they have that could get them in trouble with the company. Or perhaps before writing/publishing a post they consider the fact that other people have been fired from their company for sharing information that was against company interests, that the CEO attempted to remove a board member under the justification that she published a paper that went against company interests, and that the company has a history of using highly restrictive NDAs to prevent people from saying things that go against company interests.)
Added a disclaimer, as suggested. It seems like a good practice for this sort of post. Though note that I disagree with this paragraph; I don’t think “being the kind of agent who X” or “being informed by many people at Y” are good reasons to give disclaimers. Whereas I do buy that “they filter out any ideas that they have that could get them in trouble with the company” is an important (conscious or unconscious) effect, and worth a disclaimer.
I’ve also added this note to the text:
I appreciate you adding the note, though I do think the situation is far more unusual than described. I agree it’s widely priced in that companies in general seek power, but I think probably less so that the author of this post personally works for a company which is attempting to acquire drastically more power than any other company ever, and that much of the behavior the post describes as power-seeking amounts to “people trying to stop the author and his colleagues from attempting that.”
Thanks!
(I think “being the kind of agent who survives the selection process” can sometimes be an important epistemic thing to consider, though mostly when thinking about how systems work and what kinds of people/views those systems promote. Agreed that “being informed by many people at Y” is a rather weak one & certainly would not on its own warrant a disclosure.)
I’m not claiming it’s zero information, but there are lots of things that convey non-zero information which it’d be bad to set disclosure norms based on. E.g. “I’ve only ever worked at nonprofits” should definitely affect your opinion of someone’s epistemics (e.g. when they’re trying to evaluate corporate dynamics) but once we start getting people to disclose that sort of thing there’s no clear stopping point. So mostly I want the line to be “current relevant conflicts of interest”.
My take atm is “seems right that this shouldn’t be a permanent norm, there are definitely costs of disclaimer-ratcheting that are pretty bad. I think it might still be the right thing to do of your own accord in some cases, which is, like, supererogatory.”
I think there’s maybe a weird thing with this post, where it’s trying to be the timeless, abstract version of itself. It’s certainly easier to write the timeless abstract version than the “digging into specific examples and calling people out” version. But I think the digging into specific examples is actually kind of important here – it’s easy to come away with vague takeaways, where everyone nods along but then mostly thinks it’s Those Other Guys who are being power-seeking.
Given that it’s probably 10-50x harder to write the Post With Specific Examples, I think actually a pretty okay outcome is “ship the vague post, and let discussion in the comments get into the inside-baseball-details.” And, then, it’d be remiss if the post-author’s role in the ecosystem didn’t come up as an example to dig into.
I think that the vast majority of comparative claims (like “the AI Safety community is more X than any other advocacy group”) are more based on vibes than facts. Are you sure that the Sierra Club, the Club of Rome, the Mont Pelerin Society, the Fabian Society, the Anti-Defamation League, et cetera are less power-seeking than the AI Safety community? Are you sure that “let’s totally reorganize society around ecological sustainability” is less power-seeking than “let’s ensure that AGI corporations have management that is not completely blind to the alignment problem”?
I never claimed that AI safety is more X than “any other advocacy group”; I specifically said “most other advocacy groups”. And of course I’m not sure about this; asking for that is an isolated demand for rigor. It feels like your objection is the thing that’s vibe-based here.
On the object level: these are good examples, but because movements vary on so many axes, it’s hard to weigh up two of them against each other. That’s why I identified the three features of AI safety which seem to set it apart from most other movements. (Upon reflection I’d also add a fourth: the rapid growth.)
I’m curious if there are specific features which some of the movements you named have that you think contribute significantly to their power-seeking-ness, which AI safety doesn’t have.
I agree that the word “any” is wrong here. I used “sure” in the sense of “reasonably everyday I-won’t-get-hit-by-a-car-if-I-cross-the-road-now sure,” not in the “100% math-proof sure” sense.
By “vibe-based,” I refer to the features that you mention. Yes, there is a lot of talk about consequentialism and efficiency in rationalist/EA-adjacent AIS circles, and consequentialism gives a basis for power-seeking actions, but it’s unclear how much this talk leads to actual power-seeking actions and how much it’s just local discourse framing.
The same applies to the feel of urgency.
Funnily, consequentialism can lead to less power-seeking if we define the problem in a less open-ended manner. If your task is to “maximize the number of good things in the world,” you benefit from power-seeking. If your task is to “design a safe system capable of producing superintelligent work,” you are, in fact, interested in completing this task with minimal effort and, therefore, minimal resources.
I think that the broad environmentalist movement is at least in the same tier as the AIS movement for all of the mentioned features.
Power-seeking inferred from consequentialism? Environmentalism is a movement about political control from the very start. The Club of Rome was initially founded inside the OECD and tried to influence its policy towards degrowth.
Consequentialism leading to questionable practices? Mass sterilizations in the 1980s are far beyond whatever AI Safety has done to date.
Urgency? You can look at all the people claiming that 2030 is a point of no return for climate change.
Focus on elites? The Club of Rome and the Sierra Club are literally what is written in the name—they are elite clubs.
I think I would agree if you say that there are a lot of nuances and the chance of AIS being more power-seeking than the environmentalist movement is non-negligible, but to measure power-seeking with the necessary accuracy, we would need not a blog post written by one person, but the work of an army of sociologists.
Regardless of who is more power-seeking, it would probably be a good idea to look at how being power-seeking has been a disadvantage to other movements. It looks to me like the insistence/power-seeking of the environmental movement may well have been an immense disadvantage; it may have created a backlash that’s almost as strong as the entire movement.
Backlash for environmentalism was largely inevitable. The whole point of environmentalism is to internalize externalities in some way, i.e., impose costs of pollution/ecological damage on polluters. Nobody likes to get new costs, so backlash ensues.
That’s a good point. But not all of the imposed costs were strategically wise, so the backlash didn’t need to be that large to get the important things done. It could be argued that the most hardline, strident environmentalists might’ve cost the overall movement immensely by pushing for minor environmental gains that come at large perceived costs.
I think that did happen, and that similarly, pushing for AI safety measures should be carefully weighed in cost vs benefit. The opposite argument is that we should just get everyone used to paying costs for AI safety (in terms of limiting AI progress that would probably not be highly dangerous). I think that strategy backfired badly for environmentalism and would backfire for us.
Maybe. Again, I’m not an expert in PR, and I’d really like to have people who are experts involved in coming up with strategies.
I think at least some of this backlash comes from Earth being very bad at coordination: “to get a moderate result you should scare opponents with radicals” and other negotiation frictions.
Sure. But scaring opponents with inflated arguments and demands by radicals didn’t seem to work well for the environmental movement, so the AI safety movement probably shouldn’t employ those tactics.
To clarify in a more general way: Earth is bad at coordination in the sense that if an industrial producer dumps toxic waste into the environment, you can’t expect a special government agency to walk in with the premise: “hey, you seem to be destroying environmental commons to get profits; let’s negotiate a point on the Pareto frontier of environmental commons vs. profits that leaves both of us not too upset.”
Until recently the MIRI default plan was basically “obtain god-like AI and use it to take over the world” (“pivotal act”); it’s hard to get more power-seeking than that. Other wings of the community have been more circumspect but also more active in things like founding AI labs, influencing government policy, etc., to the tune of many billions of dollars worth of total influence. Not saying this is necessarily wrong but it does seem empirically clear that AI-risk-avoiders are more power-seeking than most movements.
Seems like this is already the case.
My understanding of the MIRI plan was “have a controllable, safe AI that’s just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI”. I wouldn’t call that god-like or an intention to take over the world. (The go-to example, acknowledged as not that plausible, is “melt all the GPUs”.) Your description feels grossly inaccurate.
Basically any plan of the form “use AI to prevent anyone from building more powerful and more dangerous AI” is incredibly power-grabbing by normal standards: in order to do this, you’ll have to take actions that start out as terrorism and then might quickly need to evolve into insurrection (given that the government will surely try to coerce you into handing over control over the AI-destroying systems); this goes against normal standards for what types of actions private citizens are allowed to take.
I agree that “obtain enough hard power that you can enforce your will against all governments in the world including your own” is a bit short of “try to take over the world”, but I think that it’s pretty world-takeover-adjacent.
I mean, it really matters whether you are suggesting someone else to take that action or whether you are planning to take that action yourself. Asking the U.S. government to use AI to prevent anyone from building more powerful and more dangerous AI is not in any way a power-grabbing action, because it does not in any meaningful way make you more powerful (like, yes, you are part of the U.S. so I guess you end up with a bit more power as the U.S. ends up with more power, but that effect is pretty negligible). Even asking random AI capability companies to do that is also not a power-grabbing action, because you yourself do not end up in charge of those companies as part of that.
Yes, unilaterally deploying such a system yourself would be, but I have no idea what people are referring to when they say that MIRI was planning on doing that (maybe they were, but all I’ve seen them do is openly discuss plans about what ideally someone with access to a frontier model should do, in a way that really did not sound like it would end up with MIRI meaningfully in charge).
I think they talked explicitly about planning to deploy the AI themselves back in the early days (2004-ish), then gradually transitioned to talking generally about what someone with a powerful AI could do.
But I strongly suspect that in the event that they were the first to obtain powerful AI, they would deploy it themselves or perhaps give it to handpicked successors. Given Eliezer’s worldview, I don’t think it would make much sense for them to give the AI to the US government (considered incompetent) or AI labs (negligently reckless).
I agree that very old MIRI (explicitly disavowed by present MIRI and mostly modeled as “one guy in a basement somewhere”) looked a bit more like this, but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president. I don’t think it has zero value in forecasting the future, but going and reading someone’s high-school political science essay, and inferring they would endorse that position in the modern day, is extremely dubious.
My model of them would definitely think very hard about the signaling and coordination problems that come with people trying to build an AGI themselves, and then act on those. I think Eliezer’s worldview here would totally output actions that include very legible precommitments about what the AI system would be used for, and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it. Eliezer has written a lot about this stuff and clearly takes considerations like that extremely seriously.
Yeah, but it’s not just the old MIRI views, but those in combination with their statements about what one might do with powerful AI, the telegraphed omissions in those statements, and other public parts of their worldview e.g. regarding the competence of the rest of the world. I get the pretty strong impression that “a small group of people with overwhelming hard power” was the ideal goal, and that this would ideally be controlled by MIRI or by a small group of people handpicked by them.
Some things that feel incongruent with this:
Eliezer talks a lot in the Arbital article on CEV about how useful it is to have a visibly neutral alignment target
Right now Eliezer is pursuing a strategy which does not meaningfully empower him at all (just halting AGI progress)
Eliezer complains a lot about various people using AI alignment as a guise for mostly just achieving their personal objectives (in particular, the standard AI censorship stuff being thrown into the same bucket)
Lots of conversations I’ve had with MIRI employees
I would be happy to take bets here about what people would say.
Sure, I DM’d you.
This seems too strong to me. There looks to me like a clear continuity of MIRI’s strategic outlook from the days when their explicit plan was to build a singleton and “optimize” the universe, through to today. In between there was a series of updates regarding how difficult various intermediate targets would be. But the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function.
If I remember correctly, back in the AI foom debate, Robin Hanson characterized the Singularity Institute’s plan (to be the first to a winner-take-all technology, and then use that advantage to optimize the cosmos) as declaring total war on the world. Eliezer disputed that characterization.
(Note that I spent 10 minutes trying to find the relevant comments, and didn’t find anything quite like what I was remembering which does decrease my credence that I’m remembering correctly.)
I think you mean “the goal remains to ensure that CEV or something like it is eventually implemented, and the universe is thus optimized according to the resulting utility function”, right? I think Eliezer’s view has always been that we want a CEV-maximizing ASI to be eventually turned on, but if that happens, it wouldn’t matter which human turns it on. And then evidently Eliezer has pivoted over the decades from thinking that this is likeliest to happen if he tries to build such an ASI with his own hands, to no longer thinking that.
All of that sounds right to me. But this pivot with regards to means isn’t much evidence about what Eliezer/MIRI would do if they (as a magical hypothetical) suddenly found themselves with a verifiably-aligned CEV AGI.
I expect that they would turn it on, with the expectation that it would develop a hard power decisive strategic advantage, use that to end the acute risk period, and then proceed to optimize the universe.
Insofar as that’s true, I think Oliver’s statement above is inaccurate.
MIRI has never said, to my knowledge,
The Singularity Institute used to have the plan of building and deploying a friendly AI, which they expected to “optimize” the whole world.
Eliezer’s writing includes lots of points at which he at least hints (some would say more than hints) that he thinks it is morally obligatory, or at least virtuous, to take over the world for the side of Good.
Famously, Harry says “World Domination is such an ugly phrase. I prefer world optimization.” (We made t-shirts of this phrase!)
The Sword of Good ends with the line
“‘I don’t trust you either,’ Hirou whispered, ‘but I don’t expect there’s anyone better,’ and he closed his eyes until the end of the world.” He’s concluded that all the evil in the world must be opposed, that it’s right for someone to cast the “spell of ultimate power” to do that.
(This is made a bit murky, because Eliezer’s writings usually focus on the transhumanist conquest of the evils of nature, rather than political triumph over human-evil. But triumph over human-evils is definitely included, e.g. the moral importance and urgency of destroying Azkaban in HP:MoR.)
From all that, I think it is reasonable to think that MIRI is in favor of taking over the world, if they could get the power to do it!
So it seems disingenuous, to me, to say,
I agree that MIRI’s leadership doesn’t care who implements a CEV AI, as long as they do it correctly.
(Though this is not as clearly non-power-seeking if you rephrase it as “MIRI leadership doesn’t care who implements the massively powerful AI, as long as they correctly align it to the values that MIRI leadership endorses.”
For an outsider who doesn’t already trust the CEV process, this is about as reassuring as a communist group saying “we don’t care who implements the AI, as long as they properly align it to Marxist doctrine.” I understand how CEV is more meta than that, how it is explicitly avoiding coding object-level values into the AI. But not everyone will see it that way, especially if they think the output of CEV is contrary to their values (as indeed, virtually every existing group should).)
CEV as an optimization target is itself selected to be cosmopolitan and egalitarian. It’s a good-faith attempt to optimize for the good of all. It does seem to me that the plan of “give a hard power advantage to this process, which we expect to implement the Good itself” is a step down in power-seeking from “give a hard power advantage to me, and I’ll do Good stuff.”
But it still seems to me that MIRI’s culture endorses sufficiently trustworthy people taking unilateral action, both to do a pivotal act and end the acute risk period, and more generally, to unleash a process that will inexorably optimize the world for Good.
I mean, I also think there is continuity from the beliefs I held in my high-school essays and my present beliefs, but it’s also enough time and distance that if you straightforwardly attribute claims to me that I made in my high-school essays, that I have explicitly disavowed and told you I do not believe, that I will be very annoyed with you and will model you as not actually trying to understand what I believe.
Absolutely, if you have specifically disavowed any claims, that takes precedence over anything else. And if I insist you still think x, because you said x ten years ago, but you say you now think something else, I’m just being obstinant.
In contrast, if you said x ten years ago, and in the intervening time you’ve shared a bunch of highly detailed models that are consistent with x, I think I should think you still think x.
I’m not aware of any specific disavowals of anything after 2004? What are you thinking of here?
Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
[Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
He states explicitly:
And later in the video he says:
“Pretty world-takeover-adjacent” feels like a fair description to me.
I don’t think this is the important point of disagreement. Habryka’s point throughout this thread seems to be that, yes, doing that is power-grabbing, but it is not what MIRI planned to do. So MIRI planned to (intellectually) empower anyone else willing to do (and capable of doing) a pivotal act with a blueprint for how to do so.
So MIRI wasn’t seeking to take power, but rather to allow someone else[1] to do so. It’s the difference between using a weapon and designing a weapon for someone else’s use. An important part is that this “someone else” could very well disagree with MIRI about a large number of things, so there need not be any natural allyship or community or agreement between them.
If you are a blacksmith working in a forge and someone comes into your shop and says “build me a sword so I can use it to kill the king and take control of the realm,” and you agree to do so but do not expect to get anything out-of-the-ordinary in return (in terms of increased power, status, etc), it seems weird and non-central to call your actions power-seeking. You are simply empowering another, different power-seeker. You are not seeking any power of your own.
[1] Who was both in a position of power at an AI lab capable of designing a general intelligence and sufficiently clear-headed about the dangers of powerful AI to understand the need for such a strategy.
My understanding was there were 4 phases in which the Singularity Institute / MIRI had 4 different plans.
~2000 to ~2004:
Plan:
Build a recursively self-improving seed AI as quickly as possible →
That seed AI Fooms →
It figures out the Good and does it.
[Note: Eliezer has explicitly disendorsed everything that he believed in this period, unless otherwise noted.]
~2004 to ~2016:
Update: “Wait. Because of the Orthogonality thesis, not all seed AIs will converge to values that we consider good, even though they’re much smarter than us. The supermajority of seed AIs don’t. We have to build in humane values directly, or building a recursively self-improving AGI will destroy both the world and everything of value in the world.”
New plan:
Figure out the math of motivationally-stable self-improvement, and figure out the deep math of cognition →
use both to build a seed AI, initialized to implement Coherent Extrapolated Volition →
let that seed AI recursively self improve into a singleton / sovereign with a decisive strategic advantage →
that singleton now uses its decisive strategic advantage to optimize the universe.
(“World domination is such an ugly phrase. I prefer world optimization”).
~2016 to ~2021:
Update: “it turns out Deep Learning is general enough that it is possible to build AGI with relatively brute force methods, without having much deep insight into the nature of cognition. AI timelines are shorter than we thought. Fuck. There isn’t time to figure out how to do alignment deeply and completely, at the level that would be required to trust an AI to be a sovereign, and optimize the whole universe.”
New plan:
Figure out enough of alignment to build the minimal AGI system that can perform a pivotal act, under tightly controlled circumstances, with lots of hacky guardrails and speed-bumps →
Build such a limited AGI →
Deploy that AGI to do a pivotal act to prevent any competitor projects from building a more dangerous unbounded AGI.
~2021 to present:
Update: “We can’t figure out even that much of the science of alignment in time. The above plan is basically doomed. We think the world is doomed. Given that, we might as well try outreach:”
New plan:
Do outreach →
Get all the Great Powers in the world to join and enforce a treaty that maintains a world-wide ban on large training runs →
Do biotech projects that can produce humans that are smart enough that they have security mindset not out of a special personal disposition, but just because they’re smart enough to see the obviousness of it by default →
Those superhumans solve alignment and (presumably?) implement more or less the pre-2016 MIRI plan.
I think interstice’s summary is basically an accurate representation of the ~2001 to ~2016 plan. They’re only mistaken in that MIRI didn’t switch away from that plan until recently.
Nice overview, I agree but I think the 2016-2021 plan could still arguably be described as “obtain god-like AI and use it to take over the world”(admittedly with some rhetorical exaggeration, but like, not that much)
I think it’s pretty important that the 2016 to 2021 plan was explicitly aiming to avoid unleashing godlike power. “The minimal amount of power to do a thing which is otherwise impossible”, not “as much omnipotence as is allowed by physics”.
And similarly, the 2016 to 2021 plan did not entail optimizing the world except with regard to what is necessary to prevent dangerous AGIs.
These are both in contrast to the earlier 2004 to 2016 plan. So the rhetorical exaggeration confuses things.
MIRI actually did have a plan that, in my view, is well characterized as (eventually) taking over the world, without exaggeration. That fact is apt to get lost if we also describe the toned-down plan as “taking over the world” just because it involves taking powerful, potentially aggressive action.
This discussion is a nice illustration of why x-riskers are definitely more power-seeking than the average activist group. Just like Eskimos proverbially have 50 words for snow, AI-risk-reducers need at least 50 terms for “taking over the world” to demarcate the range of possible scenarios. ;)
...fwiw I think it’s not grossly inaccurate.
I think MIRI did put a lot of effort into being cooperative about the situation (i.e. Don’t leave your fingerprints on the future, doing the ‘minimal’ pivotal act that would end the acute risk period, and when thinking about longterm godlike AI, trying to figure out fair CEV sorts of things).
But, I think it was also pretty clear that “have a controllable, safe AI that’s just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI” was not in the Overton window. I don’t know what Eliezer’s actual plan was, since he disclaimed “yes, I know melt all the GPUs won’t work”, but, like, “melt all the GPUs” implies a level of power over the world that is really extreme by historical standards, even if you’re trying to do the minimal thing with that power.
I think the plan implies having the capability that if you wanted to, you could take over the world, but having the power to do something and actually doing it are quite different. When you say “MIRI wanted to take over the world”, the central meaning that comes to mind for me is “take over all the governments, be in charge of all the laws and decision-making, be world dictator, take possession of all the resources” and probably also “steer humanity’s future in a very active way”. Which is very very not their intention, and if someone goes around saying MIRI’s plan was to take over the world without any clarification, leaving the reader to think the above, then I think they’re being very darn misleading.
When you read the Sequences, was your visualization of a Friendly AI going to let the governments of North Korea or Saudi Arabia persist? Would it allow parents to abuse their children in ways that are currently allowed by the law (and indeed enshrined by the law, in that the law gives parents authority over their children)? Does it allow the factory farms to continue to run? How about the (then contemporaneous) US occupations of Iraq and Afghanistan?
(This is a non-rhetorical question. I wonder if we were visualizing different things.)
Speaking for myself, I would say:
It’s a superintelligence, and so it can probably figure out effective peaceful ways to accomplish its goals. But among its goals will be the dismantling of many and likely all of the world’s major governments, not to mention a bunch of other existing power structures. A government being dismantled by a superhuman persuader is, in many but not all ways, as unsettling as it being destroyed by military force.
Perhaps humanity as a whole, and every individual human, would be made better off by a CEV-aligned friendly singleton, but I think the US government, as an entity, would be rightly threatened.
Doesn’t this very answer show that an AI such as you describe would not be reasonably describable as “Friendly”, and that consequently any AI worthy of the term “Friendly” would not do any of the things you describe? (This is certainly my answer to your question!)
No. “Friendly” was a semi-technical term of art at the time. It may turn out that a Friendly AI (in the technical sense) is not, or even can’t be, “friendly” in a more conventional sense.
Er… yes, I am indeed familiar with that usage of the term “Friendly”. (I’ve been reading Less Wrong since before it was Less Wrong, you know; I read the Sequences as they were being posted.) My comment was intended precisely to invoke that “semi-technical term of art”; I was not referring to “friendliness” in the colloquial sense. (That is, in fact, why I used the capitalized term.)
Please consider the grandparent comment in light of the above.
In that case, I answer flatly “no”. I don’t expect many existing governmental institutions to be ethical or legitimate in the eyes of CEV, if CEV converges at all. Factory Farming is right out.
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
Whether most existing humans would be opposed is not a criterion of Friendliness.
I think if you described what was going to happen, many and maybe most humans would say they prefer the status quo to a positive CEV-directed singularity. Perhaps it depends on which parts of “what’s going to happen” you focus on; some are more obviously good or exciting than others. Curing cancer is socially regarded as 👍 while curing death and dismantling governments are typically (though not universally) regarded as 👎.
I don’t think they will actually provide much opposition, because a superhuman persuader will be steering the trajectory of events. (Ostensibly, by using only truth-tracking arguments and inputs that allow us to converge on the states of belief that we would reflectively prefer, but we mere humans won’t be able to distinguish that from malicious superhuman manipulation.)
But again, how humans would react is neither here nor there for what a Friendly AI does. The AI does what the CEV of humans would want, not what the humans want.
And… you claim that the CEV of existing humans will want those things?
Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It’d be surprising to me if CEV-existing-humanity didn’t turn out to want some things that many current humans are opposed to.
Sure. Now, as far as I understand it, whether the extrapolated volition of humanity will even cohere is an open question (on any given extrapolation method; we set aside the technical question of selecting or constructing such a method).
So Eli Tyre’s claim seems to be something like: on [ all relevant / the most likely / otherwise appropriately selected ] extrapolation methods, (a) humanity’s EV will cohere, (b) it will turn out to endorse the specific things described (dismantling of all governments, removing the supply of factory farmed meat, dictating how people should raise their children).
Right?
I’m much more doubtful than most people around here about whether CEV coheres: I guess that the CEV of some humans wireheads themselves and the CEV of other humans doesn’t, for instance.
But I’m bracketing that concern for this discussion. Assuming CEV coheres, then yes, I predict that it will have radical (in the sense of a political radical whose beliefs are extremely outside of the Overton window, such that they are disturbing to the median voter) views about all of those things.
But more confidently, I predict that it will have radical views about a very long list of things that are commonplace in 2024, even if it turns out that I’m wrong about this specific set.
CEV asks what we would want if we knew everything the AI knows. There are dozens of things that I think I know that, if the average person knew them to be true, would invalidate a lot of their ideology.
If the average person knew everything that an AGI knows (which includes potentially millions of subjective years of human science, whole new fields, as foundational to one’s worldview as economics and probability theory is to my current worldview), and they had hundreds of subjective years to internalize those facts and domains, in a social context that was conducive to that, with (potentially) large increases in their intelligence, I expect their views are basically unrecognizable after a process like that.
As a case in point, most people consider it catastrophically bad to have their body destroyed (duh). And if you asked them if they would prefer, given their body being destroyed, to have their brain-state recorded, uploaded, and run on a computer, many would say “no”, because it seems horrifying to them.
Most LessWrongers embrace computationalism: they think that living as an upload is about as good as living as a squishy biological robot (and indeed, better in many respects). They would of course choose to be uploaded if their body was being destroyed. Many would elect to have their body destroyed specifically because they would prefer to be uploaded!
That is most LessWrongers think they know something which most people don’t know, but which, if they did know it, would radically alter their preferences and behavior.
I think a mature AGI knows at least thousands of things like that.
So among the things about CEV that I’m most confident about (again, granting that it coheres at all), is that CEV has extremely radical views, conclusions which are horrifying to most people, including probably myself.
If by ‘cohere’ you mean ‘the CEVs of all individual humans match’, then my belief (>99%) is that it is not the case that the CEVs of all individual humans will (precisely) match. I also believe there would be significant overlap between the CEVs of 90+% of humans[1], and that this overlap would include disvaluing two of the three[2] things you asked about (present factory farming and child abuse; more generally, animal and child suffering).
(This felt mostly obvious to me, but you did ask about it a few times, in a way that suggested you expect something different; if so, you’re welcome to pinpoint where you disagree.)
For instance, even if one human wants to create a lot of hedonium, and another human wants to create a lot of individuals living fun and interesting lives, it will remain the case that they both disvalue things like extreme suffering. Also, the former human will probably still find at least some value in what the latter human seeks.
For the part of your question about whether their CEVs would endorse dismantling governments: note that ‘governments’ is a relevantly broad category, when considering that most configurations which are infeasible now will be feasible in the (superintelligence-governed) future. I think these statements capture most of my belief about how most humans’ CEVs would regard things in this broad category.
Most human CEVs would be permissive of those who terminally-wish[3] to live in contexts that have some form of harmless government structure.
The category of ‘government’ also includes, e.g., dystopias that create suffering minds and don’t let them leave; most human CEVs would seek to prevent this kind of government from existing.
(None of that implies any government would be present everywhere, nor that anyone would be in such a context against their will; rather, I’m imagining that a great diversity of contexts and minds will exist. I less confidently predict that most will choose to live in contexts without a government structure, considering it unnecessary given the presence of a benevolent ASI.)
(wished for not because it is necessary, for it would not be under a benevolent ASI, but simply because it’s their vision for the context in which they want to live)
I do.
I mean, it depends on the exact CEV procedure. But yes.
I think the majority of nations would support dismantling their governments in favor of a benevolent superintelligence, especially given the correct framework. And an ASI can simply solve the problem of meat by growing brainless bodies.
Edit: Whoever mega-downvoted this, I’m interested to see you explain why.
Meta: You may wish to know[1] that seeing these terms replaced with the ones you used can induce stress/dissociation in the relevant groups (people disturbed by factory farming and child abuse survivors). I am both and this was my experience. I don’t know how common it would be among LW readers of those demographics specifically, though.
The one you responded to:
Your response:
I’m framing this as sharing info you (or a generalized altruistic person placed in your position) may care about rather than as arguing for a further conclusion.
I’m confused about your question. I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
Right, but I’m asking about your visualization of a Friendly AI as described in the Sequences, not a limited AGI for a pivotal act.
I’m confused by your confusion! Are you saying that that’s a non-sequitur, because I’m asking about a CEV-sovereign instead of a corrigible, limited genie or oracle?
It seems relevant to me, because both of those were strategic goals for MIRI at various points in its history, and at least one of them seems well characterized as “taking over the world” (or at least something very nearby to that). Which seems germane to the discussion at hand to me.
I would be surprised if a Friendly AI resulted in those things being left untouched.
I think that is germane, but maybe it needed some bridging/connecting work, since this thread so far was about MIRI-as-having-a-pivotal-act-goal. I was less sure than Habryka about whether MIRI itself would enact a pivotal act if they could, but my understanding was that they had no plan to create a sovereign for most of their history (i.e., after 2004), so that doesn’t seem like a candidate for them having a plan to take over the world.
Yeah, I think that’s false.
The plan was “Figure out how to build a friendly AI, and then build one”. (As Eliezer stated in the video that I linked somewhere else in this comment thread).
But also, I got that impression from the Sequences? Like Eliezer talks about actually building an AGI, not just figuring out the theory of how to build one. You didn’t get that impression?
I don’t remember what exactly I thought in 2012 when I was reading the Sequences. I do recall sometime later, after DL was in full swing, it seeming like MIRI wasn’t in any position to be building AGI before others (like no compute, not the engineering prowess), and someone (not necessarily at MIRI) confirmed that wasn’t the plan. Now and at the time, I don’t know how much that was principle vs ability.
My feeling of the plan in the pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building it to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that due to the psychological unity of mankind, anyone building an aligned[-with-them] AGI was a good outcome compared to someone building an unaligned one. Like even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much this was MIRI positions vs fragments that I combined in my own head that came from assorted places and were never policy.
Well, I can tell you that they definitely planned to build the Friendly AI, after figuring out how.
See this other comment.
Pretty solid evidence.
“Taking over” something does not imply that you are going to use your authority in a tyrannical fashion. People can obtain control over organizations and places and govern with a light or even barely-existent touch, it happens all the time.
Would you accept “they plan to use extremely powerful AI to institute a minimalist, AI-enabled world government focused on preventing the development of other AI systems” as a summary? Like sure, “they want to take over the world” as a gist of that does have a bit of an editorial slant, but not that much of one. I think that my original comment would be perceived as much less misleading by the majority of the world’s population than “they just want to do some helpful math uwu” in the event that these plans actually succeeded. I also think it’s obvious that these plans indicate a far higher degree of power-seeking (in aim at least) than virtually all other charitable organizations.
(..and to reiterate, I’m not taking a strong stance on the advisability of these plans. In a way, had they succeeded, that would have provided a strong justification for their necessity. I just think it’s absurd to say that the organization making them is less power-seeking than the ADL or whatever)
No. Because I don’t think that was specified or is necessary for a pivotal act. You could leave all existing government structures intact and simply create an invincible system that causes any GPU farm larger than a certain size to melt. Or something akin to that that doesn’t require replacing existing governments, but is a quite narrow intervention.
It wasn’t specified but I think they strongly implied it would be that or something equivalently coercive. The “melting GPUs” plan was explicitly not a pivotal act but rather something with the required level of difficulty, and it was implied that the actual pivotal act would be something further outside the political Overton window. When you consider the ways “melting GPUs” would be insufficient a plan like this is the natural conclusion.
I don’t think you would need to replace existing governments. Just block all AI projects and maintain your ability to continue doing so in the future via maintaining military supremacy. Get existing governments to help you, or at least not interfere, via some mix of coercion and trade. Sort of a feudal arrangement with a minimalist central power.
That to me is a very very non-central case of “take over the world”, if it is one at all.
This is about “what would people think when they hear that description” and I could be wrong, but I expect “the plan is to take over the world” summary would lead people to expect “replace governments” level of interference, not “coerce/trade to ensure this specific policy”—and there’s a really really big difference between the two.
I think this whole debate is missing the point I was trying to make. My claim was that it’s often useful to classify actions which tend to lead you to having a lot of power as “structural power-seeking” regardless of what your motivations for those actions are. Because it’s very hard to credibly signal that you’re accumulating power for the right reasons, and so the defense mechanisms will apply to you either way.
In this case MIRI was trying to accumulate a lot of power, and claiming that they were aiming to use it in the “right way” (do a pivotal act) rather than the “wrong way” (replacing governments). But my point above is that this sort of claim is largely irrelevant to defense mechanisms against power-seeking.
(Now, in this case, MIRI was pursuing a type of power that was too weird to trigger many defense mechanisms, though it did trigger some “this is a cult” defense mechanisms. But the point cross-applies to other types of power that they, and others in AI safety, are pursuing.)
I don’t super buy this. I don’t think MIRI was trying to accumulate a lot of power. In my model of the world they were trying to design a blueprint for some institution or project that would mostly have highly conditional power, that they would personally not wield.
In the metaphor of classical governance, I think what MIRI was doing was much more “design a blueprint for a governance agency” not “put themselves in charge of a governance agency”. Designing a blueprint is not a particularly power-seeking move, especially if you expect other people to implement it.
I got your point and think it’s valid and I don’t object to calling MIRI structurally power-seeking to the extent they wanted to execute a pivotal act themselves (Habryka claims they weren’t, I’m not knowledgeable on that front).
I still think it’s important to push back against a false claim that someone had the goal of taking over the world.
Are you saying that the AIS movement is more power-seeking than the environmentalist movement, which spent over $30M on lobbying in 2023 alone and has political parties in 90 countries (in five of them, part of the ruling coalition)? For comparison, this Politico article, with a maximally negative attitude, mentions AIS lobbying of around $2M.
It’s like saying “NASA’s default plan is to spread the light of consciousness across the stars”, which is kinda technically true, but in reality NASA’s actions are not as cool as this phrase implies. “MIRI’s default plan” was “to do math in the hope that some of this math will turn out to be useful”.
I think that AIS lobbying is likely to have more consequential and enduring effects on the world than environmental lobbying regardless of the absolute size in body count or amount of money, so yes.
I mean yeah, that is a better description of their publicly-known day-to-day actions, but intention also matters. They settled on math after it became clear that the god AI plan was not achievable(and recently, gave up on the math plan too when it became clear that was not realistic). An analogy might be an environmental group that planned to end pollution by bio-engineering a microbe to spread throughout the world that made oil production impossible, then reluctantly settled for lobbying once they realized they couldn’t actually make the microbe. I think this would be a pretty unusually power-seeking plan for an environmental group!
The point of the OP is not about effects, it’s about AIS being visibly more power-seeking than other movements and causing backlash in response to visible activity.
Perhaps the broader point here is that public relations is a complex art, of which we are mostly not even practitioners let alone masters. We should probably learn about it and get better.
I also want to note that there are probably psychological as well as societal defense mechanisms against someone trying to change your worldview. I don’t know the name of the phenomenon, but this is essentially why counselors/therapists typically avoid giving advice or stating their opinion plainly; the client is prone to rebel against that advice or worldview. I suspect this happens because it’s terribly dangerous to just let other people tell you how to think; you’ll be taken advantage of rather quickly if you do. Obviously there are multiple routes around these defense mechanisms, since people do convince others to change their minds in both subtle and forceful ways. But we should probably learn the theory of how that happens before triggering a bunch of defense mechanisms by going in swinging with amateur enthusiasm (and the unusual perspective of devoted rationalism).
Waiting to speak while polishing our approach seems foolish when time is short. I find very short timelines entirely plausible, but I nonetheless think it would behoove us to collectively gather some clues before arguing loudly in public.
This is in part because I think the last point is very true and very relevant: people who aren’t taking AGI risk seriously largely just aren’t taking AI seriously. They’ll take it more seriously the more it advances, with no convincing needed. So a good bit of the work is being done by progress, so we’re not as far behind in getting people to pay attention as it seems. That gives us a bit more time to figure out how to work with that societal attention as it continues to grow.
None of this is arguing for shutting up or not speaking the truth. I’m just suggesting we err on the side of speaking gently, to avoid triggering strong defense mechanisms we don’t understand.
Unfortunately, as societal attention to AI ramps up, less and less of that attention will go to “us”.
That is an excellent point. I hate the idea of gathering attention and reputation, but that’s probably a big part of having people listen to you when it’s important.
I don’t think the set of people interested in AI safety is even a “community”, given how diverse it is (Bengio, Brynjolfsson, Song, etc.), so I think it’d be more accurate to say “the Bay Area AI alignment community is structurally power-seeking.”
I think this is a better pointer, but I think “Bay Area alignment community” is still a bit too broad. I think e.g. Lightcone and MIRI are very separate from Constellation and Open Phil and it doesn’t make sense to put them into the same bucket.
I am kinda confused by these comments. Obviously you can draw categories at higher or lower levels of resolution. Saying that it doesn’t make sense to put Lightcone and MIRI in the same bucket as Constellation and OpenPhil, or Bengio in the same bucket as the Bay Area alignment community, feels like… idk, like a Protestant Christian saying it doesn’t make sense to put Episcopalians and Baptists in the same bucket. The differences loom large for insiders but are much smaller for outsiders.
You might be implicitly claiming that AI safety people aren’t very structurally power-seeking unless they’re Bay Area EAs. I think this is mostly false, and in fact it seems to me that people often semi-independently reason themselves into power-seeking strategies after starting to care about AI x-risk. I also think that most proposals for AI safety regulation are structurally power-seeking, because they will make AI safety people arbitrators of which models are allowed (implicitly or explicitly). But a wide range of AI safety people support these (and MIRI, for example, supports some of the strongest versions of these).
I’ll again highlight that just because an action is structurally power-seeking doesn’t make it a bad idea. It just means that it comes along with certain downsides that people might not be tracking.
I don’t know, I think I’ll defend that Lightcone is genuinely not very structurally power-seeking, and neither is MIRI, and also that both of these organizations are not meaningfully part of some kind of shared power-base with most of the EA AI Alignment community in Berkeley (Lightcone is banned from receiving any kind of OpenPhil funding, for example).
I think you would at least have to argue that there are two separate power-seeking institutions here, each seeking power for themselves, but I also do genuinely think that Lightcone is not a very structurally power-seeking organization (I feel a bit more confused about MIRI, though would overall still defend that).
Suppose I’m an atheist, or a muslim, or a jew, and an Episcopalian living in my town came up to me and said “I’m not meaningfully in a shared power-base with the Baptists. Sure, there’s a huge amount of social overlap, we spend time at each other’s churches, and we share many similar motivations and often advocate for many of the same policies. But look, we often argue about theological disagreements, and also the main funder for their church doesn’t fund our church (though of course many other funders fund both Baptists and Episcopalians).”
I just don’t think this is credible, unless you’re using a very strict sense of “meaningfully”. But at that level of strictness it’s impossible to do any reasoning about power-bases, because factional divides are fractal. What it looks like to have a power-base is to have several broadly-aligned and somewhat-overlapping factions that are each seeking power for themselves. In the case above, the Episcopalian may legitimately feel very strongly about their differences with the Baptists, but this is a known bug in human psychology: the narcissism of small differences.
Though I am happy to agree that Lightcone is one of the least structurally power-seeking entities in the AI safety movement, and I respect this. (I wouldn’t say the same of current-MIRI, which is now an advocacy org focusing on policies that strongly centralize power. I’m uncertain about past-MIRI.)
I think you’re making an important point here, and I agree that given the moral valence people here will be quite tempted to gerrymander themselves out of the relevant categories (also, pretending to be the underdog, or participating in bravery debates, is an extremely common pattern in conversations like this).
I do agree that a few years ago things would have been better modeled as a shared power base, but I think a lot of this has genuinely changed post-FTX.
I also think there are really crucial differences in how much different sub-parts of this ecosystem are structurally power-seeking, and those are important to model (and also, importantly, the structural power-seeking of some of these parts puts them into conflict with the others, inasmuch as they are not participating in the same power-seeking strategies).
Like, the way I have conceptualized most of my life’s work so far has been to try to build neutral non-power-seeking institutions, that inform other people and help them make better decisions, and that generally try to actively avoid plans that route through “me and my friends get powerful and then solve our problems” because I think this kind of plan will almost inevitably end up just running into conflict with other power-seeking entities and then spend most of its resources on that.
And I think there are thousands of others who have similar intuitions about how to relate to the world, within the broader AI-Alignment/Rationality ecosystem, and I think those parts are genuinely not structurally power-seeking in the same way. And I agree they are all very enmeshed with parts that are power-seeking, and this makes distinguishing them harder, but I think there are really quite important differences.
I don’t actually know how much we disagree. I do think that modeling the AI Safety space as a single power-base is wrong and not really carving reality along structural lines. Like, I don’t think the situation is “look, we often argue theological disagreements”, I think the situation is often much more “these two things that care about safety are actively in-conflict with each other and are taking active steps to eradicate the other party” and at that point I just really don’t think it makes sense to model these as one.
This is the thing that feels most like talking past each other. You’re treating this as a binary and it’s really, really not a binary. Some examples:
There are many circumstances in which it’s useful to describe “the US government” as a power base, even though Republicans and Democrats are often at each other’s throats, and also there are many people within the US government who are very skeptical of it (e.g. libertarians).
There are many circumstances in which it’s useful to describe “big tech” as a power base, even though the companies in it are competing ferociously.
I’m not denying that there are crucial differences to model here. But this just seems like the wrong type of argument to use to object to accusations of gerrymandering, because every example of gerrymandering will be defended with “here are the local differences that feel crucial to me”.
So how should we evaluate this in a principled way? One criterion: how fierce is the internal fighting? Another: how many shared policy prescriptions do the different groups have? On the former, while I appreciate that you’ve been treated badly by OpenPhil, I think “trying to eradicate each other” is massive hyperbole. I would maybe accept that as a description of the fighting between AI ethics people, AI safety people, and accelerationists, but the types of weapons they use (e.g. public attacks in major news outlets, legal measures, etc) are far harsher than anything I’ve heard about internally inside AI safety. If I’m wrong about this, I’d love to know.
On the latter: when push comes to shove, a lot of these different groups band together to support stuff like interpretability research, raising awareness of AI risk, convincing policymakers it’s a big deal, AI legislation, etc. I’d believe that you don’t do this; I don’t believe that there are thousands of others who have deliberately taken stances against these things, because I think there are very few people as cautious about this as you (especially when controlling for influence over the movement).
As above, I respect this a lot.
Yeah, I think this makes sense. I wasn’t particularly trying to treat it as just a binary, and I agree that there are levels of abstraction where it makes sense to model these things as one, and this also applies to the whole extended AI-Alignment/EA/Rationality ecosystem.
I do feel like this lens loses a lot of its validity at the highest levels of abstraction (like, I think there is a valid sense in which you should model AI x-risk concerned people as part of big tech, but also, if you do that, you kind of ignore the central dynamic that is going on with the x-risk concerned people, and maybe that’s the right call sometimes, but I think in terms of “what will the future of humanity be” in making that simplification you have kind of lost the plot).
My best guess is you are underestimating the level of adversarialness going on, though I am also uncertain about this. I would be interested in sharing notes some time.
As one concrete example, my guess is we both agree it would not make sense to model OpenAI as part of the same power base. Like, yeah, a bunch of EAs used to be on OpenAI’s board, but even during that period, they didn’t have much influence on OpenAI. I think basically all throughout it made most sense to model these as separate communities/institutions/groups with regards to power-seeking.
I also personally do straightforwardly think that most of the efforts of the extended EA-Alignment ecosystem are bad, and would give up a large chunk of my resources to reduce their influence on the world. Not because I am in a competition between them (indeed, I think I do tend to get more power as they get more power), but because I think they genuinely have really bad consequences for the world. I also care a lot about cooperativeness, and so I don’t tend to go around getting into conflicts with lots of collateral damage or reciprocal escalation, but also, I have definitely taken actions within the bounds of what seems reasonable that have aimed at getting the EA community to shut down or disappear (and will probably continue to do so).
Do you have a diagnosis of the root cause of this?
Why not try to reform EA instead? (This is related to my previous question. If we could diagnose what’s causing EA to be harmful, maybe we can fix it?)
I have spent like 40% of the last 1.5 years trying to reform EA. I think I had a small positive effect, but it’s also been extremely tiring and painful and I consider my duty with regards to this done. Buy-in for reform among leadership is very low, and people seem primarily interested in short-term power-seeking and ass-covering.
The memo I mentioned in another comment has a bunch of analysis I’ll send it to you tomorrow when I am at my laptop.
For some more fundamental analysis I also have this post, though it’s only a small part of the picture: https://www.lesswrong.com/posts/HCAyiuZe9wz8tG6EF/my-tentative-best-guess-on-how-eas-and-rationalists
As a datapoint, I think I was likely underestimating the level of adversarialness going on & this thread makes me less likely to lump Lightcone in with other parts of the community.
@habryka are you able to share details/examples RE the actions you’ve taken to get the EA community to shut down or disappear?
I would also be interested in more of your thoughts on this. (My Habryka sim says something like “the epistemic norms are bad, and many EA groups/individuals are more focused on playing status games. They are spending their effort saying and doing things that they believe will give them influence points, rather than trying to say true and precise things. I think society’s chances of getting through this critical period would be higher if we focused on reasoning carefully about these complex domains, making accurate arguments, and helping ourselves & others understand the situation.” Curious if this is roughly accurate or if I’m missing some important bits.) Also curious if you’re able to expand on this or provide some examples of the things in the category “Many EA people think X action is a good thing to do, but I disagree.”
I have a memo I thought I had shared with you at one point that I wrote for EA Coordination Forum 2023. It has a bunch of wrong stuff in it, and fixing it has been too difficult, but I could share it with you privately (with disclaimers on what is wrong). Feel free to DM me if I haven’t.
Sharing my memo at the coordination forum is one such action I have taken. I have also advocated for various people to be fired, and have urged a number of external and internal stakeholders to reconsider their relationship with EA. Most of this has been kind of illegible and flaily, with me not really knowing how to do anything in the space without ending up with a bunch of dumb collateral damage and reciprocal escalation.
I would be keen to see the memo if you’re comfortable sharing it privately.
Sure, sent a DM.
I would also be interested to see this. Also, could you clarify:
Are you talking here about ‘the extended EA-Alignment ecosystem’, or do you mean you’ve aimed at getting the global poverty/animal welfare/other non-AI-related EA community to shut down or disappear?
The leadership of these is mostly shared. There are many good parts of EA, and reform would be better than shutting down, but reform seems unlikely at this point.
My world model mostly predicts effects on technological development and the long term future dominate, so in as much as the non-AI related parts of EA are good or bad, I think what matters is their effect on that. Mostly the effect seems small, and quibbling over the sign doesn’t super seem worth it.
I do think there is often an annoying motte and bailey going on where people try to critique EA for their negative effects in the important things, and those get redirected to “but you can’t possibly be against bednets”, and in as much as the bednet people are willingly participating in that (as seems likely the case for e.g. Open Phil’s reputation), that seems bad.
What do you mean the leadership is shared? That seems much less true now that Effective Ventures has started spinning off its orgs. It seems like the funding is still largely shared, but that’s a different claim.
Wow, what a striking thing for you to say without further explanation.
Personally, I’m a fan of EA. Also am an EA (signed the GWWC/10% pledge and all that). So, I wonder what you mean.
I mean, it’s in the context of a discussion with Richard who knows a lot of my broader thoughts on EA stuff. I’ve written quite a lot of comments with my thoughts on EA on the EA Forum. I’ve also written a bunch more private memos I can share with people who are interested.
Hmm, I’m not totally sure. At various points:
OpenAI was the most prominent group talking publicly about AI risk
Sam Altman was the most prominent person talking publicly about large-scale AI regulation
A bunch of safety-minded people at OpenAI were doing OpenAI’s best capabilities work (GPT-2, GPT-3)
A bunch of safety-minded people worked on stuff that led to ChatGPT (RLHF, John Schulman’s team in general)
Elon tried to take over, and the people who opposed that were (I’m guessing) a coalition of safety people and the rest of OpenAI
It’s really hard to step out of our own perspective here, but when I put myself in the perspective of, say, someone who doesn’t believe in AGI at all, these all seem pretty indicative of a situation where OpenAI and AI safety people were to a significant extent building a shared power base, and just couldn’t keep that power base together.
[this comment is irrelevant to the point you actually care about and is just nit-picking about the analogy]
There is a pretty big divide between “liberal” and “conservative” Christianity that is in some ways bigger than the divide between different denominations. In the US, people who think of themselves as “Episcopalians” tend to be more liberal than people who call themselves “Baptists”. In the rest of this comment, I’m going to assume we’re talking about conservative Anglicans rather than Episcopalians (those terms referring to the same denominational family), and also about conservative Baptists, since they’re more likely to be up to stuff / doing meaningful advocacy, and more likely to care about denominational distinctions. That said, liberal Episcopalians and liberal Baptists are much more likely to get along, and also openly talk about how they’re in cooperation.
My guess is that conservative Anglicans and Baptists don’t spend much time at each other’s churches, at least during worship, given that they tend to have very different types of services and very different views about the point of worship (specifically about the role of the eucharist). Also there’s a decent chance they don’t allow each other to commune at their church (more likely on the Baptist end). Similarly, I don’t think they are going to have that much social overlap, altho I could be wrong here. There’s a good chance they read many of the same blogs tho.
In terms of policy advocacy, on the current margin they are going to mostly agree—common goals are going to be stuff like banning abortion, banning gay marriage, and ending the practice of gender transition. Anglican groups are going to be more comfortable with forms of state Christianity than Baptists are, altho this is lower-priority for both, I think. They are going to advocate for their preferred policies in part by denominational policy bodies, but also by joining common-cause advocacy organizations.
Both Anglican and Baptist churches are largely going to be funded by members, and their members are going to be disjoint. That said it’s possible that their policy bodies will share large donor bases.
They are also organized pretty differently internally: Anglicans have a very hierarchical structure, while Baptists have a very decentralized one (each congregation is its own democratic polity, and is able to e.g. vote to remove the pastor and hire a new one).
Anyway: I’m pretty sympathetic to the claim of conservative Anglicans and Baptists being meaningfully distinct power bases, altho it would be misleading to not acknowledge that they’re both part of a broader conservative Christian ecosystem with shared media sources, fashions, etc.
Part of the reason this analogy didn’t vibe for me is that Anglicans and Baptists are about as dissimilar as Protestants can get. If it were Anglicans and Presbyterians or Baptists and Pentecostals that would make more sense, as those denominations are much more similar to each other.
Why?
The reason for the ban is pretty crux-y. Is Lightcone banned because OpenPhil dislikes you, because you’re too close so that it would be a conflict of interest, or something else?
Good Ventures have banned OpenPhil from recommending grants to organizations working in the “rationalist community building” space (including for their non-”rationalist community building” work). I understand this to be because Dustin doesn’t believe in that work and feels he suffers a bunch of reputational damage for funding it (IIRC, he said he’d be willing to suffer that reputational damage if he was personally excited by it). Lots more detail on the discussion on this post.
“Bay Area EA alignment community”/”Bay Area EA community”? (Most EAs in the Bay Area are focused on alignment compared to other causes.)
I’m imagining a future post about how society has defense mechanisms against people trying to focus on legitimacy[1] advising us to stop doing that so much :P
1: Public criticism of people trying to persuade the public.
2: Powerful actors refusing to go along with distributed / cooperative plans for the future.
3: Public criticism of anyone trying to make Our Side give up power over the future.
4: Conspiracy theories about what The Man is trying to persuade you of.
5: The evolution of an accelerationist movement who want to avoid anti-centralization measures from society insofar as they require limiting the size of individual advances.
What’s the FATE community? Fair AI and Tech Ethics?
Fairness, Accountability, Transparency, Ethics. I think this research community/area is often also called “AI ethics”
First, I think that thinking about and highlighting these kind of dynamics is important.
I expect that, by default, too few people will focus on analyzing such dynamics from a truth-seeking and/or instrumentally-useful-for-safety perspective.
That said:
It seems to me you’re painting with too broad a brush throughout.
At the least, I think you should give some examples that lie just outside the boundary of what you’d want to call [structural power-seeking].
Structural power-seeking in some sense seems unavoidable. (AI is increasingly powerful; influencing it implies power)
It’s not clear to me that you’re sticking to a consistent sense throughout.
E.g. “That makes AI safety strategies which require power-seeking more difficult to carry out successfully.” seems false in general, unless you mean something fairly narrow by power-seeking.
An important aspect is the (perceived) versatility of power:
To the extent that it’s [general power that could be efficiently applied to any goal], it’s suspicious.
To the extent that it’s [specialized power that’s only helpful in pursuing a narrow range of goals] it’s less suspicious.
Similarly, it’s important under what circumstances the power would become general: if I take actions that can only give me power by routing through [develops principled alignment solution], that would make a stated goal of [develop principled alignment solution] believable; it doesn’t necessarily make some other goal believable—e.g. [...and we’ll use it to create this kind of utopia].
Increasing legitimacy is power-seeking—unless it’s done in such a way that it implies constraints.
That said, you may be right that it’s somewhat less likely to be perceived as such.
Aiming for [people will tend to believe whatever I say about x] is textbook power-seeking wherever [influence on x] implies power.
We’d want something more like [people will tend to believe things that I say about x, so long as their generating process was subject to [constraints]].
Here it’s preferable for [constraints] to be highly limiting and clear (all else equal).
I’d say that “prioritizing competence” begs the question.
What is the required sense of “competence”?
For the most important AI-based decision-making, I doubt that ”...broadly competent, and capable of responding sensibly...” is a high enough bar.
In particular, ”...because they don’t yet take AGI very seriously” is not the only reason people are making predictable mistakes.
″...as AGI capabilities and risks become less speculative...”
Again, this seems too coarse-grained:
Some risks becoming (much) clearer does not entail all risks becoming (much) clearer.
Understanding some risks well while remaining blind to others, does not clearly imply safer decision-making, since “responding sensibly” will tend to be judged based on [risks we’ve noticed].
I mostly agree with this post, but while I do think that the AI safety movement probably should try to at least be more cooperative with other movements, I disagree with the claim in the comments section that AI safety shouldn’t try to pick a political fight in the future around open-source.
(I agree it probably picked that fight too early.)
The reason is that there’s a non-trivial chance that alignment is solvable for human-level AI systems à la AI control, even if they are scheming, so long as the lab has control over the AIs, which as a corollary also means you can’t open-source/open-weight the model.
More prosaically, AI misuse can be a problem, and the most important point here is that open-sourcing/open-weighting the model widens the set of people who can modify the AI, which unfortunately also means the chance of misuse grows with the number of people who know how to modify it.
So I do think there’s a non-trivial chance that AI safety eventually will have to suffer political costs to ban/severely restrict open-sourcing AI.
I disagree with this claim. It seems pretty clear that the world has defense mechanisms against
disempowering other people or groups
breaking norms in the pursuit of power
But it is possible to be power-seeking in other ways. The Gates Foundation has a lot of money and wants other billionaires’ money for its cause too. It influences technology development. It has to work with dozens of governments, sometimes lobbying them. Normal think tanks exist to gain influence over governments. Harvard University, Jane Street, and Goldman Sachs recruit more elite students than all the EA groups and control more money than OpenPhil. Jane Street and Goldman Sachs guard private information worth billions of dollars. The only one with a negative reputation is Goldman Sachs, which is due to perceived greed rather than power-seeking per se. So why is there so much more backlash against AI safety? I think it basically comes down to a few factors:
We are bending norms (billionaire funding for somewhat nebulous causes) and sometimes breaking them (FTX financial and campaign finance crimes)
We are not able to credibly signal that we won’t disempower others.
MIRI wanted a pivotal act to happen, and under that plan nothing would stop MIRI from being world dictators
AI is inherently a technology with world-changing military and economic applications whose governance is unsolved
An explicitly consequentialist movement will take power by any means necessary, and people are afraid of that.
AI labs have incentives to safetywash, making people wary of safety messaging.
The preexisting AI ethics and open-source movements think their cause is more important and x-risk is stealing attention.
AI safety people are bad at diplomacy and communication, leading to perceptions that they’re the same as the AI labs or have some other sinister motivation.
That said, I basically agree with section 3. Legitimacy and competence are very important. But we should not confuse power-seeking—something the world has no opinion on—with what actually causes backlash.
I think there’s something importantly true about your comment, but let me start with the ways I disagree. Firstly, the more ways in which you’re power-seeking, the more defense mechanisms will apply to you. Conversely, if you’re credibly trying to do a pretty narrow and widely-accepted thing, then there will be less backlash. So Jane Street is power-seeking in the sense of trying to earn money, but they don’t have much of a cultural or political agenda, they’re not trying to mobilize a wider movement, and earning money is a very normal thing for companies to do; it makes them one of thousands of comparably-sized companies. (Though note that there is a lot of backlash against companies in general, which are perceived to have too much power. This leads a wide swathe of people, especially on the left, and especially in Europe, to want to greatly disempower companies because they don’t trust them.)
Meanwhile the Gates Foundation has a philanthropic agenda, but like most foundations tries to steer clear of wider political issues, and also IIRC tries to focus on pretty object-level and widely-agreed-to-be-good interventions. Even so, it’s widely distrusted and feared, and Gates has become a symbol of hated global elites, to the extent where there are all sorts of conspiracy theories about him. That’d be even worse if the foundation were more political.
Lastly, it seems a bit facile to say that everyone hates Goldman due to “perceived greed rather than power-seeking per se”. A key problem is that people think of the greed as manifesting through political capture, evading regulatory oversight, deception, etc. That’s part of why it’s harder to tar entrepreneurs as greedy: it’s just much clearer that their wealth was made in legitimate ways.
Now the sense in which I agree: I think that “gaining power triggers defense mechanisms” is a good first pass, but we definitely want a more mechanistic explanation of what the defense mechanisms are, what triggers them, etc—in particular so we don’t just end up throwing our hands in the air and concluding that doing anything is hopeless and scary. And I also agree that your list is a good start. So maybe I’d just want to add to it stuff like:
having a broad-ranging political agenda (that isn’t near-universally agreed to be good)
having non-transparent interactions with many other powerful actors
having open-ended scope to expand
And maybe a few others (open to more suggestions).
Given that OP works for OpenAI, this post reads like when Marc Andreessen complains about the “gigantic amount of money in AI safety”.
I think I disagree with some of the claims in this post and I’m mostly sympathetic with the points Akash raised in his comments. Relatedly, I’d like to see a more rigorous comparison between the AI safety community (especially EA/Rationality parts) and relevant reference class movements such as the climate change community.
That said, I think it’s reasonable to have a high prior on people ending up aiming for inappropriate levels of power-seeking when taking ambitious actions in the world so it’s important to keep these things in mind.
In addition to your two “recommendations” of focusing on legitimacy and competence, I’d add two additional candidates:
1. Being careful about what de facto role models or spokespeople the AI safety community “selects”. It seems crucial to avoid another SBF.
2. Enabling currently underrepresented perspectives to contribute in well-informed, competent ways.
I agree with many of the points expressed in this post, though something doesn’t sit right with me about some of the language/phrasing used.
For example, the terms “power-seeking” and “cooperative” feel somewhat loaded. It’s not so much that they’re inaccurate (when read in a rather precise and charitable way) but more that they have pretty strong connotations and valences.
Consider:
Alice: I’m going to a networking event tonight; I might meet someone who can help me get a job in housing policy!
Bob: That’s a power-seeking move.
Alice: Uh… what?
Bob: Well, you know, if you get a job, then that increases your power. It increases your ability to influence the world.
Alice: I guess me getting a job does technically increase my ability to influence the world, so if that’s how you want to define “power-seeking” then you’re technically correct, but that’s not really the first word that comes to mind here. We usually use the word “power-seeking” to refer to bad people who are overly concerned with acquiring power, usually for personal or selfish gain at the expense of others. And I don’t really think that’s what I’m doing.
Separately, I’d be curious how you’re defining “cooperative” in this context. (Does it mean “not power-seeking” or “strategies that focus more on sharing information with the public and making sure that competent people are in charge regardless of their views on AI safety”, or something else?)
I feel like it’s pretty accurate to say that a community of job seekers is structurally power-seeking, and this is kind of interesting, though it also defangs a bunch of the social intuition against power-seeking to look at it through a bloodless “structural” lens.
Would you feel the same way about “influence-seeking”, which I almost went with?
Note also that, while Bob is being a dick about it, the dynamic in your scenario is actually a very common one. Many people are social climbers who use every opportunity to network or shill themselves, and this does get noticed and reflects badly on them. We can debate about the precise terminology to use (which I think should probably be different for groups vs individuals) but if Alice just reasoned from the top down about how to optimize her networking really hard for her career, in a non-socially-skilled way, a friend should pull her aside and say “hey, communities often have defense mechanisms against the thing you’re doing, watch out”.
Influence-seeking activates the same kind of feeling though it’s less strong than for “power-seeking.”
+1. I suspect we’d also likely agree that if Alice just stayed in her room all day and only talked to her friends about what ideal housing policy should look like, someone should pull her aside and say “hey, you might want to go to some networking events and see if you can get involved in housing policy, or at least see if there are other things you should be doing to become a better fit for housing policy roles in the future.”
In this case, it’s not the desire to have influence that is the core problem. The core problem is whether or not Alice is taking the right moves to have the kind of influence she wants.
Bringing it back to the post: I think I’d be excited to see you write something more along the lines of “What mistakes do many people in the AIS community make when it comes to influence-seeking?” I suspect this would be more specific and concrete. I think the two suggestions at the end (prioritize legitimacy & prioritize competence) start to get at this.
Otherwise, I feel like the discussion is going to go into less productive directions, where people who already agree with you react like “Yeah, Alice is such a status-seeker! Stick it to her!” and people who disagree with you are like “Wait what? Alice is just trying to network so she can get a job in housing policy– why are you trying to cast that as some shady plot? Should she just stay in her room and write blog posts?”
I think I actually disagree with this. It feels like your framing is something like: “if you pursue power in the wrong ways, you’ll have problems. If you pursue power in the right ways, you’ll be fine”.
And in fact the thing I’m trying to convey is more like “your default assumption should be that accumulating power triggers defense mechanisms, and you might think you can avoid this tradeoff by being cleverer, but that’s mostly an illusion”. (Or, in other words, it’s faulty CDT-style thinking.)
Based on this I actually think that “structurally power-seeking” is the right term after all, because it’s implicitly asserting that you can’t separate out these two things (“power-seeking” and “gaining power in ‘the right ways’”).
Note also that my solutions at the end are not in fact strategies for accumulating power in ‘the right ways.’ They’re strategies for achieving your goals while accumulating less power. E.g. prioritizing competence means that you’ll try less hard to get “your” person into power. Prioritizing legitimacy means you’re making it harder to get your own ideas implemented, when others disagree.
(FWIW I think that on the level of individuals the tradeoff between accumulating power and triggering defense mechanisms is often just a skill issue. But on the level of movements the tradeoff is much harder to avoid—e.g. you need to recruit politically-savvy people, but that undermines your truth-seeking altruistically-motivated culture.)
I’m not quite sure where we disagree, but if I had to put my finger on it, it’s something like “I don’t think that people would be offput by Alice going to networking events to try to get a job in housing policy, and I don’t think she would trigger any defense mechanisms.”
Specific question for you: Would you say that “Alice going to a networking event” (assume she’s doing it socially conventional/appropriate ways) would count as structural power-seeking? And would you discourage her from going?
More generally, there are a lot of things you’re labeling as “power-seeking” which feel inaccurate or at least quite unnatural to label as “power-seeking”, and I suspect that this will lead to confusion (or at worst, lead to some of the people you want to engage dismissing your valid points).
I think in your frame, Alice going to networking events would be seen as “there are some socially-accepted ways of seeking power” and in my frame this would be seen as “it doesn’t really make sense to call this power-seeking, as most people would find it ridiculous/weird to apply the label ‘power-seeking’ to an action as simple as going to a networking event.”
I’m also a bit worried about a motte-and-bailey here. The bold statement is “power-seeking (which I’m kind of defining as anything that increases your influence, regardless of how innocuous or socially accepted it seems) is bad because it triggers defense mechanisms” and the more moderated statement is “there are some specific ways of seeking power that have important social costs, and I think that some/many actors in the community underestimate those costs. Also, there are many strategies for achieving your goals that don’t involve seeking power, and I think some/many people in the community are underestimating those.”
I agree with the more moderated claims.
I think you’re doing a paradox of the heap here. One grain of sand is obviously not a heap, but a million obviously is. Similarly, Alice going to one networking event is obviously not power-seeking, but Alice taking every opportunity she can to pitch herself to the most powerful people she can find obviously is. I’m identifying a pattern of behavior that AI safety exhibits significantly more than other communities, and the fair analogy is to a pattern of behavior that Alice exhibits significantly more than other people around her.
I flagged several times in the post that I was not claiming that power-seeking is bad overall, just that it typically has this one bad effect.
I repudiated this position in my previous comment, where I flagged that I’m trying to make a claim not about specific ways of seeking power, but rather about the outcome of gaining power in general.
That’s clarifying. In particular, I hadn’t realized you meant to imply [legitimacy of the ‘community’ as a whole] in your post.
I think both are good examples in principle, given the point you’re making. I expect neither to work in practice, since I don’t think that either [broad competence of decision-makers] or [increased legitimacy of broad (and broadening!) AIS community] help us much at all in achieving our goals.
To achieve our goals, I expect we’ll need something much closer to ‘our’ people in power (where ‘our’ means [people with a pretty rare combination of properties, conducive to furthering our goals]), and increased legitimacy for [narrow part of the community I think is correct].
I think we’d need to go with [aim for a relatively narrow form of power], since I don’t think accumulating less power will work. (though it’s a good plan, to the extent that it’s possible)
While this seems like a reasonable opinion in isolation, I also read the thread where you were debating Rohin and holding the position that most technical AI safety work was net-negative.
And so basically I think that you, like Eliezer, have been forced by (according to me, incorrect) analyses of the likelihood of doom to the conclusion that only power-seeking strategies will work.
From the inside, for you, it feels like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
From the outside, for me, it feels like “The doomers have a cognitive bias that ends up resulting in them overrating power-seeking strategies, and this is not a coincidence but instead driven by the fact that it’s disproportionately easy for cognitive biases to have this effect (given how the human mind works)”.
Fortunately I think most rationalists have fairly good defense mechanisms against naive power-seeking strategies, and this is to their credit. So the main thing I’m worried about here is concentrating less force behind non-power-seeking strategies.
On your bottom line, I entirely agree—to the extent that there are non-power-seeking strategies that’d be effective, I’m all for them. To the extent that we disagree, I think it’s about [what seems likely to be effective] rather than [whether non-power-seeking is a desirable property].
Constrained-power-seeking still seems necessary to me. (unfortunately)
A few clarifications:
I guess most technical AIS work is net negative in expectation. My ask there is that people work on clearer cases for their work being positive.
I don’t think my (or Eliezer’s) conclusions on strategy are downstream of [likelihood of doom]. I’ve formed some model of the situation. One output of the model is [likelihood of doom]. Another is [seemingly least bad strategies]. The strategies are based around why doom seems likely, not (primarily) that doom seems likely.
It doesn’t feel like “I am responding to the situation with the appropriate level of power-seeking given how extreme the circumstances are”.
It feels like the level of power-seeking I’m suggesting seems necessary is appropriate.
My cognitive biases push me away from enacting power-seeking strategies.
Biases aside, confidence in [power seems necessary] doesn’t imply confidence that I know what constraints I’d want applied to the application of that power.
In strategies I’d like, [constraints on future use of power] would go hand in hand with any [accrual of power].
It’s non-obvious that there are good strategies with this property, but the unconstrained version feels both suspicious and icky to me.
Suspicious, since [I don’t have a clue how this power will need to be directed now, but trust me—it’ll be clear later (and the right people will remain in control until then)] does not justify confidence.
To me, you seem to be over-rating the applicability of various reference classes in assessing [(inputs to) likelihood of doom]. As I think I’ve said before, it seems absolutely the correct strategy to look for evidence based on all the relevant reference classes we can find.
However, all else equal, I’d expect:
Spending a long time looking for x, makes x feel more important.
[Wanting to find useful x] tends to shade into [expecting to find useful x] and [perceiving xs as more useful than they are].
Particularly so when [absent x, we’ll have no clear path to resolving hugely important uncertainties].
The world doesn’t owe us convenient reference classes. I don’t think there’s any way around inside-view analysis here—in particular, [how relevant/significant is this reference class to this situation?] is an inside-view question.
That doesn’t make my (or Eliezer’s, or …’s) analysis correct, but there’s no escaping that you’re relying on inside-view too. Our disagreement only escapes [inside-view dependence on your side] once we broadly agree on [the influence of inside-view properties on the relevance/significance of your reference classes]. I assume that we’d have significant disagreements there.
Though it still seems useful to figure out where. I expect that there are reference classes that we’d agree could clarify various sub-questions.
In many non-AI-x-risk situations, we would agree—some modest level of inside-view agreement would be sufficient to broadly agree about the relevance/significance of various reference classes.
I can imagine plausible mechanisms for how the first four backlash examples were a consequence of perceived power-seeking from AI safetyists, but I don’t see one for e/acc. Does someone have one?
Alternatively, what reason do I have to expect that there is a causal relationship between safetyist power-seeking and e/acc even if I can’t see one?
e/acc has coalesced in defense of open-source, partly in response to AI safety attacks on open-source. This may well lead directly to a strongly anti-AI-regulation Trump White House, since there are significant links between e/acc and MAGA.
I think of this as a massive own goal for AI safety, caused by focusing too much on trying to get short-term “wins” (e.g. dunking on open-source people) that don’t actually matter in the long term.
IMO this overstates the influence of OS stuff on the broader e/acc movement.
My understanding is that the central e/acc philosophy is around tech progress. Something along the lines of “we want to accelerate technological progress and AGI progress as quickly as possible, because we think technology is extremely awesome and will lead to a bunch of awesome+cool outcomes.” The support for OS is toward the ultimate goal of accelerating technological progress.
In a world where AI safety folks didn’t say/do anything about OS, I would still suspect clashes between e/accs and AI safety folks. AI safety folks generally do not believe that maximally fast/rapid technological progress is good for the world. This would inevitably cause tension between the e/acc worldview and the worldview of many AI safety folks, unless AI safety folks decided never to propose any regulations that could cause us to deviate from the maximally-fast pathways to AGI. This seems quite costly.
(Separately, I agree that “dunking on open-source people” is bad and that people should do less “dunking on X” in general. I don’t really see this as an issue with prioritizing short-term wins so much as getting sucked into ingroup vs. outgroup culture wars and losing sight of one’s actual goals.)
Similar point here– I think it’s extremely likely this would’ve happened anyways. A community that believes passionately in rapid or maximally-fast AGI progress already has strong motivation to fight AI regulations.
There’s a big difference between e/acc as a group of random twitter anons, and e/acc as an organized political force. I claim that anti-open-source sentiment from the AI safety community played a significant role (and was perhaps the single biggest driver) in the former turning into the latter. It’s much easier to form a movement when you have an enemy. As one illustrative example, I’ve seen e/acc flags that are a version of the libertarian flag saying “come and take it [our GPUs]”. These are a central example of an e/acc rallying cry that was directly triggered by AI governance proposals. And I’ve talked to several principled libertarians who are too mature to get sucked into a movement by online meme culture, but who have been swung in that direction due to shared opposition to SB-1047.
Consider, analogously: Silicon Valley has had many political disagreements with the Democrats over the last decade—e.g. left-leaning media has continuously been very hostile to Silicon Valley. But while the incentives to push back were there for a long time, the organized political will to push back has only arisen pretty recently. This shows that there’s a big difference between “in principle people disagree” and “actual political fights”.
This reasoning seems far too weak to support such a confident conclusion. There was a lot of latent pro-innovation energy in Silicon Valley, true, but the ideology it gets channeled towards is highly contingent. For instance, Vivek Ramaswamy is a very pro-innovation, anti-regulation candidate who has no strong views on AI. If AI safety hadn’t been such a convenient enemy then plausibly people with pro-innovation views would have channeled them towards something closer to his worldview.
Separately, do you think “organized opposition” could have ever been avoided? It sounds like you’re making two claims:
When AI safety folks advocate for specific policies, this gives opponents something to rally around and makes them more likely to organize.
There are some examples of specific policies (e.g., restrictions on OS, SB1047) that have contributed to this.
Suppose no one said anything about OS, and also (separately) SB1047 never happened. Presumably, at some point, some groups start advocating for specific policies that go against the e/acc worldview. At that point, it seems like you get the organized resistance.
So I’m curious: What does the Ideal Richard World look like? Does it mean people are just much more selective about which policies to advocate for? Under what circumstances is it OK to advocate for something that will increase the political organization of opposing groups? Are there examples of policies that you think are so important that they’re worth the cost (of giving your opposition something to rally around)? To what extent is the deeper crux the fact that you’re less optimistic about the policy proposals actually helping?
My two suggestions:
People stop aiming to produce proposals that hit almost all the possible worlds. By default you should design your proposal to be useless in, say, 20% of the worlds you’re worried about (because trying to get that last 20% will create really disproportionate pushback); or design your proposal so that it leaves 20% of the work undone (because trusting that other people will do that work ends up being less power-seeking, and more robust, than trying to centralize everything under your plan). I often hear people saying stuff like “we need to ensure that things go well” or “this plan needs to be sufficient to prevent risk”, and I think that mindset is basically guaranteed to push you too far towards the power-seeking end of the spectrum. (I’ve added an edit to the end of the post explaining this.)
As a specific example of this, if your median doom scenario goes through AGI developed/deployed by centralized powers (e.g. big labs, govts) I claim you should basically ignore open-source. Sure, there are some tail worlds where a random hacker collective beats the big players to build AGI; or where the big players stop in a responsible way, but the open-source community doesn’t; etc. But designing proposals around those is like trying to put out candles when your house is on fire. And I expect there to be widespread appetite for regulating AI labs from govts, wider society, and even labs themselves, within a few years’ time, unless those proposals become toxic in the meantime—and making those proposals a referendum on open-source is one of the best ways I can imagine to make them toxic.
(I’ve talked to some people whose median doom scenario looks more like Hendrycks’ “natural selection” paper. I think it makes sense by those people’s lights to continue strongly opposing open-source, but I also think those people are wrong.)
I think that the “we must ensure” stuff is mostly driven by a kind of internal alarm bell rather than careful cost-benefit reasoning; and in general I often expect this type of motivation to backfire in all sorts of ways.
Why do you assume that open source equates to small hacker groups? The largest supplier of open weights is Meta AI, and their recent Llama-405B rivals SOTA models.
I think your concrete suggestions such as these are very good. I still don’t think you have illustrated the power-seeking aspect you are claiming very well (it seems to be there for EA, but less so for AI safety in general).
In short, I think you are conveying certain important, substantive points, but are choosing a poor framing.
Thanks for this clarification– I understand your claim better now.
Do you have any more examples of evidence that suggests that AI safety caused (or contributed meaningfully to) this shift from “online meme culture” to “organized political force?” This seems like the biggest crux imo.
No legible evidence jumps to mind, but I’ll keep an eye out. Inherently this sort of thing is pretty hard to pin down, but I do think I’m one of the handful of people that most strongly bridges the AI safety and accelerationist communities on a social level, and so I get a lot of illegible impressions.
Do you see this as likely to have been avoidable? How?
I agree that it’s undesirable. Less clear to me that it’s an “own goal”.
Do you see other specific things we’re doing now (or that we may soon do) that seem likely to be future-own-goals?
[all of the below is “this is how it appears to my non-expert eyes”; I’ve never studied such dynamics, so perhaps I’m missing important factors]
I expect that, even early on, e/acc actively looked for sources of long-term disagreement with AI safety advocates, so it doesn’t seem likely to me that [AI safety people don’t emphasize this so much] would have much of an impact.
I expect that anything less than a position of [open-source will be fine forever] would have had much the same impact—though perhaps a little slower. (granted, there’s potential for hindsight bias here, so I shouldn’t say “I’m confident that this was inevitable”, but it’s not at all clear to me that it wasn’t highly likely)
It’s also not clear to me that any narrow definition of [AI safety community] was in a position to prevent some claims that open-source will be unacceptably dangerous at some point. E.g. IIRC Geoffrey Hinton rhetorically compared it to giving everyone nukes quite a while ago.
Reducing focus on [desirable, but controversial, short-term wins] seems important to consider where non-adversarial groups are concerned. It’s less clear that it helps against (proto-)adversarial groups—unless you’re proposing some kind of widespread, strict message discipline (I assume that you’re not).
[EDIT: for useful replies to this, see Richard’s replies to Akash above]
I agree that dunking on OS communities has apparently not been helpful in these regards. It seems kind of orthogonal to being power-seeking though. Overall, I think part of the issue with AI safety is that the established actors (e.g. wide parts of CS academia) have opted out of taking a responsible stance, e.g. compared to recent developments in biosciences and RNA editing. Partially, one could blame this on them not wanting to identify too closely with, or grant legitimacy to, the existing AI safety community at the time. However, a priori, it seems more likely that it is simply due to the different culture in CS vs life sciences, with the former lacking the deep culture of responsibility for their research (in particular as far as they’re connected to e.g. Silicon Valley startup culture).