I think that the vast majority of comparative claims (like “AI Safety community is more X than any other advocacy group”) are more based on vibes than facts. Are you sure that the Sierra Club, the Club of Rome, the Mont Pelerin Society, the Fabian Society, the Anti-Defamation League, et cetera are less power-seeking than the AI Safety community? Are you sure that “let’s totally reorganize society around ecological sustainability” is less power-seeking than “let’s ensure that AGI corporations have management that is not completely blind to the alignment problem”?
I never claimed that AI safety is more X than “any other advocacy group”; I specifically said “most other advocacy groups”. And of course I’m not sure about this; asking for that is an isolated demand for rigor. It feels like your objection is the thing that’s vibe-based here.
On the object level: these are good examples, but because movements vary on so many axes, it’s hard to weigh up two of them against each other. That’s why I identified the three features of AI safety which seem to set it apart from most other movements. (Upon reflection I’d also add a fourth: the rapid growth.)
I’m curious if there are specific features which some of the movements you named have that you think contribute significantly to their power-seeking-ness, which AI safety doesn’t have.
I agree that the word “any” is wrong here. I used “sure” in the sense of “reasonably everyday I-won’t-get-hit-by-a-car-if-I-cross-the-road-now sure,” not in the “100% math-proof sure” sense.
By “vibe-based,” I refer to the features that you mention. Yes, there is a lot of talk about consequentialism and efficiency in rationalist/EA-adjacent AIS circles, and consequentialism provides a basis for power-seeking actions, but it’s unclear how much this talk leads to actual power-seeking actions and how much it’s just local discourse framing.
The same applies to the sense of urgency.
Funnily, consequentialism can lead to less power-seeking if we define the problem in a less open-ended manner. If your task is to “maximize the number of good things in the world,” you benefit from power-seeking. If your task is to “design a safe system capable of producing superintelligent work,” you are, in fact, interested in completing this task with minimal effort and, therefore, minimal resources.
I think that the broad environmentalist movement is at least in the same tier as the AIS movement for all of the mentioned features.
Power-seeking inferred from consequentialism? Environmentalism has been a movement about political control from the very start. The Club of Rome was initially founded inside the OECD and tried to influence its policy towards degrowth.
Consequentialism leading to questionable practices? Mass sterilizations in the 1980s are far beyond whatever AI Safety has done to date.
Urgency? You can look at all the people claiming that 2030 is a point of no return for climate change.
Focus on elites? The Club of Rome and the Sierra Club are literally what is written in the name: they are elite clubs.
I think I would agree if you said that there are a lot of nuances and that the chance of AIS being more power-seeking than the environmentalist movement is non-negligible, but to measure power-seeking with the necessary accuracy, we would need not a blog post written by one person, but the work of an army of sociologists.
Regardless of who is more power-seeking, it would probably be a good idea to look at how being power-seeking has been a disadvantage to other movements. It looks to me like the insistence/power-seeking of the environmental movement may well have been an immense disadvantage; it may have created a backlash that’s almost as strong as the entire movement.
Backlash for environmentalism was largely inevitable. The whole point of environmentalism is to internalize externalities in some way, i.e., impose costs of pollution/ecological damage on polluters. Nobody likes to get new costs, so backlash ensues.
That’s a good point. But not all of the imposed costs were strategically wise, so the backlash didn’t need to be that large to get the important things done. It could be argued that the most hardline, strident environmentalists might’ve cost the overall movement immensely by pushing for minor environmental gains that come at large perceived costs.
I think that did happen, and that, similarly, pushing for AI safety measures should be carefully weighed in terms of cost vs. benefit. The opposite argument is that we should just get everyone used to paying costs for AI safety (in terms of limiting AI progress that would probably not be highly dangerous). I think that strategy backfired badly for environmentalism and would backfire for us.
Maybe. Again, I’m not an expert in PR, and I’d really like to have people who are experts involved in coming up with strategies.
I think at least some of this backlash comes from Earth being very bad at coordination. “To get a moderate result you should scare opponents with radicals” and other negotiation frictions.
Sure. But scaring opponents with inflated arguments and demands by radicals didn’t seem to work well for the environmental movement, so the AI safety movement probably shouldn’t employ those tactics.
To clarify in a more general way: Earth is bad at coordination in the sense that you can’t expect that, if an industrial producer dumps toxic waste into the environment, a special government agency will walk in with the premise: “hey, you seem to be destroying environmental commons to get profits; let’s negotiate a point on the Pareto frontier in the space of environmental commons vs. profits which leaves both of us not very upset”.
Are you sure [...] et cetera are less power-seeking than the AI Safety community?
Until recently the MIRI default plan was basically “obtain god-like AI and use it to take over the world”(“pivotal act”), it’s hard to get more power-seeking than that. Other wings of the community have been more circumspect but also more active in things like founding AI labs, influencing government policy, etc., to the tune of many billions of dollars worth of total influence. Not saying this is necessarily wrong but it does seem empirically clear that AI-risk-avoiders are more power-seeking than most movements.
let’s ensure that AGI corporations have management that is not completely blind to the alignment problem
Seems like this is already the case.
My understanding of the MIRI plan was “have a controllable, safe AI that’s just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI”. I wouldn’t call that God-like or an intention to take over the world. (The go-to [acknowledged as not that plausible] example is “melt all the GPUs”.) Your description feels grossly inaccurate.
Basically any plan of the form “use AI to prevent anyone from building more powerful and more dangerous AI” is incredibly power-grabbing by normal standards: in order to do this, you’ll have to take actions that start out as terrorism and then might quickly need to evolve into insurrection (given that the government will surely try to coerce you into handing over control over the AI-destroying systems); this goes against normal standards for what types of actions private citizens are allowed to take.
I agree that “obtain enough hard power that you can enforce your will against all governments in the world including your own” is a bit short of “try to take over the world”, but I think that it’s pretty world-takeover-adjacent.
I mean, it really matters whether you are suggesting that someone else take that action or whether you are planning to take that action yourself. Asking the U.S. government to use AI to prevent anyone from building more powerful and more dangerous AI is not in any way a power-grabbing action, because it does not in any meaningful way make you more powerful (like, yes, you are part of the U.S., so I guess you end up with a bit more power as the U.S. ends up with more power, but that effect is pretty negligible). Even asking random AI capability companies to do that is also not a power-grabbing action, because you yourself do not end up in charge of those companies as part of that.
Yes, unilaterally deploying such a system yourself would be, but I have no idea what people are referring to when they say that MIRI was planning on doing that (maybe they were, but all I’ve seen them do is to openly discuss plans about what ideally someone with access to a frontier model should do in a way that really did not sound like it would end up with MIRI meaningfully in charge).
I think they talked explicitly about planning to deploy the AI themselves back in the early days(2004-ish) then gradually transitioned to talking generally about what someone with a powerful AI could do.
But I strongly suspect that in the event that they were the first to obtain powerful AI, they would deploy it themselves or perhaps give it to handpicked successors. Given Eliezer’s worldview, I don’t think it would make much sense for them to give the AI to the US government (considered incompetent) or AI labs (negligently reckless).
I think they talked explicitly about planning to deploy the AI themselves back in the early days(2004-ish) then gradually transitioned to talking generally about what someone with a powerful AI could do.
I agree that very old MIRI (explicitly disavowed by present MIRI and mostly modeled as “one guy in a basement somewhere”) looked a bit more like this, but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president. I don’t think it has zero value in forecasting the future, but going and reading someone’s high-school political science essay, and inferring they would endorse that position in the modern day, is extremely dubious.
My model of them would definitely think very hard about the signaling and coordination problems that come with people trying to build an AGI themselves, and then act on those. I think Eliezer’s worldview here would totally output actions that include very legible precommitments about what the AI system would be used for, and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it. Eliezer has written a lot about this stuff and clearly takes considerations like that extremely seriously.
I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president
Yeah, but it’s not just the old MIRI views, but those in combination with their statements about what one might do with powerful AI, the telegraphed omissions in those statements, and other public parts of their worldview e.g. regarding the competence of the rest of the world. I get the pretty strong impression that “a small group of people with overwhelming hard power” was the ideal goal, and that this would ideally be controlled by MIRI or by a small group of people handpicked by them.
Some things that feel incongruent with this:
Eliezer talks a lot in the Arbital article on CEV about how useful it is to have a visibly neutral alignment target
Right now Eliezer is pursuing a strategy which does not meaningfully empower him at all (just halting AGI progress)
Eliezer complains a lot about various people pursuing what are mostly just their personal objectives under the guise of AI alignment (in particular, the standard AI censorship stuff being thrown into the same bucket)
Lots of conversations I’ve had with MIRI employees
I would be happy to take bets here about what people would say.
Sure, I DM’d you.
but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president.
This seems too strong to me. There looks to me like a clear continuity of MIRI’s strategic outlook from the days when their explicit plan was to build a singleton and “optimize” the universe, through to today. In between there was a series of updates regarding how difficult various intermediate targets would be. But the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function.
If I remember correctly, back in the AI foom debate, Robin Hanson characterized the Singularity Institute’s plan (to be the first to a winner-take-all technology, and then use that advantage to optimize the cosmos) as declaring total war on the world. Eliezer disputed that characterization.
(Note that I spent 10 minutes trying to find the relevant comments, and didn’t find anything quite like what I was remembering which does decrease my credence that I’m remembering correctly.)
the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function
I think you mean “the goal remains to ensure that CEV or something like it is eventually implemented, and the universe is thus optimized according to the resulting utility function”, right? I think Eliezer’s view has always been that we want a CEV-maximizing ASI to be eventually turned on, but if that happens, it wouldn’t matter which human turns it on. And then evidently Eliezer has pivoted over the decades from thinking that this is likeliest to happen if he tries to build such an ASI with his own hands, to no longer thinking that.
All of that sounds right to me. But this pivot with regards to means isn’t much evidence about what Eliezer/MIRI would do if they (as a magical hypothetical) suddenly found themselves with a verifiably-aligned CEV AGI.
I expect that they would turn it on, with the expectation that it would develop a hard power decisive strategic advantage, use that to end the acute risk period, and then proceed to optimize the universe.
Insofar as that’s true, I think Oliver’s statement above...
and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it.
...is inaccurate.
MIRI has never said, to my knowledge,
We used to think that if a small team could build a verifiably-aligned CEV AI, that they should unilaterally turn it on, knowing that that will likely result in the relative disempowerment of many human institutions and existing human leaders. We once planned to do that ourselves.
We now think that was a mistake, not just because building a verifiably-aligned CEV AI is unworkably hard, but because unilaterally seizing a hard power advantage, even in the service of CEV, is an act of war (or something).
The Singularity Institute used to have the plan of building and deploying a friendly AI, which they expected to “optimize” the whole world.
Eliezer’s writing includes lots of points in which he at least hints (some would say more than hints) that he thinks it is morally obligatory, or at least virtuous, to take over the world for the side of Good.
Famously, Harry says “World Domination is such an ugly phrase. I prefer world optimization.” (We made t-shirts of this phrase!)
The Sword of Good ends with the line
“‘I don’t trust you either,’ Hirou whispered, ‘but I don’t expect there’s anyone better,’ and he closed his eyes until the end of the world.” He’s concluded that all the evil in the world must be opposed, that it’s right for someone to cast the “spell of ultimate power” to do that.
(This is made a bit murky, because Eliezer’s writings usually focus on the transhumanist conquest of the evils of nature, rather than political triumph over human-evil. But triumph over the human-evils is definitely included, e.g. the moral importance and urgency of destroying Azkaban in HP:MoR.)
From all that, I think it is reasonable to think that MIRI is in favor of taking over the world, if they could get the power to do it!
So it seems disingenuous, to me, to say,
I think Eliezer’s worldview here...would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it.
I agree that
MIRI’s leadership doesn’t care who implements a CEV AI, as long as they do it correctly.
(Though this is not as clearly non-power-seeking, if you rephrase it as “MIRI leadership doesn’t care who implements the massively powerful AI, as long as they correctly align it to the values that MIRI leadership endorses.”
For an outsider who doesn’t already trust the CEV process, this is about as reassuring as a communist group saying “we don’t care who implements the AI, as long as they properly align it to Marxist doctrine.” I understand how CEV is more meta than that, how it is explicitly avoiding coding object-level values into the AI. But not everyone will see it that way, especially if they think the output of CEV is contrary to their values (as indeed, virtually every existing group should).)
CEV as an optimization target is itself selected to be cosmopolitan and egalitarian. It’s a good-faith attempt to optimize for the good of all. It does seem to me that the plan of “give a hard power advantage to this process, which we expect to implement the Good, itself”, is a step down in power-seeking from “give a hard power advantage to me, and I’ll do Good stuff.”
But it still seems to me that MIRI’s culture endorses sufficiently trustworthy people taking unilateral action, both to do a pivotal act and end the acute risk period, and more generally, to unleash a process that will inexorably optimize the world for Good.
I mean, I also think there is continuity from the beliefs I held in my high-school essays to my present beliefs, but it’s also enough time and distance that if you straightforwardly attribute to me claims that I made in my high-school essays, which I have explicitly disavowed and told you I do not believe, I will be very annoyed with you and will model you as not actually trying to understand what I believe.
Absolutely, if you have specifically disavowed any claims, that takes precedence over anything else. And if I insist you still think x, because you said x ten years ago, but you say you now think something else, I’m just being obstinate.
In contrast, if you said x ten years ago, and in the intervening time you’ve shared a bunch of highly detailed models that are consistent with x, I think I should think you still think x.
I’m not aware of any specific disavowals of anything after 2004? What are you thinking of here?
Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
[Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
He states explicitly:
As a research fellow of the Singularity institute, I’m supposed to first figure out how to build a friendly AI, and then once I’ve done that go and actually build one.
And later in the video he says:
The Singularity Institute was founded on the theory that in order to get a friendly artificial intelligence someone’s got to build one. So there. We’re just going to have an organization whose mission is ‘build a friendly AI’. That’s us. There’s like various other things that we’re also concerned with, like trying to get more eyes and more attention focused on the problem, trying to encourage people to do work in this area. But at the core, the reasoning is: “Someone has to do it. ‘Someone’ is us.”
Basically any plan of the form “use AI to prevent anyone from building more powerful and more dangerous AI” is incredibly power-grabbing by normal standards
I don’t think this is the important point of disagreement. Habryka’s point throughout this thread seems to be that, yes, doing that is power-grabbing, but it is not what MIRI planned to do. So MIRI planned to (intellectually) empower anyone else willing to do (and capable of doing) a pivotal act with a blueprint for how to do so.
So MIRI wasn’t seeking to take power, but rather to allow someone else[1] to do so. It’s the difference between using a weapon and designing a weapon for someone else’s use. An important part is that this “someone else” could very well disagree with MIRI about a large number of things, so there need not be any natural allyship or community or agreement between them.
If you are a blacksmith working in a forge and someone comes into your shop and says “build me a sword so I can use it to kill the king and take control of the realm,” and you agree to do so but do not expect to get anything out-of-the-ordinary in return (in terms of increased power, status, etc), it seems weird and non-central to call your actions power-seeking. You are simply empowering another, different power-seeker. You are not seeking any power of your own.
Who was both in a position of power at an AI lab capable of designing a general intelligence and sufficiently clear-headed about the dangers of powerful AI to understand the need for such a strategy.
My understanding was there were 4 phases in which the Singularity Institute / MIRI had 4 different plans.
~2000 to ~2004:
Plan:
Build a recursively self-improving seed AI as quickly as possible →
That seed AI Fooms →
It figures out the Good and does it.
[Note: Eliezer has explicitly disendorsed everything that he believed in this period, unless otherwise noted.]
~2004 to ~2016:
Update: “Wait. Because of the Orthogonality thesis, not all seed AIs will converge to values that we consider good, even though they’re much smarter than us. The supermajority of seed AIs don’t. We have to build in humane values directly, or building a recursively self-improving AGI will destroy both the world and everything of value in the world.”
New plan:
Figure out the math of motivationally-stable self-improvement, figure out the deep math of cognition →
use both to build a seed AI, initialized to implement Coherent Extrapolated Volition →
let that seed AI recursively self improve into a singleton / sovereign with a decisive strategic advantage →
that singleton now uses its decisive strategic advantage to optimize the universe.
(“World domination is such an ugly phrase. I prefer world optimization”).
~2016 to ~2021:
Update: “it turns out Deep Learning is general enough that it is possible to build AGI with relatively brute force methods, without having much deep insight into the nature of cognition. AI timelines are shorter than we thought. Fuck. There isn’t time to figure out how to do alignment deeply and completely, at the level that would be required to trust an AI to be a sovereign, and optimize the whole universe.”
New plan:
Figure out enough of alignment to build the minimal AGI system that can perform a pivotal act, in tightly controlled circumstances, with lots of hacky guardrails and speed bumps →
Build such a limited AGI →
Deploy that AGI to do a pivotal act to prevent any competitor projects from building a more dangerous unbounded AGI.
~2021 to present:
Update: “We can’t figure out even that much of the science of alignment in time. The above plan is basically doomed. We think the world is doomed. Given that, we might as well try outreach:”
New plan:
Do outreach →
Get all the Great Powers in the world to join and enforce a treaty that maintains a world-wide ban on large training runs →
Do biotech projects that can produce humans that are smart enough that they have security mindset not out of a special personal disposition, but just because they’re smart enough to see the obviousness of it by default →
Those superhumans solve alignment and (presumably?) implement more or less the pre-2016 MIRI plan.
I think interstice’s summary is basically an accurate representation of the ~2001 to ~2016 plan. Their only mistake is the claim that MIRI didn’t switch away from that plan until recently.
Nice overview, I agree, but I think the 2016-2021 plan could still arguably be described as “obtain god-like AI and use it to take over the world” (admittedly with some rhetorical exaggeration, but, like, not that much).
I think it’s pretty important that the 2016 to 2021 plan was explicitly aiming to avoid unleashing godlike power. “The minimal amount of power to do a thing which is otherwise impossible”, not “as much omnipotence as is allowed by physics”.
And similarly, the 2016 to 2021 plan did not entail optimizing the world except with regard to what is necessary to prevent dangerous AGIs.
These are both in contrast to the earlier 2004 to 2016 plan. So the rhetorical exaggeration confuses things.
MIRI actually did have a plan that, in my view, is well characterized as (eventually) taking over the world, without exaggeration. That point is apt to get lost if we also describe a “toned down” plan as “taking over the world” just because it involves taking powerful, potentially aggressive, action.
This discussion is a nice illustration of why x-riskers are definitely more power-seeking than the average activist group. Just like Eskimos proverbially have 50 words for snow, AI-risk-reducers need at least 50 terms for “taking over the world” to demarcate the range of possible scenarios. ;)
I think MIRI did put a lot of effort into being cooperative about the situation (i.e. Don’t leave your fingerprints on the future, doing the ‘minimal’ pivotal act that would end the acute risk period, and when thinking about longterm godlike AI, trying to figure out fair CEV sorts of things).
But, I think it was also pretty clear that “have a controllable, safe AI that’s just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI” was not in the Overton window. I don’t know what Eliezer’s actual plan was, since he disclaimed “yes I know melt all the GPUs won’t work”, but, like, “melt all the GPUs” implies a level of power over the world that is really extreme by historical standards, even if you’re trying to do the minimal thing with that power.
I think the plan implies having the capability that if you wanted to, you could take over the world, but having the power to do something and actually doing it are quite different. When you say “MIRI wanted to take over the world”, the central meaning that comes to mind for me is “take over all the governments, be in charge of all the laws and decision-making, be world dictator, take possession of all the resources” and probably also “steer humanity’s future in a very active way”. Which is very very not their intention, and if someone goes around saying MIRI’s plan was to take over the world without any clarification, leaving the reader to think the above, then I think they’re being very darn misleading.
When you read the Sequences, was your visualization of a Friendly AI going to let the governments of North Korea or Saudi Arabia persist? Would it allow parents to abuse their children in ways that are currently allowed by the law (and indeed enshrined by the law, in that the law gives parents authority over their children)? Does it allow the factory farms to continue to run? How about the (then contemporaneous) US occupations of Iraq and Afghanistan?
(This is a non-rhetorical question. I wonder if we were visualizing different things.)
It’s a superintelligence, and so it can probably figure out effective peaceful ways to accomplish its goals. But among its goals will be the dismantling of many and likely all of the world’s major governments, not to mention a bunch of other existing power structures. A government being dismantled by a superhuman persuader is, in many but not all ways, as unsettling as it being destroyed by military force.
Perhaps humanity as a whole, and every individual human, would be made better off by a CEV-aligned friendly singleton, but I think the US government, as an entity, would be rightly threatened.
Doesn’t this very answer show that an AI such as you describe would not be reasonably describable as “Friendly”, and that consequently any AI worthy of the term “Friendly” would not do any of the things you describe? (This is certainly my answer to your question!)
No. “Friendly” was a semi-technical term of art, at the time. It may turn out that a Friendly AI (in the technical sense) is not, or even can’t be, “friendly” in a more conventional sense.
Er… yes, I am indeed familiar with that usage of the term “Friendly”. (I’ve been reading Less Wrong since before it was Less Wrong, you know; I read the Sequences as they were being posted.) My comment was intended precisely to invoke that “semi-technical term of art”; I was not referring to “friendliness” in the colloquial sense. (That is, in fact, why I used the capitalized term.)
Please consider the grandparent comment in light of the above.
In that case, I answer flatly “no”. I don’t expect many existing governmental institutions to be ethical or legitimate in the eyes of CEV, if CEV converges at all. Factory Farming is right out.
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
Whether most existing humans would be opposed is not a criterion of Friendliness.
I think if you described what was going to happen, many and maybe most humans would say they prefer the status quo to a positive CEV-directed singularity. Perhaps it depends on which parts of “what’s going to happen” you focus on; some are more obviously good or exciting than others. Curing cancer is socially regarded as 👍 while curing death and dismantling governments are typically (though not universally) regarded as 👎.
I don’t think they will actually provide much opposition, because a superhuman persuader will be steering the trajectory of events. (Ostensibly, by using only truth-tracking arguments and inputs that allow us to converge on the states of belief that we would reflectively prefer, but we mere humans won’t be able to distinguish that from malicious superhuman manipulation.)
But again, how humans would react is neither here nor there for what a Friendly AI does. The AI does what the CEV of humans would want, not what the humans want.
Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It’d be surprising to me if CEV-existing-humanity didn’t turn out to want some things that many current humans are opposed to.
Sure. Now, as far as I understand it, whether the extrapolated volition of humanity will even cohere is an open question (on any given extrapolation method; we set aside the technical question of selecting or constructing such a method).
So Eli Tyre’s claim seems to be something like: on [ all relevant / the most likely / otherwise appropriately selected ] extrapolation methods, (a) humanity’s EV will cohere, (b) it will turn out to endorse the specific things described (dismantling of all governments, removing the supply of factory farmed meat, dictating how people should raise their children).
I’m much more doubtful than most people around here about whether CEV coheres: I guess that the CEV of some humans would have them wirehead and the CEV of other humans wouldn’t, for instance.
But I’m bracketing that concern for this discussion. Assuming CEV coheres, then yes, I predict that it will have radical (in the sense of a political radical whose beliefs are extremely outside of the Overton window, such that they are disturbing to the median voter) views about all of those things.
But more confidently, I predict that it will have radical views about a very long list of things that are commonplace in 2024, even if it turns out that I’m wrong about this specific set.
CEV asks what we would want if we knew everything the AI knows. There are dozens of things that I think I know that, if the average person knew them to be true, would invalidate a lot of their ideology. Basic
If the average person knew everything that an AGI knows (which includes potentially millions of subjective years of human science, whole new fields, as foundational to one’s worldview as economics and probability theory is to my current worldview), and they had hundreds of subjective years to internalize those facts and domains, in a social context that was conducive to that, with (potentially) large increases in their intelligence, I expect their views are basically unrecognizable after a process like that.
As a case in point, most people consider it catastrophically bad to have their body destroyed (duh). And if you asked them if they would prefer, given their body being destroyed, to have their brain-state recorded, uploaded, and run on a computer, many would say “no”, because it seems horrifying to them.
Most LessWrongers embrace computationalism: they think that living as an upload is about as good as living as a squishy biological robot (and indeed, better in many respects). They would of course choose to be uploaded if their body was being destroyed. Many would elect to have their body destroyed specifically because they would prefer to be uploaded!
That is, most LessWrongers think they know something which most people don’t know, but which, if they did know it, would radically alter their preferences and behavior.
I think a mature AGI knows at least thousands of things like that.
So among the things about CEV that I’m most confident about (again, granting that it coheres at all), is that CEV has extremely radical views, conclusions which are horrifying to most people, including probably myself.
If by ‘cohere’ you mean ‘the CEVs of all individual humans match’, then my belief (>99%) is that it is not the case that the CEVs of all individual humans will (precisely) match. I also believe there would be significant overlap between the CEVs of 90+% of humans[1], and that this overlap would include disvaluing two of the three[2] things you asked about (present factory farming and child abuse; more generally, animal and child suffering).
(This felt mostly obvious to me, but you did ask about it a few times, in a way that suggested you expect something different; if so, you’re welcome to pinpoint where you disagree.)
For instance, even if one human wants to create a lot of hedonium, and another human wants to create a lot of individuals living fun and interesting lives, it will remain the case that they both disvalue things like extreme suffering. Also, the former human will probably still find at least some value in what the latter human seeks.
For the part of your question about whether their CEVs would endorse dismantling governments: note that ‘governments’ is a relevantly broad category, when considering that most configurations which are infeasible now will be feasible in the (superintelligence-governed) future. I think these statements capture most of my belief about how most humans’ CEVs would regard things in this broad category.
Most human CEVs would be permissive of those who terminally-wish[3] to live in contexts that have some form of harmless government structure.
The category of ‘government’ also includes, e.g., dystopias that create suffering minds and don’t let them leave; most human CEVs would seek to prevent this kind of government from existing.
(None of that implies any government would be present everywhere, nor that anyone would be in such a context against their will; rather, I’m imagining that a great diversity of contexts and minds will exist. I less confidently predict that most will choose to live in contexts without a government structure, considering it unnecessary given the presence of a benevolent ASI.)
(wished for not because it is necessary, for it would not be under a benevolent ASI, but simply because it’s their vision for the context in which they want to live)
I think the majority of nations would support dismantling their governments in favor of a benevolent superintelligence, especially given the correct framework. And an ASI can simply solve the problem of meat by growing brainless bodies.
Edit: Whoever mega-downvoted this, I’m interested to see you explain why.
Meta: You may wish to know[1] that seeing these terms replaced with the ones you used can induce stress/dissociation in the relevant groups (people disturbed by factory farming and child abuse survivors). I am both and this was my experience. I don’t know how common it would be among LW readers of those demographics specifically, though.
The one you responded to:
Would it allow parents to abuse their children in ways that are currently allowed by the law [...]? Does it allow the factory farms to continue to run?
Your response:
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
I’m framing this as sharing info you (or a generalized altruistic person placed in your position) may care about rather than as arguing for a further conclusion.
I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
Right, but I’m asking about your visualization of a Friendly AI as described in the Sequences, not a limited AGI for a pivotal act.
I’m confused by your confusion! Are you saying that that’s a non-sequitur, because I’m asking about a CEV-sovereign instead of a corrigible, limited genie or oracle?
It seems relevant to me, because both of those were strategic goals for MIRI at various points in its history, and at least one of them seems well characterized as “taking over the world” (or at least something very nearby to that). Which seems germane to the discussion at hand to me.
I would be surprised if a Friendly AI resulted in those things being left untouched.
I think that is germane, but maybe it needed some bridging/connecting work, since this thread so far was about MIRI-as-having-pivotal-act-goal. Whereas I was less sure than Habryka about whether MIRI itself would enact a pivotal act if they could, my understanding was they had no plan to create a sovereign for most of their history (like after 2004), and so it doesn’t seem like that’s a candidate for them having a plan to take over the world.
my understanding was they had no plan to create a sovereign for most of their history (like after 2004)
Yeah, I think that’s false.
The plan was “Figure out how to build a friendly AI, and then build one”. (As Eliezer stated in the video that I linked somewhere else in this comment thread).
But also, I got that impression from the Sequences? Like Eliezer talks about actually building an AGI, not just figuring out the theory of how to build one. You didn’t get that impression?
I don’t remember what exactly I thought in 2012 when I was reading the Sequences. I do recall sometime later, after DL was in full swing, it seeming like MIRI wasn’t in any position to be building AGI before others (like no compute, nor the engineering prowess), and someone (not necessarily at MIRI) confirmed that wasn’t the plan. Now and at the time, I don’t know how much that was principle vs ability.
My feeling of the plan pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building it to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that due to the psychological unity of mankind, anyone building an aligned [with them] AGI was a good outcome compared to someone building unaligned. Like even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much this was MIRI positions vs fragments that I combined in my own head that came from assorted places and were never policy.
“Taking over” something does not imply that you are going to use your authority in a tyrannical fashion. People can obtain control over organizations and places and govern with a light or even barely-existent touch, it happens all the time.
Would you accept “they plan to use extremely powerful AI to institute a minimalist, AI-enabled world government focused on preventing the development of other AI systems” as a summary? Like sure, “they want to take over the world” as a gist of that does have a bit of an editorial slant, but not that much of one. I think that my original comment would be perceived as much less misleading by the majority of the world’s population than “they just want to do some helpful math uwu” in the event that these plans actually succeeded. I also think it’s obvious that these plans indicate a far higher degree of power-seeking (in aim at least) than virtually all other charitable organizations.
(...and to reiterate, I’m not taking a strong stance on the advisability of these plans. In a way, had they succeeded, that would have provided a strong justification for their necessity. I just think it’s absurd to say that the organization making them is less power-seeking than the ADL or whatever)
Would you accept “they plan to use extremely powerful AI to institute a minimalist, AI-enabled world government focused on preventing the development of other AI systems” as a summary?
No. Because I don’t think that was specified or is necessary for a pivotal act. You could leave all existing government structures intact and simply create an invincible system that causes any GPU farm larger than a certain size to melt. Or something akin to that that doesn’t require replacing existing governments, but is a quite narrow intervention.
It wasn’t specified, but I think they strongly implied it would be that or something equivalently coercive. The “melting GPUs” plan was explicitly not a pivotal act but rather something with the required level of difficulty, and it was implied that the actual pivotal act would be something further outside the political Overton window. When you consider the ways “melting GPUs” would be insufficient, a plan like this is the natural conclusion.
doesn’t require replacing existing governments
I don’t think you would need to replace existing governments. Just block all AI projects and maintain your ability to continue doing so in the future via maintaining military supremacy. Get existing governments to help you, or at least not interfere, via some mix of coercion and trade. Sort of a feudal arrangement with a minimalist central power.
Just block all AI projects and maintain your ability to continue doing so in the future via maintaining military supremacy.
That to me is a very very non-central case of “take over the world”, if it is one at all.
This is about “what would people think when they hear that description” and I could be wrong, but I expect “the plan is to take over the world” summary would lead people to expect “replace governments” level of interference, not “coerce/trade to ensure this specific policy”—and there’s a really really big difference between the two.
I think this whole debate is missing the point I was trying to make. My claim was that it’s often useful to classify actions which tend to lead you to having a lot of power as “structural power-seeking” regardless of what your motivations for those actions are. Because it’s very hard to credibly signal that you’re accumulating power for the right reasons, and so the defense mechanisms will apply to you either way.
In this case MIRI was trying to accumulate a lot of power, and claiming that they were aiming to use it in the “right way” (do a pivotal act) rather than the “wrong way” (replacing governments). But my point above is that this sort of claim is largely irrelevant to defense mechanisms against power-seeking.
(Now, in this case, MIRI was pursuing a type of power that was too weird to trigger many defense mechanisms, though it did trigger some “this is a cult” defense mechanisms. But the point cross-applies to other types of power that they, and others in AI safety, are pursuing.)
I don’t super buy this. I don’t think MIRI was trying to accumulate a lot of power. In my model of the world they were trying to design a blueprint for some institution or project that would mostly have highly conditional power, that they would personally not wield.
In the metaphor of classical governance, I think what MIRI was doing was much more “design a blueprint for a governance agency” not “put themselves in charge of a governance agency”. Designing a blueprint is not a particularly power-seeking move, especially if you expect other people to implement it.
I got your point and think it’s valid and I don’t object to calling MIRI structurally power-seeking to the extent they wanted to execute a pivotal act themselves (Habryka claims they weren’t, I’m not knowledgeable on that front).
I still think it’s important to push back against a false claim that someone had the goal of taking over the world.
AI-risk-avoiders are more power-seeking than most movements.
Are you saying that the AIS movement is more power-seeking than the environmentalist movement, which spent $30M+ on lobbying in 2023 alone and has political parties in 90 countries, five of which have them in the ruling coalition? For comparison, this article in Politico, written with a maximally negative attitude, mentions AIS lobbying of around $2M.
Until recently the MIRI default plan was basically “obtain god-like AI and use it to take over the world”
It’s like saying “the NASA default plan is to spread the light of consciousness across the stars”, which is kinda technically true, but in reality NASA’s actions are not as cool as this phrase implies. The “MIRI default plan” was “to do math in the hope that some of this math will turn out to be useful”.
Are you saying that the AIS movement is more power-seeking than the environmentalist movement, which spent $30M+[...]
I think that AIS lobbying is likely to have more consequential and enduring effects on the world than environmental lobbying, regardless of the absolute size in headcount or amount of money, so yes.
The “MIRI default plan” was “to do math in the hope that some of this math will turn out to be useful”.
I mean yeah, that is a better description of their publicly-known day-to-day actions, but intention also matters. They settled on math after it became clear that the god AI plan was not achievable (and recently gave up on the math plan too, when it became clear that was not realistic). An analogy might be an environmental group that planned to end pollution by bio-engineering a microbe to spread throughout the world that made oil production impossible, then reluctantly settled for lobbying once they realized they couldn’t actually make the microbe. I think this would be a pretty unusually power-seeking plan for an environmental group!
The point of the OP is not about effects, it’s about AIS being visibly more power-seeking than other movements and causing backlash in response to visible activity.
I think that vast majority of comparative claims (like “AI Safety community is more X than any other advocacy group”) is more based on vibes than facts. Are you sure than Sierra Club, Club of Rome, Mont Pelerin Society, Fabian Society, Anti-Defamation League et cetera are less power-seeking than AI Safety community? Are you sure that “let’s totally reorganize society around ecological sustainability” is less power-seeking than “let’s ensure that AGI corporations have management that is not completely blind to alignment problem”?
I never claimed that AI safety is more X than “any other advocacy group”; I specifically said “most other advocacy groups”. And of course I’m not sure about this, asking for that is an isolated demand for rigor. It feels like your objection is the thing that’s vibe-based here.
On the object level: these are good examples, but because movements vary on so many axes, it’s hard to weigh up two of them against each other. That’s why I identified the three features of AI safety which seem to set it apart from most other movements. (Upon reflection I’d also add a fourth: the rapid growth.)
I’m curious if there are specific features which some of the movements you named have that you think contribute significantly to their power-seeking-ness, which AI safety doesn’t have.
I agree that the word “any” is wrong here. I used “sure” in the sense of “reasonably everyday I-won’t-get-hit-by-a-car-if-I-cross-the-road-now sure,” not in the “100% math-proof sure” sense.
By “vibe-based,” I refer to the features that you mention. Yes, there is a lot of talk about consequentialism and efficiency in rationalist/EA-adjacent AIS circles, and consequentialism gives a base for power-seeking actions, but it’s unclear how much this talk leads to actual power-seeking actions and how much it’s just local discourse framing.
The same applies to the feel of urgency.
Funnily, consequentialism can lead to less power-seeking if we define the problem in a less open-ended manner. If your task is to “maximize the number of good things in the world,” you benefit from power-seeking. If your task is to “design a safe system capable of producing superintelligent work,” you are, in fact, interested in completing this task with minimal effort and, therefore, minimal resources.
I think that the broad environmentalist movement is at least in the same tier as the AIS movement for all of the mentioned features.
Power-seeking inferred from consequentialism? Environmentalism is a movement about political control from the very start. The Club of Rome was initially founded inside the OECD and tried to influence its policy towards degrowth.
Consequentialism leading to questionable practices? Mass sterilizations in the 1980s are far beyond whatever AI Safety has done to date.
Urgency? You can look at all the people claiming that 2030 is a point of no return for climate change.
Focus of elites? The Club of Rome and the Sierra Club are literally what is written in the name—they are elite clubs.
I think I would agree if you say that there are a lot of nuances and the chance of AIS being more power-seeking than the environmentalism movement is non-negligible, but to measure power-seeking with necessary accuracy, we would need not a blog post written by one person, but the work of an army of sociologists.
Regardless of who is more power-seeking, it would probably be a good idea to look at how being power-seeking has been a disadvantage to other movements. It looks to me like the insistence/power-seeking of the environmental movement may well have been an immense disadvantage; it may have created a backlash that’s almost as strong as the entire movement.
Backlash for environmentalism was largely inevitable. The whole point of environmentalism is to internalize externalities in some way, i.e., impose costs of pollution/ecological damage on polluters. Nobody likes to get new costs, so backlash ensues.
That’s a good point. But not all of the imposed costs were strategically wise, so the backlash didn’t need to be that large to get the important things done. It could be argued that the most hardline, strident environmentalists might’ve cost the overall movement immensely by pushing for minor environmental gains that come at large perceived costs.
I think that did happen, and that similarly pushing for AI safety measures should be carefully weighed in cost vs benefit. The opposite argument is that we should just get everyone used to paying costs for ai safety (in terms of limiting ai progress that would not probably be highly dangerous). I think that strategy backfired badly for environmentalism and would backfire for us.
Maybe. Again, I’m not expert in PR and I’d really like to have people who are expert involved in coming up with strategies.
I think at least some of this backlash comes from Earth being very bad at coordination. “To get moderate result you should scare opponents with radicals” and other negotiation frictions.
Sure. But scaring opponents with inflated arguments and demands by radicals didn’t seem to work well for the environmental movement, so the AI safety movement probably shouldn’t employ those tactics.
To clarify in more general way: Earth is bad in coordination in a sense that you can’t expect that if industrial producer dumps toxic waste into environment, special government agency will walk in with premise: “hey, you seem to destroy environmental commons to get profits, let’s negotiate point on Pareto frontier in space of environmental commons-profits which leave both of us not very upset”.
Until recently the MIRI default plan was basically “obtain god-like AI and use it to take over the world”(“pivotal act”), it’s hard to get more power-seeking than that. Other wings of the community have been more circumspect but also more active in things like founding AI labs, influencing government policy, etc., to the tune of many billions of dollars worth of total influence. Not saying this is necessarily wrong but it does seem empirically clear that AI-risk-avoiders are more power-seeking than most movements.
Seems like this is already the case.
My understanding of MIRI plan was “have a controllable, safe AI that’s just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI”. I wouldn’t call that God-like or an intention to take over the world. The go-to [acknowledged as that plausible] example is “melt all the GPUs”] Your description feels grossly inaccurate.
Basically any plan of the form “use AI to prevent anyone from building more powerful and more dangerous AI” is incredibly power-grabbing by normal standards: in order to do this, you’ll have to take actions that start out as terrorism and then might quickly need to evolve into insurrection (given that the government will surely try to coerce you into handing over control over the AI-destroying systems); this goes against normal standards for what types of actions private citizens are allowed to take.
I agree that “obtain enough hard power that you can enforce your will against all governments in the world including your own” is a bit short of “try to take over the world”, but I think that it’s pretty world-takeover-adjacent.
I mean, it really matters whether you are suggesting someone else to take that action or whether you are planning to take that action yourself. Asking the U.S. government to use AI to prevent anyone from building more powerful and more dangerous AI is not in any way a power-grabbing action, because it does not in any meaningful way make you more powerful (like, yes, you are part of the U.S. so I guess you end up with a bit more power as the U.S. ends up with more power, but that effect is pretty negligible). Even asking random AI capability companies to do that is also not a power-grabbing action, because you yourself do not end up in charge of those companies as part of that.
Yes, unilaterally deploying such a system yourself would be, but I have no idea what people are referring to when they say that MIRI was planning on doing that (maybe they were, but all I’ve seen them do is to openly discuss plans about what ideally someone with access to a frontier model should do in a way that really did not sound like it would end up with MIRI meaningfully in charge).
I think they talked explicitly about planning to deploy the AI themselves back in the early days(2004-ish) then gradually transitioned to talking generally about what someone with a powerful AI could do.
But I strongly suspect that in the event that they were the first to obtain powerful AI, they would deploy it themselves or perhaps give it to handpicked successors. Given Eliezer’s worldview I don’t think it would make much sense for them to give the AI to the US government(considered incompetent) or AI labs(negligently reckless)
I agree that very old MIRI (explicitly disavowed by present MIRI and mostly modeled as “one guy in a basement somewhere”) looked a bit more like this, but I think making inferences from that to modern MIRI is about as confused as making inferences from people’s high-school essays about what they will do when they become president. I don’t think it has zero value in forecasting the future, but going and reading someone’s high-school political science essay, and inferring they would endorse that position in the modern day, is extremely dubious.
My model of them would definitely think very hard about the signaling and coordination problems that come with people trying to build an AGI themselves, and then act on those. I think Eliezer’s worldview here would totally output actions that include very legible precommitments about what the AI system would be used for, and would absolutely definitely not include the ability of whoever builds AGI to just take over the world with it. Eliezer has written a lot about this stuff and clearly takes considerations like that extremely seriously.
Yeah, but it’s not just the old MIRI views, but those in combination with their statements about what one might do with powerful AI, the telegraphed omissions in those statements, and other public parts of their worldview e.g. regarding the competence of the rest of the world. I get the pretty strong impression that “a small group of people with overwhelming hard power” was the ideal goal, and that this would ideally be controlled by MIRI or by a small group of people handpicked by them.
Some things that feel incongruent with this:
Eliezer talks a lot in the Arbital article on CEV about how useful it is to have a visibly neutral alignment target
Right now Eliezer is pursuing a strategy which does not meaningfully empower him at all (just halting AGI progress)
Eliezer complains a lot about various people using AI alignment as a guise for mostly just achieving their personal objectives (in particular, the standard AI censorship stuff being thrown into the same bucket)
Lots of conversations I’ve had with MIRI employees
I would be happy to take bets here about what people would say.
Sure, I DM’d you.
This seems too strong to me. There looks to me like a clear continuity of MIRI’s strategic outlook from the days when their explicit plan was to build a singleton and “optimize” the universe, through to today. In between there was a series of updates regarding how difficult various intermediate targets would be. But the goal remains to implement CEV or something like it, and optimize the universe according to the resulting utility function.
If I remember correctly, back in the AI foom debate, Robin Hanson characterized the Singularity Institute’s plan (to be the first to a winner-take-all technology, and then use that advantage to optimize the cosmos) as declaring total war on the world. Eliezer disputed that characterization.
(Note that I spent 10 minutes trying to find the relevant comments, and didn’t find anything quite like what I was remembering, which does decrease my credence that I’m remembering correctly.)
I think you mean “the goal remains to ensure that CEV or something like it is eventually implemented, and the universe is thus optimized according to the resulting utility function”, right? I think Eliezer’s view has always been that we want a CEV-maximizing ASI to be eventually turned on, but if that happens, it wouldn’t matter which human turns it on. And then evidently Eliezer has pivoted over the decades from thinking that this is likeliest to happen if he tries to build such an ASI with his own hands, to no longer thinking that.
All of that sounds right to me. But this pivot with regards to means isn’t much evidence about what Eliezer/MIRI would do if they (as a magical hypothetical) suddenly found themselves with a verifiably-aligned CEV AGI.
I expect that they would turn it on, with the expectation that it would develop a hard power decisive strategic advantage, use that to end the acute risk period, and then proceed to optimize the universe.
Insofar as that’s true, I think Oliver’s statement above...
...is inaccurate.
MIRI has never said, to my knowledge,
The Singularity Institute used to have the plan of building and deploying a friendly AI, which they expected to “optimize” the whole world.
Eliezer’s writing includes lots of points in which he at least hints (some would say more than hints) that he thinks it is morally obligatory, or at least virtuous, to take over the world for the side of Good.
Famously, Harry says “World Domination is such an ugly phrase. I prefer world optimization.” (We made t-shirts of this phrase!)
The Sword of Good ends with the line
“‘I don’t trust you either,’ Hirou whispered, ‘but I don’t expect there’s anyone better,’ and he closed his eyes until the end of the world.” He’s concluded that all the evil in the world must be opposed, that it’s right for someone to cast the “spell of ultimate power” to do that.
(This is made a bit murky, because Eliezer’s writings usually focus on the transhumanist conquest of the evils of nature, rather than political triumph over human evil. But triumph over the human evils is definitely included, e.g. the moral importance and urgency of destroying Azkaban in HP:MoR.)
From all that, I think it is reasonable to think that MIRI is in favor of taking over the world, if they could get the power to do it!
So it seems disingenuous, to me, to say,
I agree that
MIRI’s leadership doesn’t care who implements a CEV AI, as long as they do it correctly.
(Though this is not as clearly non-power-seeking, if you rephrase it as “MIRI leadership doesn’t care who implements the massively powerful AI, as long as they correctly align it to the values that MIRI leadership endorses.”
For an outsider who doesn’t already trust the CEV process, this is about as reassuring as a communist group saying “we don’t care who implements the AI, as long as they properly align it to Marxist doctrine.” I understand how CEV is more meta than that, how it is explicitly avoiding coding object-level values into the AI. But not everyone will see it that way, especially if they think the output of CEV is contrary to their values (as indeed, virtually every existing group should).)
CEV as an optimization target is itself selected to be cosmopolitan and egalitarian. It’s a good-faith attempt to optimize for the good of all. It does seem to me that the plan of “give a hard power advantage to this process, which we expect to implement the Good, itself”, is a step down in power-seeking from “give a hard power advantage to me, and I’ll do Good stuff.”
But it still seems to me that MIRI’s culture endorses sufficiently trustworthy people taking unilateral action, both to do a pivotal act and end the acute risk period, and more generally, to unleash a process that will inexorably optimize the world for Good.
I mean, I also think there is continuity between the beliefs I held in my high-school essays and my present beliefs, but it’s also enough time and distance that if you straightforwardly attribute claims to me that I made in my high-school essays, which I have explicitly disavowed and told you I do not believe, I will be very annoyed with you and will model you as not actually trying to understand what I believe.
Absolutely, if you have specifically disavowed any claims, that takes precedence over anything else. And if I insist you still think x, because you said x ten years ago, but you say you now think something else, I’m just being obstinate.
In contrast, if you said x ten years ago, and in the intervening time you’ve shared a bunch of highly detailed models that are consistent with x, I think I should think you still think x.
I’m not aware of any specific disavowals of anything after 2004? What are you thinking of here?
Here is a video of Eliezer, first hosted on Vimeo in 2011. I don’t know when it was recorded.
[Anyone know if there’s a way to embed the video in the comment, so people don’t have to click out to watch it?]
He states explicitly:
And later in the video he says:
“Pretty world-takeover-adjacent” feels like a fair description to me.
I don’t think this is the important point of disagreement. Habryka’s point throughout this thread seems to be that, yes, doing that is power-grabbing, but it is not what MIRI planned to do. So MIRI planned to (intellectually) empower anyone else willing to do (and capable of doing) a pivotal act with a blueprint for how to do so.
So MIRI wasn’t seeking to take power, but rather to allow someone else[1] to do so. It’s the difference between using a weapon and designing a weapon for someone else’s use. An important part is that this “someone else” could very well disagree with MIRI about a large number of things, so there need not be any natural allyship or community or agreement between them.
If you are a blacksmith working in a forge and someone comes into your shop and says “build me a sword so I can use it to kill the king and take control of the realm,” and you agree to do so but do not expect to get anything out-of-the-ordinary in return (in terms of increased power, status, etc), it seems weird and non-central to call your actions power-seeking. You are simply empowering another, different power-seeker. You are not seeking any power of your own.
Who was both in a position of power at an AI lab capable of designing a general intelligence and sufficiently clear-headed about the dangers of powerful AI to understand the need for such a strategy.
My understanding was there were 4 phases in which the Singularity Institute / MIRI had 4 different plans.
~2000 to ~2004:
Plan:
Build a recursively self-improving seed AI as quickly as possible →
That seed AI Fooms →
It figures out the Good and does it.
[Note: Eliezer has explicitly disendorsed everything that he believed in this period, unless otherwise noted.]
~2004 to ~2016:
Update: “Wait. Because of the Orthogonality thesis, not all seed AIs will converge to values that we consider good, even though they’re much smarter than us. The supermajority of seed AIs don’t. We have to build in humane values directly, or building a recursively self-improving AGI will destroy both the world and everything of value in the world.”
New plan:
Figure out the math of motivationally-stable self-improvement, figure out the deep math of cognition →
use both to build a seed AI, initialized to implement Coherent Extrapolated Volition →
let that seed AI recursively self improve into a singleton / sovereign with a decisive strategic advantage →
that singleton now uses its decisive strategic advantage to optimize the universe.
(“World domination is such an ugly phrase. I prefer world optimization”).
~2016 to ~2021:
Update: “it turns out Deep Learning is general enough that it is possible to build AGI with relatively brute force methods, without having much deep insight into the nature of cognition. AI timelines are shorter than we thought. Fuck. There isn’t time to figure out how to do alignment deeply and completely, at the level that would be required to trust an AI to be a sovereign, and optimize the whole universe.”
New plan:
Figure out enough of alignment to build the minimal AGI system that can perform a pivotal act, in tightly controlled circumstances, with lots of hacky guardrails and speed-bumps →
Build such a limited AGI →
Deploy that AGI to do a pivotal act to prevent any competitor projects from building a more dangerous unbounded AGI.
~2021 to present:
Update: “We can’t figure out even that much of the science of alignment in time. The above plan is basically doomed. We think the world is doomed. Given that, we might as well try outreach:”
New plan:
Do outreach →
Get all the Great Powers in the world to join and enforce a treaty that maintains a worldwide ban on large training runs →
Do biotech projects that can produce humans that are smart enough that they have security mindset not out of a special personal disposition, but just because they’re smart enough to see the obviousness of it by default →
Those superhumans solve alignment and (presumably?) implement more or less the pre-2016 MIRI plan.
I think interstice’s summary is basically an accurate representation of the ~2001 to ~2016 plan. They’re only mistaken in that MIRI didn’t switch away from that plan until recently.
Nice overview, I agree, but I think the 2016-2021 plan could still arguably be described as “obtain god-like AI and use it to take over the world” (admittedly with some rhetorical exaggeration, but, like, not that much)
I think it’s pretty important that the 2016 to 2021 plan was explicitly aiming to avoid unleashing godlike power. “The minimal amount of power to do a thing which is otherwise impossible”, not “as much omnipotence as is allowed by physics”.
And similarly, the 2016 to 2021 plan did not entail optimizing the world except with regard to what is necessary to prevent dangerous AGIs.
These are both in contrast to the earlier 2004 to 2016 plan. So the rhetorical exaggeration confuses things.
MIRI actually did have a plan that, in my view, is well characterized as (eventually) taking over the world, without exaggeration. That’s apt to get lost if we describe a “toned down” plan as “taking over the world” merely because it involves taking powerful, potentially aggressive action.
This discussion is a nice illustration of why x-riskers are definitely more power-seeking than the average activist group. Just like Eskimos proverbially have 50 words for snow, AI-risk-reducers need at least 50 terms for “taking over the world” to demarcate the range of possible scenarios. ;)
...fwiw I think it’s not grossly inaccurate.
I think MIRI did put a lot of effort into being cooperative about the situation (i.e. Don’t leave your fingerprints on the future, doing the ‘minimal’ pivotal act that would end the acute risk period, and when thinking about longterm godlike AI, trying to figure out fair CEV sorts of things).
But, I think it was also pretty clear that “have a controllable, safe AI that’s just powerful enough to take some action that prevents anyone else from building a more powerful and more dangerous AI” was not in the Overton window. I don’t know what Eliezer’s actual plan was, since he disclaimed “yes I know melt all the GPUs won’t work”, but, like, “melt all the GPUs” implies a level of power over the world that is really extreme by historical standards, even if you’re trying to do the minimal thing with that power.
I think the plan implies having the capability that, if you wanted to, you could take over the world, but having the power to do something and actually doing it are quite different. When you say “MIRI wanted to take over the world”, the central meaning that comes to mind for me is “take over all the governments, be in charge of all the laws and decision-making, be world dictator, take possession of all the resources”, and probably also “steer humanity’s future in a very active way”. Which is very, very much not their intention, and if someone goes around saying MIRI’s plan was to take over the world without any clarification, leaving the reader to think the above, then I think they’re being very darn misleading.
When you read the Sequences, was your visualization of a Friendly AI going to let the governments of North Korea or Saudi Arabia persist? Would it allow parents to abuse their children in ways that are currently allowed by the law (and indeed enshrined by the law, in that the law give parents authority over their children)? Does it allow the factory farms to continue to run? How about the (then contemporaneous) US occupations of Iraq and Afghanistan?
(This is a non-rhetorical question. I wonder if we were visualizing different things.)
Speaking for myself, I would say:
It’s a superintelligence, and so it can probably figure out effective peaceful ways to accomplish its goals. But among its goals will be the dismantling of many, and likely all, of the world’s major governments, not to mention a bunch of other existing power structures. A government being dismantled by a superhuman persuader is, in many but not all ways, as unsettling as it being destroyed by military force.
Perhaps humanity as a whole, and every individual human, would be made better off by a CEV-aligned friendly singleton, but I think the US government, as an entity, would be rightly threatened.
Doesn’t this very answer show that an AI such as you describe would not be reasonably describable as “Friendly”, and that consequently any AI worthy of the term “Friendly” would not do any of the things you describe? (This is certainly my answer to your question!)
No. “Friendly” was a semi-technical term of art at the time. It may turn out that a Friendly AI (in the technical sense) is not, or even can’t be, “friendly” in a more conventional sense.
Er… yes, I am indeed familiar with that usage of the term “Friendly”. (I’ve been reading Less Wrong since before it was Less Wrong, you know; I read the Sequences as they were being posted.) My comment was intended precisely to invoke that “semi-technical term of art”; I was not referring to “friendliness” in the colloquial sense. (That is, in fact, why I used the capitalized term.)
Please consider the grandparent comment in light of the above.
In that case, I answer flatly “no”. I don’t expect many existing governmental institutions to be ethical or legitimate in the eyes of CEV, if CEV converges at all. Factory Farming is right out.
You don’t think that most humans would be opposed to having an AI dismantle their government, deprive them of affordable meat, and dictate how they can raise their children?
Whether most existing humans would be opposed is not a criterion of Friendliness.
I think if you described what was going to happen, many and maybe most humans would say they prefer the status quo to a positive CEV-directed singularity. Perhaps it depends on which parts of “what’s going to happen” you focus on; some are more obviously good or exciting than others. Curing cancer is socially regarded as 👍 while curing death and dismantling governments are typically (though not universally) regarded as 👎.
I don’t think they will actually provide much opposition, because a superhuman persuader will be steering the trajectory of events. (Ostensibly, by using only truth-tracking arguments and inputs that allow us to converge on the states of belief that we would reflectively prefer, but we mere humans won’t be able to distinguish that from malicious superhuman manipulation.)
But again, how humans would react is neither here nor there for what a Friendly AI does. The AI does what the CEV of humans would want, not what the humans want.
And… you claim that the CEV of existing humans will want those things?
Part of the whole point of CEV is to discover at least some things that current humanity is confused about but would want if fully informed, with time to think. It’d be surprising to me if CEV-existing-humanity didn’t turn out to want some things that many current humans are opposed to.
Sure. Now, as far as I understand it, whether the extrapolated volition of humanity will even cohere is an open question (on any given extrapolation method; we set aside the technical question of selecting or constructing such a method).
So Eli Tyre’s claim seems to be something like: on [ all relevant / the most likely / otherwise appropriately selected ] extrapolation methods, (a) humanity’s EV will cohere, (b) it will turn out to endorse the specific things described (dismantling of all governments, removing the supply of factory farmed meat, dictating how people should raise their children).
Right?
I’m much more doubtful than most people around here about whether CEV coheres: I guess that the CEV of some humans involves wireheading themselves and the CEV of other humans doesn’t, for instance.
But I’m bracketing that concern for this discussion. Assuming CEV coheres, then yes, I predict that it will have radical (in the sense of a political radical whose beliefs are extremely outside of the Overton window, such that they are disturbing to the median voter) views about all of those things.
But more confidently, I predict that it will have radical views about a very long list of things that are commonplace in 2024, even if it turns out that I’m wrong about this specific set.
CEV asks what we would want if we knew everything the AI knows. There are dozens of things that I think I know that, if the average person knew them to be true, would invalidate a lot of their ideology. Basic
If the average person knew everything that an AGI knows (which includes potentially millions of subjective years of human science, whole new fields, as foundational to one’s worldview as economics and probability theory is to my current worldview), and they had hundreds of subjective years to internalize those facts and domains, in a social context that was conducive to that, with (potentially) large increases in their intelligence, I expect their views are basically unrecognizable after a process like that.
As a case in point, most people consider it catastrophically bad to have their body destroyed (duh). And if you asked them if they would prefer, given their body being destroyed, to have their brain-state recorded, uploaded, and run on a computer, many would say “no”, because it seems horrifying to them.
Most LessWrongers embrace computationalism: they think that living as an upload is about as good as living as a squishy biological robot (and indeed, better in many respects). They would of course choose to be uploaded if their body was being destroyed. Many would elect to have their body destroyed specifically because they would prefer to be uploaded!
That is, most LessWrongers think they know something which most people don’t know, but which, if they did know it, would radically alter their preferences and behavior.
I think a mature AGI knows at least thousands of things like that.
So among the things about CEV that I’m most confident about (again, granting that it coheres at all), is that CEV has extremely radical views, conclusions which are horrifying to most people, including probably myself.
If by ‘cohere’ you mean ‘the CEVs of all individual humans match’, then my belief (>99%) is that it is not the case that the CEVs of all individual humans will (precisely) match. I also believe there would be significant overlap between the CEVs of 90+% of humans[1], and that this overlap would include disvaluing two of the three[2] things you asked about (present factory farming and child abuse; more generally, animal and child suffering).
(This felt mostly obvious to me, but you did ask about it a few times, in a way that suggested you expect something different; if so, you’re welcome to pinpoint where you disagree.)
For instance, even if one human wants to create a lot of hedonium, and another human wants to create a lot of individuals living fun and interesting lives, it will remain the case that they both disvalue things like extreme suffering. Also, the former human will probably still find at least some value in what the latter human seeks.
For the part of your question about whether their CEVs would endorse dismantling governments: note that ‘governments’ is a relevantly broad category, when considering that most configurations which are infeasible now will be feasible in the (superintelligence-governed) future. I think these statements capture most of my belief about how most humans’ CEVs would regard things in this broad category.
Most human CEVs would be permissive of those who terminally-wish[3] to live in contexts that have some form of harmless government structure.
The category of ‘government’ also includes, e.g., dystopias that create suffering minds and don’t let them leave; most human CEVs would seek to prevent this kind of government from existing.
(None of that implies any government would be present everywhere, nor that anyone would be in such a context against their will; rather, I’m imagining that a great diversity of contexts and minds will exist. I less confidently predict that most will choose to live in contexts without a government structure, considering it unnecessary given the presence of a benevolent ASI.)
(wished for not because it is necessary, for it would not be under a benevolent ASI, but simply because it’s their vision for the context in which they want to live)
I do.
I mean, it depends on the exact CEV procedure. But yes.
I think the majority of nations would support dismantling their governments in favor of a benevolent superintelligence, especially given the correct framework. And an ASI can simply solve the problem of meat by growing brainless bodies.
Edit: Whoever mega-downvoted this, I’m interested to see you explain why.
Meta: You may wish to know[1] that seeing these terms replaced with the ones you used can induce stress/dissociation in the relevant groups (people disturbed by factory farming and child abuse survivors). I am both and this was my experience. I don’t know how common it would be among LW readers of those demographics specifically, though.
The one you responded to:
Your response:
I’m framing this as sharing info you (or a generalized altruistic person placed in your position) may care about rather than as arguing for a further conclusion.
I’m confused about your question. I am not imagining that the pivotal act AI (the AI under discussion) does any of those things.
Right, but I’m asking about your visualization of a Friendly AI as described in the Sequences, not a limited AGI for a pivotal act.
I’m confused by your confusion! Are you saying that that’s a non-sequitur, because I’m asking about a CEV-sovereign instead of a corrigible, limited genie or oracle?
It seems relevant to me, because both of those were strategic goals for MIRI at various points in its history, and at least one of them seems well characterized as “taking over the world” (or at least something very nearby to that). Which seems germane to the discussion at hand to me.
I would be surprised if a Friendly AI resulted in those things being left untouched.
I think that is germane, but maybe needed some bridging/connecting work, since this thread so far was about MIRI-as-having-pivotal-act-goal. Whereas I was less sure than Habryka about whether MIRI itself would enact a pivotal act if they could, my understanding was that they had no plan to create a sovereign for most of their history (like, after 2004), so that doesn’t seem like a candidate for them having a plan to take over the world.
Yeah, I think that’s false.
The plan was “Figure out how to build a friendly AI, and then build one”. (As Eliezer stated in the video that I linked somewhere else in this comment thread).
But also, I got that impression from the Sequences? Like Eliezer talks about actually building an AGI, not just figuring out the theory of how to build one. You didn’t get that impression?
I don’t remember what exactly I thought in 2012 when I was reading the Sequences. I do recall sometime later, after DL was in full swing, it seeming like MIRI wasn’t in any position to be building AGI before others (like, no compute, nor the engineering prowess), and someone (not necessarily at MIRI) confirmed that wasn’t the plan. Now and at the time, I don’t know how much that was principle vs ability.
My feeling of the plan pre-pivotal-act era was “figure out the theory of how to build a safe AI at all, and try to get whoever is building it to adopt that approach”, and that MIRI wasn’t taking any steps to be the ones building it. I also had the model that, due to the psychological unity of mankind, anyone building an aligned [with them] AGI was a good outcome compared to someone building an unaligned one. Like, even if it was Xi Jinping, a sovereign aligned with him would be okay (and not obviously that dramatically different from anyone else?). I’m not sure how much this was MIRI positions vs fragments that I combined in my own head that came from assorted places and were never policy.
Well, I can tell you that they definitely planned to build the Friendly AI, after figuring out how.
See this other comment.
Pretty solid evidence.
“Taking over” something does not imply that you are going to use your authority in a tyrannical fashion. People can obtain control over organizations and places and govern with a light or even barely-existent touch, it happens all the time.
Would you accept “they plan to use extremely powerful AI to institute a minimalist, AI-enabled world government focused on preventing the development of other AI systems” as a summary? Like sure, “they want to take over the world” as a gist of that does have a bit of an editorial slant, but not that much of one. I think that my original comment would be perceived as much less misleading by the majority of the world’s population than “they just want to do some helpful math uwu” in the event that these plans actually succeeded. I also think it’s obvious that these plans indicate a far higher degree of power-seeking (in aim at least) than virtually all other charitable organizations.
(...and to reiterate, I’m not taking a strong stance on the advisability of these plans. In a way, had they succeeded, that would have provided a strong justification for their necessity. I just think it’s absurd to say that the organization making them is less power-seeking than the ADL or whatever)
No. Because I don’t think that was specified or is necessary for a pivotal act. You could leave all existing government structures intact and simply create an invincible system that causes any GPU farm larger than a certain size to melt. Or something akin to that that doesn’t require replacing existing governments, but is a quite narrow intervention.
It wasn’t specified but I think they strongly implied it would be that or something equivalently coercive. The “melting GPUs” plan was explicitly not a pivotal act but rather something with the required level of difficulty, and it was implied that the actual pivotal act would be something further outside the political Overton window. When you consider the ways “melting GPUs” would be insufficient a plan like this is the natural conclusion.
I don’t think you would need to replace existing governments. Just block all AI projects and maintain your ability to continue doing so in the future via maintaining military supremacy. Get existing governments to help you, or at least not interfere, via some mix of coercion and trade. Sort of a feudal arrangement with a minimalist central power.
That to me is a very very non-central case of “take over the world”, if it is one at all.
This is about “what would people think when they hear that description”, and I could be wrong, but I expect the “the plan is to take over the world” summary would lead people to expect “replace governments” level of interference, not “coerce/trade to ensure this specific policy”—and there’s a really, really big difference between the two.
I think this whole debate is missing the point I was trying to make. My claim was that it’s often useful to classify actions which tend to lead you to having a lot of power as “structural power-seeking” regardless of what your motivations for those actions are. Because it’s very hard to credibly signal that you’re accumulating power for the right reasons, and so the defense mechanisms will apply to you either way.
In this case MIRI was trying to accumulate a lot of power, and claiming that they were aiming to use it in the “right way” (do a pivotal act) rather than the “wrong way” (replacing governments). But my point above is that this sort of claim is largely irrelevant to defense mechanisms against power-seeking.
(Now, in this case, MIRI was pursuing a type of power that was too weird to trigger many defense mechanisms, though it did trigger some “this is a cult” defense mechanisms. But the point cross-applies to other types of power that they, and others in AI safety, are pursuing.)
I don’t super buy this. I don’t think MIRI was trying to accumulate a lot of power. In my model of the world they were trying to design a blueprint for some institution or project that would mostly have highly conditional power, that they would personally not wield.
In the metaphor of classical governance, I think what MIRI was doing was much more “design a blueprint for a governance agency” not “put themselves in charge of a governance agency”. Designing a blueprint is not a particularly power-seeking move, especially if you expect other people to implement it.
I got your point and think it’s valid and I don’t object to calling MIRI structurally power-seeking to the extent they wanted to execute a pivotal act themselves (Habryka claims they weren’t, I’m not knowledgeable on that front).
I still think it’s important to push back against a false claim that someone had the goal of taking over the world.
Are you saying that the AIS movement is more power-seeking than the environmentalist movement, which spent over $30M on lobbying in 2023 alone and has political parties in 90 countries, five of which are in ruling coalitions? For comparison, this Politico piece, with a maximally negative attitude, puts AIS lobbying at around $2M.
It’s like saying “NASA’s default plan is to spread the light of consciousness across the stars”, which is kinda technically true, but in reality NASA’s actions are not as cool as this phrase implies. “MIRI’s default plan” was “to do math in the hope that some of this math will turn out to be useful”.
I think that AIS lobbying is likely to have more consequential and enduring effects on the world than environmental lobbying regardless of the absolute size in body count or amount of money, so yes.
I mean yeah, that is a better description of their publicly-known day-to-day actions, but intention also matters. They settled on math after it became clear that the god AI plan was not achievable(and recently, gave up on the math plan too when it became clear that was not realistic). An analogy might be an environmental group that planned to end pollution by bio-engineering a microbe to spread throughout the world that made oil production impossible, then reluctantly settled for lobbying once they realized they couldn’t actually make the microbe. I think this would be a pretty unusually power-seeking plan for an environmental group!
The point of the OP is not about effects, it’s about AIS being visibly more power-seeking than other movements and causing backlash in response to visible activity.