Superintelligent AI is necessary for an amazing future, but far from sufficient
(Note: Rob Bensinger stitched together and expanded this essay based on an earlier, shorter draft plus some conversations we had. Many of the key conceptual divisions here, like “strong utopia” vs. “weak utopia” etc., are due to him.)
I hold all of the following views:
Building superintelligent AI is profoundly important. Aligned superintelligence is our best bet for taking the abundant resources in the universe and efficiently converting them into flourishing and fun and art and beauty and adventure and friendship, and all the things that make life worth living.[1]
The best possible future would probably look unrecognizably alien. Unlocking humanity’s full potential not only means allowing human culture and knowledge to change and grow over time; it also means building and becoming (and meeting and befriending) very new and different sorts of minds that do a better job of realizing our ideals than the squishy first-pass brains we currently have.[2]
The default outcome of building artificial general intelligence, using anything remotely like our current techniques and understanding, is not a wondrously alien future. It’s that humanity accidentally turns the reachable universe into a valueless wasteland (at least up to the boundaries defended by distant alien superintelligences).
The reason I expect AGI to produce a “valueless wasteland” by default is not that I want my own present conception of humanity’s values locked into the end of time.
I want our values to be able to mature! I want us to figure out how to build sentient minds in silicon, who have different types of wants and desires and joys, to be our friends and partners as we explore the galaxies! I want us to cross paths with aliens in our distant travels who strain our conception of what’s good, such that we all come out the richer for it! I want our children to have values and goals that would make me boggle, as parents have boggled at their children for ages immemorial!
I believe machines can be people, and that we should treat digital people with the same respect we give biological people. I would love to see what a Matrioshka mind can do.[3] I expect that most of my concrete ideas about the future will seem quaint and outdated and not worth their opportunity costs, compared to the rad alternatives we’ll see when we and our descendants and creations are vastly smarter and more grown-up.
Why, then, do I think that it will take a large effort by humanity to ensure that good futures occur? If I believe in a wondrously alien and strange cosmopolitan future, and I think we should embrace moral progress rather than clinging to our present-day preferences, then why do I think that the default outcome is catastrophic failure?
In short:
Humanity’s approach to AI is likely to produce outcomes that are drastically worse than, e.g., the outcomes a random alien species would produce.
It’s plausible — though this is much harder to predict, in my books — that a random alien would produce outcomes that are drastically worse (from a cosmopolitan, diversity-embracing perspective!) than what unassisted, unmodified humans would produce.
Unassisted, unmodified humans would produce outcomes that are drastically worse than what a friendly superintelligent AI could produce.
The practical take-away from the first point is “the AI alignment problem is very important”; the take-away from the second point is “we shouldn’t just destroy ourselves and hope aliens end up colonizing our future light cone, and we shouldn’t just try to produce AI via a more evolution-like process”;[4] and the take-away from the third point is “we shouldn’t just permanently give up on building superintelligent AI”.
To clarify my views, Rob Bensinger asked me how I’d sort outcomes into the following broad bins:
Strong Utopia: At least 95% of the future’s potential value is realized.
Weak Utopia: We lose 5+% of the future’s value, but the outcome is still at least as good as “tiling our universe-shard with computronium that we use to run glorious merely-human civilizations, where people’s lives have more guardrails and more satisfying narrative arcs that lead to them more fully becoming themselves and realizing their potential (in some way that isn’t railroaded), and there’s a far lower rate of bad things happening for no reason”.
(“Universe-shard” here is short for “the part of our universe that we could in principle reach, before running into the cosmic event horizon or the well-defended borders of an advanced alien civilization”.)
Pretty Good: The outcome is worse than Weak Utopia, but at least as good as “tiling our universe-shard with computronium that we use to run lives around as good and meaningful as a typical fairly-happy circa-2022 human”.
Conscious Meh: The outcome is worse than the “Pretty Good” scenario, but isn’t worse than an empty universe-shard. Also, there’s a lot of conscious experience in the future.
Unconscious Meh: Same as “Conscious Meh”, except there’s little or no conscious experience in our universe-shard’s future. E.g., our universe-shard is tiled with tiny molecular squiggles (a.k.a. “molecular paperclips”).
Weak Dystopia: The outcome is worse than an empty universe-shard, but falls short of “Strong Dystopia”.
Strong Dystopia: The outcome is about as bad as physically possible.
For each of the following four scenarios, Rob asked how likely I think it is that the outcome is a Strong Utopia, a Weak Utopia, etc.:
ASI-boosted humans — We solve all of the problems involved in aiming artificial superintelligence at the things we’d ideally want.
unboosted humans — Somehow, humans limp along without ever developing advanced AI or radical intelligence amplification. (I’ll assume that we’re in a simulation and the simulator keeps stopping us from using those technologies, since this is already an unrealistic hypothetical and “humans limp along without superintelligence forever” would otherwise make me think we must have collapsed into a permanent bioconservative dictatorship.)
ASI-boosted aliens — A random alien (that solved their alignment problem and avoided killing themselves with AI) shows up tomorrow to take over our universe-shard, and optimizes the shard according to its goals.
misaligned AI — Humans build and deploy superintelligent AI that isn’t aligned with what we’d ideally want.
These probabilities are very rough, unstable, and off-the-cuff, and are “ass numbers” rather than the product of a quantitative model. I include them because they provide somewhat more information about my view than vague words like “likely” or “very unlikely” would.
(If you’d like to come up with your own probabilities before seeing mine, here’s your chance. Comment thread.)
.
.
.
.
.
(Spoiler space)
.
.
.
.
.
With rows representing odds ratios:
| | Strong Utopia | Weak Utopia | Pretty Good | Con. Meh | Uncon. Meh | Weak Dystopia | Strong Dystopia |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ASI-boosted humans | 1 | ~0 | ~0 | ~0 | ~0 | ~0 | ~0 |
| unboosted humans[5] | 1 | 5 | 7 | 5 | 14 | 1 | ~0 |
| ASI-boosted aliens | 1 | 1 | 40 | 20 | 25 | 5 | ~0 |
| misaligned AI | ~0 | ~0 | ~0 | 1 | 9 | ~0 | ~0 |
“~0” here means (in probabilities) “greater than 0%, but less than 0.5%”. Converted into (rounded) probabilities by Rob:
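The conversion is mechanical: normalize each odds row so it sums to 1. A toy sketch (treating “~0” entries as exactly 0, though the text means “positive but under 0.5%”):

```python
# Normalize each odds-ratio row from the table above into probabilities.
# "~0" entries are treated as exactly 0 here for simplicity.

BINS = ["Strong Utopia", "Weak Utopia", "Pretty Good", "Conscious Meh",
        "Unconscious Meh", "Weak Dystopia", "Strong Dystopia"]

ODDS = {
    "ASI-boosted humans": [1, 0, 0, 0, 0, 0, 0],
    "unboosted humans":   [1, 5, 7, 5, 14, 1, 0],
    "ASI-boosted aliens": [1, 1, 40, 20, 25, 5, 0],
    "misaligned AI":      [0, 0, 0, 1, 9, 0, 0],
}

def to_probabilities(odds):
    """Convert an odds-ratio row to a probability distribution."""
    total = sum(odds)
    return [x / total for x in odds]

for scenario, odds in ODDS.items():
    probs = to_probabilities(odds)
    summary = ", ".join(f"{b}: {p:.0%}" for b, p in zip(BINS, probs) if p)
    print(f"{scenario}: {summary}")
```

So, e.g., the “unboosted humans” row works out to roughly 3% / 15% / 21% / 15% / 42% / 3%, and “misaligned AI” to roughly 10% Conscious Meh and 90% Unconscious Meh.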
Below, I’ll explain why my subjective distributions look roughly like this.
Unboosted humans << Friendly superintelligent AI
I don’t think it’s plausible, in real life, that humanity goes without ever building superintelligence. I’ll discuss this scenario anyway, though, in order to explain why I think it would be a catastrophically bad idea to permanently forgo superintelligence.
If humanity were magically unable to ever build superintelligence, my default expectation (ass number: 4:1 odds in favor) is that we’d eventually be stomped by an alien species (or an alien-built AI). Without the advantages of maxed-out physically feasible intelligence (and the tech unlocked by such intelligence), I think we would inevitably be overpowered.
At that point, whether the future goes well or poorly would depend entirely on the alien’s / AI’s values, with human values only playing a role insofar as the alien/AI terminally cares about our preferences.
Why think that humanity will ever encounter aliens?
My current tentative take on the Fermi paradox is:
If it’s difficult for life to evolve (and therefore there are no aliens out there), then we should expect humanity to have evolved at a random (complex-chemistry-compatible) point in the universe’s history.
If instead life evolves pretty readily, then we should expect the future to be tightly controlled by expansionist aliens who want more resources in order to better achieve their goals. (Among other things, because a wide variety of goals imply expansionism.)
We should then expect new intelligent species (including humans) to all show up about as early in the universe’s history as possible, since new life won’t be able to arise later in the universe’s history (when all the resources will have already been grabbed).
We should also expect expansionist aliens to expand outwards, in all directions, at an appreciable fraction of the speed of light. This implies that if the aliens originate far from Earth, we should still expect to only be able to see them in the night sky for a short window of time on cosmological timescales (maybe a few million years?).
Furthermore, if humanity seems to have come into existence pretty early on cosmological scales, then we can roughly estimate the distance to aliens by looking at exactly how early we are.
If we evolved a million years later than we could have, then intelligent life cannot be so plentiful that there exist lots of aliens (some of them resource-hungry) within a million light years of us, or Earth would have been consumed already.
It looks like intelligent life indeed evolved on Earth pretty early on cosmological timescales, maybe (rough order-of-magnitude) within a billion years of when life first became feasibly possible in this universe.
Intelligent life maybe could have evolved ~100 million years earlier on Earth, during the Mesozoic era, if it didn’t get hung up on dinosaurs. Which means that we’re not as early as we can possibly get; we’re at least 100 million years late.
We could be even later than that, if Earth itself arrived on the scene late. But it’s plausible that first- and second-generation stars didn’t produce many planets with the complex chemistry required for life, which limits how much earlier life could have arisen.
I’d be somewhat surprised to hear that the Earth is ten billion years late, though I don’t know enough cosmology to be confident; so I’ll treat ten billion years as a weak upper bound on how early a lot of intelligent aliens start arising, and 100 million years as a lower bound.
This “we’re early” observation provides at least weak-to-moderate evidence that we’re in the second scenario, and that intelligent life therefore evolves readily. We should therefore expect to encounter aliens one day, if we spread to the stars — though plausibly none that evolved much earlier than we did (on cosmic timescales).
This argument also suggests that we should expect the nearest aliens to be more than 100 million light-years away; and we shouldn’t expect aliens to have more of a head start than they are distant. E.g., aliens that evolved a billion years earlier than we did are probably more than a billion light-years away.
This means that even if there are aliens in our future light-cone, and even if those aliens are friendly, there’s still quite a lot at stake in humanity’s construction of AGI, in terms of whether the Earth-centered ~250-million-light-year-radius sphere of stars goes towards Fun vs. towards paperclips.
(Robin Hanson has made some related arguments about the Fermi paradox, and various parts of my model are heavily Hanson-influenced. I attribute many of the ideas above to him, though I haven’t actually read his “grabby aliens” paper and don’t know whether he would disagree with any of the above.)
Why think that most aliens succeed in their version of the alignment problem?
I don’t have much of an argument for this, just a general sense that the problem is “hard but not that hard”, and a guess that a fair number of alien species are smarter, more cognitively coherent, and/or more coordinated than humans at the time they reach our technological level. (E.g., a hive-mind species would probably have an easier time solving alignment, since they wouldn’t need to rush.)
I’m currently pessimistic about humanity’s odds of solving the alignment problem and escaping doom, but it seems to me that there are a decent number of disjunctive paths by which a species could be better-equipped to handle the problem, given that it’s strongly in their interest to handle it well.
If I have to put a number on it, I’ll wildly guess that 1⁄3 of technologically advanced aliens accidentally destroy themselves with misaligned AI.[6]
My ass-number distribution for “how well does the future go if humans just futz around indefinitely?” is therefore a weighted mix of “50% chance we get stomped by evolved aliens, 30% chance we get stomped by misaligned alien-built AI, 20% chance we retain control of the universe-shard”:
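Concretely, that mix can be computed from the other odds rows in this post: the ASI-boosted-aliens row (weight 50%), the misaligned-AI row (weight 30%, assuming misaligned alien-built AI is distributed like misaligned human-built AI), and the “humans futz around unstomped” odds given later (10 : 50 : 2 : 5 : 5 : 1 : ~0, weight 20%). A quick sketch:

```python
# Weighted mixture of three conditional outcome distributions, using
# the odds rows from this post ("~0" treated as 0). Assumes misaligned
# alien-built AI is distributed like misaligned human-built AI.

def normalize(odds):
    total = sum(odds)
    return [x / total for x in odds]

aliens_stomp   = normalize([1, 1, 40, 20, 25, 5, 0])  # ASI-boosted aliens
alien_ai_stomp = normalize([0, 0, 0, 1, 9, 0, 0])     # misaligned AI
unstomped      = normalize([10, 50, 2, 5, 5, 1, 0])   # humans, unstomped

weights = [0.5, 0.3, 0.2]
components = [aliens_stomp, alien_ai_stomp, unstomped]

mixture = [sum(w * comp[i] for w, comp in zip(weights, components))
           for i in range(7)]
# The result lands close to the "unboosted humans" odds row above
# (1 : 5 : 7 : 5 : 14 : 1 : ~0).
```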
As with many of the numbers in this post, I haven’t reflected on these much, and might revise them if I spent more minutes considering them. But, again, I figure unstable numbers are more informative in this context than just saying “(un)likely”.
Why build superintelligence at all?
So that we don’t get stomped by superintelligent aliens or alien AI; and so that we can leverage superhuman intelligence to make the future vastly better.
(Seriously, humans, with our <10 working memory slots, are supposed to match minds that can potentially attend to millions of complex thoughts at once, in all sorts of complex relationships??)
In real life, the reason I’m in a hurry to solve the AI alignment problem is because humanity is racing to build AGI, at which point we’ll promptly destroy ourselves (and all of the future value in our universe-shard) with misaligned AGI, if the tech proliferates much. And AGI is software, so preventing proliferation is hard — hard enough that I haven’t heard of a more promising solution than “use one of the first AGIs to restrict proliferation”. But this requires that we be able to at least align that system, to perform that one act.
In the long run, however, the reason I care about the alignment problem is that “what should the future look like?” is a subtle and important problem, and humanity will surely be able to answer it better if we have access to reliable superintelligent cognition.
(Though “we need superintelligence for this” doesn’t entail “superintelligence will do everything for us”. It’s entirely plausible to me that aligned AGI does something like “set up some guardrails for humanity, but then pass lots of the choices about how our future goes back to us”, with the result that mere-humans end up having lots of say over how the future looks (including the sorts of weirder minds we build or become).)
The “easy” alignment problem is the problem of aiming AGI at a task that restricts proliferation (at least until we can get our act together as a species).
But the main point of restricting proliferation, from my perspective, is to give humanity as much time as it needs to ultimately solve the “hard” alignment problem: aiming AGI at arbitrary tasks, including ones that are far more open-ended and hard-to-formalize.
Intelligence is our world’s universal problem-solver; and more intelligence can mean the difference between finding a given solution quickly, and never finding it at all. So my default guess is that giving up on superintelligence altogether would result in a future that’s orders of magnitude worse than a future where we make use of fully aligned superintelligence.
Fortunately, I see no plausible path by which humanity would prevent itself from ever building superintelligence; and not many people are advocating for such a thing. (Instead, EAs are doing the sane thing of advocating for delaying AGI until we can figure out alignment.) But I still think it’s valuable to keep the big picture in view.
OK, but what if we somehow don’t build superintelligence? And don’t get stomped by aliens or alien AI, either?
My ass-number distribution for that scenario, stated as an odds ratio, is something like:
| Strong Utopia | Weak Utopia | Pretty Good | Con. Meh | Uncon. Meh | Weak Dystopia | Strong Dystopia |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 50 | 2 | 5 | 5 | 1 | ~0 |
I.e.:
The outcome’s goodness depends a lot on exactly how much intelligence amplification or AI assistance we allow in this hypothetical; and it depends a lot on whether we manage to destroy ourselves (or permanently cripple ourselves, e.g., with a stable totalitarian regime) before we develop the civilizational tech to keep ourselves from doing that.
If we really lock down on human intelligence and coordination ability, that seems real rough. But if there’s always enough freedom and space-to-expand that pilgrims can branch off and try some new styles of organization when the old ones are collapsing under their bureaucratic weight or whatever, then I expect that eventually even modern-intelligence humans start capturing lots and lots of the stars and converting lots and lots of stellar negentropy into fun.[7]
If you don’t have the pilgrimage-is-always-possible clause, then there’s a big chance of falling into a dark attractor and staying there, and never really taking advantage of the stars.
In constraining human intelligence, you’re closing off the vast majority of the space of exploration (and a huge fraction of potential value). But there’s still a lot of mindspace to explore without going too far past current intelligence levels.
In good versions of this scenario, a lot of the good comes from humans being like, “I guess we do the same things the superintelligence would have done, but the long way.” At a lot of junctures, humanity has to do the work that a superintelligence would naturally handle, in order to make the future eudaimonic.
It’s probably possible to eventually do a decent amount of that work with current-human minds, if you have an absurdly large number of them collaborating just right, and if you’re willing to go very slow. (And if humanity hasn’t locked itself into a bad state.)
I’ll note in passing that the view I’m presenting here reflects a super low degree of cynicism relative to the surrounding memetic environment. I think the surrounding memetic environment says “humans left unstomped tend to create dystopias and/or kill themselves”, whereas I’m like, “nah, you’d need somebody else to kill us; absent that, we’d probably do fine”. (I am not a generic cynic!)
Still, ending up in this scenario would be a huge tragedy, relative to how good the future could go.
A different way of framing the question “how good is this scenario?” is “would you rather have really quite a lot of the alien ant-queen’s will, or a smidge of poorly-implemented fun?”.
In that case, I suspect (non-confidently) that I’d take the fun over the ant-queen’s will. My guess is that the aliens-control-the-universe-shard scenario is net-positive, but that it loses orders of magnitude of cosmopolitan utility compared to the “cognitively constrained humans” scenario.
To explain why I suspect this, I’ll state some of my (mostly low-confidence) guesses about the distribution of smart non-artificial minds.
Alien CEV << Human CEV
On the whole, I’m highly uncertain about the expected value of “select an evolved alien species at random, and execute their coherent extrapolated volition (CEV) on the whole universe-shard”.
(Quoting Eliezer: “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.”)
My point estimate is that this outcome is a whole lot better than an empty universe, and that the bad cases (such as aliens that both are sentient and are unethical sadists) are fairly rare. But humans do provide precedent for sadism and sentience! And it sure is hard to be confident, in either direction, from a sample size of 1.
Moreover, I suspect that it would be good (in expectation) for humans to encounter aliens someday, even though this means that we’ll control a smaller universe-shard.
I suspect this would be a genuinely better outcome than us being alone, and would make the future more awesome by human standards.
To explain my perspective on this, I’ll talk about a few different questions in turn:
How many technologically advanced alien species are sentient?
How likely is it that such aliens produce extremely-good or extremely-bad outcomes?
How will aliens feel about us?
How many aliens are more like paperclip maximizers? How many are more like cosmic brethren to humanity, with goals that are very alien but (in their alien way) things of wonder, complexity, and beauty?
How should we feel about our alien brethren?
What do I mean by “good”, “bad”, “cosmopolitan value”, etc.?
How many advanced alien species are sentient?
I expect more overlap between alien minds and human minds, than between AI minds (of the sort we’re likely to build first, using methods remotely resembling current ML) and human minds. But among aliens that were made by some process that’s broadly similar to how humans evolved, it’s pretty unclear to me what fraction we would count as “having somebody home” in the limit of a completed science of mind.
I have high enough uncertainty here that picking a median doesn’t feel very informative. I have maybe 1:2 or 1:3 odds on “lots of advanced alien races are sentient” : “few advanced alien races are sentient”, conditional on my current models not including huge mistakes. (And I’d guess there’s something like a 1⁄4 chance of my models containing huge mistakes here, in which case I’m not sure what my distribution looks like.)
“A nonsentient race that develops advanced science and technology” may sound like a contradiction in terms: How could a species be so smart and yet lack “somebody there to feel things” in the way humans seem to? How could something perform such impressive computations and yet “the lights not be on”?
I won’t try to give a full argument for this conclusion here, as this would require delving into my (incomplete but nontrivial) models of what’s going on in humans just before they insist that there’s something it’s like to be them.[8] (As well as my model of evolutionary processes and general intelligence, and how those connect to consciousness.) But I’ll say a few words to hopefully show why this claim isn’t a wild claim, even if you aren’t convinced of it.
My current best models suggest that the “somebody-is-home” property is a fairly contingent coincidence of our evolutionary history.
On my model, human-style consciousness is not a necessary feature of all optimization processes that can efficiently model the physical world and sort world-states by some criterion; nor is it a necessary feature of all optimization processes that can abstractly represent their own state within a given world-model.
To better intuit the idea of a very smart and yet unconscious process, it might help to consider a time machine that outputs a random sequence of actions, then resets time and outputs a new sequence of actions, unless a specified outcome occurs.
The time machine does no planning, no reflection, no learning, no thinking at all. It just detects whether an outcome occurs, and hits “refresh” on the universe if the outcome didn’t happen.
In spite of this lack of reasoning, this time machine is an incredibly powerful optimizer. It exhibits all the behavioral properties of a reasoner, including many of the standard difficulties of (outer) AI alignment.
If the machine resets any future that isn’t full of paperclips, then we should expect it to reset until machinery exists that’s busily constructing von Neumann probes for the sake of colonizing the universe and paperclipping it.
And we should expect the time machine and the infrastructure it builds to be well-defended, since “you can’t make the coffee if you’re dead”, and you can’t make paperclips without manufacturing equipment. The optimization process exhibits convergent instrumental behavior and behaves as though it’s “trying” to route around obstacles and adversaries, even though there’s no thinking-feeling mind guiding it.[9]
You can’t actually build a time machine like this, but the example helps illustrate the fact that in principle, powerful optimization — steering the future into very complicated and specific states of affairs, including states that require long sequences of events to all go a specific way — does not require consciousness.
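The time machine is essentially rejection sampling over whole histories: re-roll a random action sequence until the outcome predicate holds. A toy, hypothetical version in code (with a trivially small search space standing in for “the universe”):

```python
import random

# Toy "outcome pump": re-sample random action sequences until a target
# predicate holds on the final state. No planning, no learning, no world
# model -- yet behaviorally it steers toward the target.

def outcome_pump(initial_state, actions, steps, predicate, seed=0):
    """Keep re-rolling random `steps`-long action sequences until
    `predicate` holds on the final state; return the winning history."""
    rng = random.Random(seed)
    while True:
        state = initial_state
        history = []
        for _ in range(steps):
            act = rng.choice(actions)
            history.append(act.__name__)
            state = act(state)
        if predicate(state):
            return history, state

def inc(x): return x + 1
def dec(x): return x - 1
def dbl(x): return 2 * x

history, final = outcome_pump(0, [inc, dec, dbl], steps=6,
                              predicate=lambda s: s == 10)
# `history` now reads like a purposeful plan for reaching 10,
# even though nothing ever planned anything.
```

The returned action sequence looks “goal-directed” for exactly the reason the time machine’s history does: all the non-goal-achieving histories were discarded.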
We recognize that a textbook can store a lot of information and yet not “experience” that information. What’s less familiar is the idea that powerful optimization processes can optimize without “experience”, partly (I claim) because we live in a world where there are many simpler information-storing and computational systems, but where the only powerful optimization processes are humans.
Moreover, we don’t know on a formal level what general intelligence or sentience consists in, so we only have our evolved empathy to help us model and predict the (human) general intelligences in our environment. Our “subjective point of view”, from that empathic perspective, feels like something basic and intrinsic to every mental task we perform, rather than feeling like a complicated set of cogs and gears doing specific computational tasks.
So when something is not only “storing a lot of useful information” but “using that information to steer environments, like an agent”, it’s natural for us to use our native agent-modeling software (i.e., our human-brain-modeling software) to try to simulate its behavior. And then it just “feels as though” this human-like system must be self-aware, for the same reason it feels obvious that you’re conscious, that other humans are conscious, etc.
Moreover, it’s observably the case that consciousness-ascription is hyperactive. We readily see faces and minds in natural phenomena. We readily imagine simple stick-figures in comic strips experiencing rich mental lives.
A concern I have with the whole consciousness discussion in EA-adjacent circles is that people seem to consider their empathic response to be important evidence about the distribution of qualia in Nature, despite the obvious hyperactivity.
Another concern I have is that most people seem to neglect the difference between “exhibiting an external behavior in the same way that humans do, and for the same reasons we do”, and “having additional follow-on internal responses to that behavior”.
An example: If we suppose that it’s very morally important for people to internally subvocalize “I sneezed” after sneezing, and you do this whenever you sneeze, and all your (human) friends report that they do it too, it would nonetheless be a mistake to see a dog sneeze and say: “See! They did the morally relevant thing! It would be weird to suppose that they didn’t, when they’re sneezing for the same ancestral reasons as us!”
The ancestral reasons for the subvocalization are not the same as the ancestral reasons for the sneeze; and we already have an explanation for why animals sneeze, that doesn’t invoke any process that necessarily produces a follow-up subvocalization.
None of this rules out that dogs subvocalize in a dog-mental-language, on its own; but it does mean that drawing any strong inferences here requires us to have some model of why humans subvocalize.
We can debate what follow-on effects are morally relevant (if any), and debate what minds exhibit those effects. But it concerns me that “there are other parts downstream of the sneeze / flinch / etc. that are required for sentience, and not required for the sneeze” doesn’t seem to be in many people’s hypothesis space. Instead, they observe a behavioral analog, and move straight to a confident ascription “the internal processes accompanying this behavior must be pretty similar”.
In general, I want to emphasize that a blank map doesn’t correspond to a blank territory. If you currently don’t understand the machinery of consciousness, you should still expect that there are many, many details to learn, whether consciousness is prevalent among alien races or rare.
If a machine isn’t built to notice how complicated or contingent it is when it does a mental action we choose to call “introspection”, it doesn’t thereby follow that the machine is simple, or that it can only be built one way.
Our prior shouldn’t be that consciousness is simple, given the many ways it appears to interact with a wide variety of human mental faculties and behaviors (e.g., its causal effects on the words I’m currently writing); and absent a detailed model of consciousness, you shouldn’t treat your empathic modeling as a robust way of figuring out whether an alien has this particular machinery, since the background facts that make empathic inference pretty reliable in humans (overlap in brain architecture, genes, evolutionary history, etc.) don’t hold across the human-alien gap.
Again, I haven’t given my fragments-of-a-model of consciousness here (which would be required to argue for my probabilities). But I’ve hopefully said enough to move my view from “obviously crazy” to “OK, I see how additional arguments could potentially plug in here to yield non-extreme credences on the prevalence of sapient-but-nonsentient evolved optimizers”.
How likely are extremely good and extremely bad outcomes?
If we could list out the things that 90+% of spacefaring alien races have in common, there’s no guarantee that this list would be very long. I recommend stories like Three Worlds Collide and “Kindness to Kin” for their depiction of genuinely different alien minds, as opposed to the humans in funny suits common to almost all sci-fi.
That said, I do think there’s more overlap (in expectation) between minds produced by processes similar to biological evolution, than between evolved minds and (unaligned) ML-style minds. I expect more aliens to care about at least some things that we vaguely recognize, even if the correspondence is never exact.
On my models, it’s entirely possible that there just turns out to be ~no overlap between humans and aliens, because aliens turn out to be very alien. But “lots of overlap” is also very plausible. (Whereas I don’t think “lots of overlap” is plausible for humans and misaligned AGI.)
To the extent aliens and humans overlap in values, it’s unclear to me whether this is mostly working to our favor or detriment. It could be that a random alien world tends to be worse than a random AI-produced world, exactly because the alien shares more goal-content in common with us, and is therefore more likely to optimize or pessimize quantities that we care about.
If I had to guess, though, I would guess that this overlap makes the alien scenario better in expectation than the misaligned-AI scenario, rather than worse.
A special case of “values overlap increases variance” is that the worst outcomes non-human optimizers produce, as well as the best ones, are likely to come from conscious aliens. This is because:
Aliens that evolved consciousness are far more likely to end up with consciousness involved in their evolved goals.
Consciousness states have the potential to be far worse than unconscious ones, or far better.
Since I think it’s pretty plausible that most aliens are nonsentient, I expect most alien universe-shards to look “pretty good” or “meh” from a human perspective, rather than “amazing” or “terrible”.
Note that there’s an enormous gap between “pretty dystopian” and “pessimally dystopian”. Across all the scenarios (whether alien, human, or AI), I assign ~0% probability to Strong Dystopia, the sort of scenario you get if something is actively pessimizing the human utility function. “Aliens whose CEV is worse than nothing” is an immensely far cry from “the negative of our CEV”. But I’d guess that even Weak Dystopias are fairly rare, compared to “meh” or good outcomes of alien civilizations.
How will aliens feel about us?
Given that I think aliens plausibly tend to produce pretty cool universe-shards, a natural next question is: if we encounter a random alien race one day, will they tend to be glad that they found us? Or will they tend to be the sort of species that would have paid a significant number of galaxies to have paved over earth before we ascended, so that they could have had all our galaxies instead?
I think my point estimate there is “most aliens are not happy to see us”, but I’m highly uncertain. Among other things, this question turns on how often the mixture of “sociality (such that personal success relies on more than just the kin-group), stupidity (such that calculating the exact fitness-advantage of each interaction is infeasible), and speed (such that natural selection lacks the time to gnaw the large circle of concern back down)” occurs in intelligent races’ evolutionary histories.
These are the sorts of features of human evolutionary history that resulted in us caring (at least upon reflection) about a much more diverse range of minds than “my family”, “my coalitional allies”, or even “minds I could potentially trade with” or “minds that share roughly the same values and faculties as me”.
Humans today don’t treat a family member the same as a stranger, or a sufficiently-early-development human the same as a cephalopod; but our circle of concern is certainly vastly wider than it could have been, and it has widened further as we’ve grown in power and knowledge.
My tentative median guess is that there are a lot of aliens out there who would be grudging trade partners (who would kill us if we were weaker), and also a smaller fraction who are friendly.
I don’t expect significant violent conflict (or refusal-to-trade) between spacefaring aliens and humans-plus-aligned-AGI, regardless of their terminal values, since I expect both groups to be at the same technology level (“maximal”) when they meet. At that level, I don’t expect there to be a cheap way to destroy rival multi-galaxy civilizations, and I strongly expect civilizations to get more of what they want via negotiation and trade than via a protracted war.[10]
I also don’t think humans ought to treat aliens like enemies just because they have very weird goals. And, extrapolating from humanity’s widening circle of concern and increased soft-heartedness over the historical period — and observing that this trend is caused by humans recognizing and nurturing seeds of virtue that they had within themselves already — I don’t expect our descendants in the distant future to behave cruelly toward aliens, even if the aliens are too weak to fight back.[11]
I also feel this way even if the aliens don’t reciprocate!
Like, one thing that is totally allowed to happen is that we meet the ant-people, and the ant-people don’t care about us (and wouldn’t feel remorse about killing us, a la the buggers in Ender’s Game). So they trade with us because they’re not able to kill us, and the humans are like “isn’t it lovely that there’s diversity of values and species! we love our ant-friends” while the aliens are like “I would murder you and lay eggs in your corpse given the slightest opening, and am refraining only because you’re well-defended by force-backed treaty”, and the humans are like “oh haha you cheeky ants” and make webcomics and cartoons featuring cute anthropomorphized ant-people discovering the real meaning of love and friendship and living in peace and harmony with their non-ant-person brothers and sisters.
To which the ant-person response is, of course, “You appear to be imagining empathic levers in my mind that did not receive selection pressure in my EEA. How I long to murder you and lay eggs in your corpse!”
To which my counter-response is, of course: “Oh, you cheeky ants!”
(Respectfully. I don’t mean to belittle them, but I can’t help but be charmed to some degree.)
Like, reciprocity helps, but my empathy and goodwill for others is not contingent upon reciprocation. We can realize the gains from peace, trade, and other positive-sum interactions without being best buddies; and we can like the ants even if the ants don’t like us back.
Cosmopolitan values are good even if they aren’t reciprocated. This is one of the ways that you can tell that cosmopolitan values are part of us, rather than being universal: We’d still want to be fair and kind to the ant-folk, even if they were wanting to lay eggs in our corpse and were refraining only because of force-backed treaty.
This is part of my response to protests like “why are you looking at everything from the perspective of human values?” Regard for all sentients, including aliens, isn’t up for grabs, regardless of whether it’s found only in us, or also in them.
How likely (and how good) are various outcomes on the paperclipper-to-brethren continuum?
Short answer: I’m wildly uncertain about how likely various points on this continuum are, and (outside of the most extreme good and bad outcomes) I’m very uncertain about their utility as well.
I expect an alien’s core goals to reflect pretty different shatterings of evolution’s “fitness” goal, compared to core human goals, and compared to other alien races’ goals. (See also the examples in “Niceness is unnatural.”)
I expect most aliens either…
… look something like paperclip/squiggle maximizers from our perspective, converting galaxies into unconscious and uninteresting configurations;
… or look like very-alien brethren, who like totally different things from humans but in a way where we rightly celebrate the diversity;
… or fall somewhere ambiguously in between those two categories.
Figuring out the utility of different points on this continuum (from an optimally reasonable and cosmopolitan perspective) seems like a wide-open philosophy and (xeno)psychology question. Ditto for figuring out the probability of different classes of outcomes.
Concretely: I expect that there’s a big swath of aliens whose minds and preferences are about as weird and unrecognizable to us as the races in Three Worlds Collide — crystalline self-replicators, entities with no brain/genome segregation, etc. — and that turn out to fall somewhere between “explosive self-replicating process that paperclipped the universe and doesn’t have feelings/experiences/qualia” and “buddies”.
If we cross this question with “how likely are aliens to be conscious?”, we get a 2x2 grid of scenarios:

| | conscious | unconscious |
| --- | --- | --- |
| squiggle maximizer | A sentient alien that converts galaxies into something ~valueless. | A non-sentient alien that converts galaxies into something ~valueless. |
| alien brethren | A sentient alien that converts galaxies into something cool. | A non-sentient alien that converts galaxies into something cool. |
I think my point estimate is “a lot more aliens fall on the very-alien-brethren side than on the squiggle-maximizer side”. But I wouldn’t be surprised to learn I’m wrong about that.
My guess would be that the most common variety of alien is “unconscious brethren”, followed by “unconscious squiggle maximizer”, then “conscious brethren”, then “conscious squiggle maximizer”.
It might sound odd to call an unconscious entity “brother”, but it’s plausible to me that on reflection, humanity strongly prefers universes with evolved-creatures doing evolved-creature-stuff (relative to an empty universe), even if none of those creatures are conscious.
Indeed, I consider it plausible that “a universe full of humans trading with a weird extraterrestrial race of crystal formations that don’t have feelings” could turn out to be more awesome than the universe where we never run into any true aliens, even though this means that humans control a smaller universe-shard. It’s plausible to me that we’d turn out not to care all that much about our alien buddies having first-person “experiences”, if they still make fascinating conversation partners, have an amazing history and a wildly weird culture, have complex and interesting minds, etc. (The question of how much we care about whether aliens are in fact sentient, as opposed to merely sapient, seems open to me.)
And also, it’s not clear that “feelings” or “experiences” or “qualia” (or the nearest unconfused versions of those concepts) are pointing at the right line between moral patients and non-patients. These are nontrivial questions, and (needless to say) not the kinds of questions humans should rush to lock in an answer on today, when our understanding of morality and minds is still in its infancy.
How should we feel about encountering alien brethren?
Suppose that we judge that the ant-queen is more like a brother, not a squiggle maximizer. As I noted above, I think that encountering alien brethren would be a good thing, even though this means that the descendants of humanity will end up controlling a smaller universe-shard. (And I’d guess that many and perhaps most spacefaring aliens are probably brethren-ish, rather than paperclipper-ish.)
This is not to say that I think human-CEV and alien-CEV are equally good (as humans use the word “good”). It’s real hard to say what the ratios are between “human CEV”, “unboosted humans”, “random alien CEV (absent any humans)”, and “random misaligned AI”, but my vague intuition is that there’s a big factor drop at each of those steps; and I would guess that this still holds even if we filter out the alien paperclippers and alien unethical sadists.
But it is to say that I think we would be enriched by getting to meet minds that were not ourselves, and not of our own creation. Intuitively, that sounds like an awesome future. And I think this sense of visceral fascination and excitement, the “holy shit that’s cool!” reaction, tends to be an important (albeit fallible) indicator of “which outcomes will we end up favoring upon reflection?”.
It’s a clue to our values that we find this scenario so captivating in our fiction, and that our science fiction takes such a strong interest in the idea of understanding and empathizing with alien minds.
Much of the value of alien civilizations might well come from the interaction of their civilization and ours, and from the fairness (which may well turn out to be a major terminal human value) of them getting their just fraction of the universe.
And in most scenarios like “we meet alien space ants and become trading partners”, I’d guess that the space ants’ own universe-shard probably has more cosmopolitan value than a literally empty universe-shard of the same size. It’s cool, at least! Maybe the ant-queens are even able to experience it, and their experiences are cool; that would make me much more confident that indeed, their universe-shard is a lot better than an empty one. And maybe the ant-queens come pretty close to caring about their kids, in ways that faintly echo human values; who knows?
We should be friendly toward an alien race like that, I claim. But still, I’d expect the vast majority of the cosmopolitan value in a mixed world of humans+ants to come from the humans, and from the two groups’ interaction.
So, for example, my guess is that we shouldn’t be indifferent about whether a particular galaxy ends up in our universe-shard versus an alien neighbor’s shard. (Though this is another question where it seems good to investigate far more thoroughly before locking in a decision.)
And if our reachable universe-shard turns out to be 3x as large and resource-rich as theirs, we probably shouldn’t give them a third of our stars to make it fifty-fifty. I think that humanity values fairness a great deal, but not enough to outweigh the other cosmopolitan value that would be burnt (in the vast majority of cases) if we offered such a gift.[12]
Hold up, how is this “cosmopolitan”?
A reasonable objection to raise here is: “Hold on, how can it be ‘cosmopolitan’ to favor human values over the values of a random alien race? Isn’t the whole point of ‘cosmopolitan value’ that you’re not supposed to prioritize human-specific values over strange and beautiful alien perspectives?”
In short, my response is to emphasize that cosmopolitanism is a human value. If it’s also an alien value, then that’s excellent news; but it’s at least a value that is in us.
When we speak of “better” or “worse” outcomes, we (probably) mean “better/worse according to cosmopolitan values (that also give fair fractions to the human-originated styles of Fun in particular)”, at least if these intuitions about cosmopolitanism hold on reflection. (Which I strongly suspect they do.)
In more detail, my response is:
Cosmopolitanism is a contentful value that’s inside us, not a mostly-contentless function averaging the preferences of all nearby optimizers (or all logically possible optimizers).
The content of cosmopolitanism is complex and fragile, for the same reason unenlightened present-day human values are complex and fragile.
There isn’t anything wrong, or inconsistent, with cosmopolitanism being “in us”. And if there were some value according to which cosmopolitanism is wrong, then that value too would need to be in us, in order to move us.
1. Cosmopolitanism isn’t “indifference” or “take an average of all possible utility functions”.
E.g., a good cosmopolitan should be happier to hear that a weird, friendly, diverse, sentient alien race is going to turn a galaxy into an amazing megacivilization, than to hear that a paperclipper is going to turn a galaxy into paperclips. Cosmopolitanism (of the sort that we should actually endorse) shouldn’t be totally indifferent to what actually happens with the universe.
It’s allowed to turn out that we find a whole swath of universe that is the moral equivalent of “destroyed by the Blight”, which kinda looks vaguely like life if you squint, but clearly isn’t sentient, and we’re like “well let’s preserve some Blight in museums, but also do a cleanup operation”. That’s just also a way that interaction with aliens can go; the space of possible minds (and things left in that mind’s wake) is vast.
And if we do find the Blight, we shouldn’t lie to ourselves that blighted configurations of matter are just as good as any other possible configuration of matter.
It’s allowed to turn out that we find a race of ant-people (who want to kill us and lay eggs in our corpse, yadda yadda), and that the ant-people are getting ready to annihilate the small Fuzzies that haven’t yet reached technological maturity, on a planet that’s inside the ant-people’s universe-shard.
Where, obviously, you trade rather than war for the rights of the Fuzzies, since war is transparently an inefficient way to resolve conflicts.
But the one thing you don’t do is throw away some of your compassion for the Fuzzies in order to “compromise” with the ant-people’s lack-of-compassion.
The right way to do cosmopolitanism is to care about the Fuzzies’ welfare along with the ant-people’s welfare — regardless of whether the Fuzzies or ant-people reciprocate, and regardless of how they feel about each other — and to step up to protect victims from their aggressors.
There’s a point here that the cosmopolitan value is in us, even though it’s (in some sense) not just about us.
These values are not necessarily in others, no matter how much we insist that our values aren’t human-centric, aren’t speciesist, etc. And because they’re in us, we’re willing to uphold them even when we aren’t reciprocated or thanked.
It’s those values that I have in mind when I say that outcomes are “better” or “worse”. Indeed, I don’t know what other standard I could appeal to, if not values that bear some connection to the contents of our own brains.
But, again, the fact that the values are in us, doesn’t mean that they’re speciesist. A human can genuinely prefer non-speciesism, for the same reason a citizen of a nation can genuinely prefer non-nationalism. Looking at the universe through a lens that is in humans does not mean looking at the universe while caring only about humans. The point is that we’ll keep on caring about others, even if we turn out to be alone in that.
2. Cosmopolitan value is fragile, for the same reason unenlightened present-day human values are fragile.
See “Complex Value Systems Are Required to Realize Valuable Futures” and the Arbital article on cosmopolitan value.
There are many ways to lose an enormous portion of the future’s cosmopolitan value, because the simple-sounding phrase “cosmopolitan value” translates into a very complex logical object (making many separate demands of the future) once we start trying to pin it down with any formal precision.
Our prior shouldn’t be that a random intelligent species would happen to have a utility function pointing at exactly the right high-complexity object. So it should be no surprise if a large portion of the future’s value is lost in switching between different alien species’ CEVs, e.g., because half of the powerful aliens are the Blight and another half are the ant-queens, and both of them are steamrolling the Fuzzies before the Fuzzies can come into their own. (That’s a way the universe could be, for all that we protest that cosmopolitanism is not human-centric.)
And even if the aliens turn out to have some respect for something roughly like cosmopolitan values, that doesn’t mean that they’ll get as close as they could if they had human buddies (who have another five hundred million years of moral progress under our belts) in the mix.
3. There is no radically objective View-From-Nowhere utility function, no value system written in the stars.
(… And if there were, the mere fact that it exists in the heavens would not be a reason for human CEV to favor it. Unless there’s some weird component of human CEV that says something like “if you encounter a pile of sand on a planet somewhere that happens to spell out a utility function in morse code, you terminally value switching to some compromise between your current utility function and that utility function”. … Which does not seem likely.)
If our values are written anywhere, they’re written in our brain states (or in some function of our brain states).
And this holds for relatively enlightened, cosmopolitan, compassionate, just, egalitarian, etc. values in exactly the same way that it holds for flawed present-day human values.
In the long run, we should surely improve on our brains dramatically, or even replace ourselves with an entirely new sort of mind (or a wondrously strange intergalactic patchwork of different sorts of minds).
But we shouldn’t be indifferent about which sorts of minds we become or create. And the answer to “which sorts of minds/values should we bring into being?” is some (complicated, not-at-all-trivial-to-identify) function of our current brain. (What else could it be?)
Or, to put it another way: the very idea that our present-day human values are “flawed” has to mean that they’re flawed relative to some value function that’s somehow pointed at by the human brain.
There’s nothing wrong (or even particularly strange) about a situation like “Humans have deeper, stronger (‘cosmopolitan’) values that override other human values like ‘xenophobia’”.
Mostly, we’re just not used to thinking in those terms because we’re used to navigating human social environments, where an enormous number of implicit shared values and meta-values can be taken for granted to some degree. It takes some additional care and precision to bring genuinely alien values into the conversation, and to notice when we’re projecting our own values. (Onto other species, or onto the Universe.)
If a value (or meta-value or meta-meta-value or whatever) can move us to action, then it must be in some sense a human value. We can hope to encounter aliens who share our values to some degree; but this doesn’t imply that we ought (in the name of cosmopolitanism, or any other value) to be indifferent to what values any alien brethren possess. We should probably assist the Fuzzies in staving off the Blight, on cosmopolitan grounds. And given value fragility (and the size of the cosmic endowment), we should expect the cosmopolitan-utility difference between totally independent evolved value systems to be enormous.
This, again, is no reason to be any less compassionate, fair-minded, or tolerant. But also, compassion and fair-mindedness and tolerance don’t imply indifference over utility functions either!
3. The superintelligent AI we’re likely to build by default << Aliens
In the case of aliens, we might imagine encountering them hundreds of millions or billions of years in the future — plenty of time to anticipate and plan for a potential encounter.
In the case of AI, the issue is much more pressing. We have the potential to build superintelligent AI systems very soon; and I expect far worse outcomes from misaligned AI optimizing a universe-shard than from a random alien doing the same (even though there’s obviously nothing inherently worse about silicon minds than about biological minds, alien crystalline minds, etc.).
For examples of why the first AGIs are likely to immediately blow human intelligence out of the water, see AlphaGo Zero and the Foom Debate and Sources of advantage for digital intelligence. For a discussion of why alignment seems hard, and why such systems are likely to kill us if we fail to align them, see So Far and AGI Ruin.
The basic reason why I expect AI systems to produce worse outcomes than aliens is that other evolved creatures are more likely to have overlap with us, by dint of their values being forged by more similar processes. And some of the particular ways in which misaligned AI is likely to differ from an evolved species suggest a much more homogeneous and simple future. (Like “a universe tiled with molecular squiggles”.)[13]
The classic example of AGI ruin is the “paperclip maximizer” (which should probably be called a “molecular squiggle maximizer” instead):
So what actually happens as near as I can figure (predicting future = hard) is that somebody is trying to teach their research AI to, god knows what, maybe just obey human orders in a safe way, and it seems to be doing that, and a mix of things goes wrong like:
The preferences not being really readable because it’s a system of neural nets acting on a world-representation built up by other neural nets, parts of the system are self-modifying and the self-modifiers are being trained by gradient descent in Tensorflow, there’s a bunch of people in the company trying to work on a safer version but it’s way less powerful than the one that does unrestricted self-modification, they’re really excited when the system seems to be substantially improving multiple components, there’s a social and cognitive conflict I find hard to empathize with because I personally would be running screaming in the other direction two years earlier, there’s a lot of false alarms and suggested or attempted misbehavior that the creators all patch successfully, some instrumental strategies pass this filter because they arose in places that were harder to see and less transparent, the system at some point seems to finally “get it” and lock in to good behavior which is the point at which it has a good enough human model to predict what gets the supervised rewards and what the humans don’t want to hear, they scale the system further, it goes past the point of real strategic understanding and having a little agent inside plotting, the programmers shut down six visibly formulated goals to develop cognitive steganography and the seventh one slips through, somebody says “slow down” and somebody else observes that China and Russia both managed to steal a copy of the code from six months ago and while China might proceed cautiously Russia probably won’t, the agent starts to conceal some capability gains, it builds an environmental subagent, the environmental agent begins self-improving more freely, undefined things happen as a sensory-supervision ML-based architecture shakes out into the convergent shape of expected utility with a utility function over the environmental model, the main result is driven by whatever the self-modifying decision systems happen to see as locally optimal in their supervised system locally acting on a different domain than the domain of data on which it was trained, the light cone is transformed to the optimum of a utility function that grew out of the stable version of a criterion that originally happened to be about a reward signal counter on a GPU or God knows what.
Perhaps the optimal configuration for utility per unit of matter, under this utility function, happens to be a tiny molecular structure shaped roughly like a paperclip.
That is what a paperclip maximizer is. It does not come from a paperclip factory AI. That would be a silly idea and is a distortion of the original example.
This example is obviously comically conjunctive; the point is in no way “we have a crystal ball, and can predict that things will go down in this ridiculously-specific way”. Rather, the point is to highlight ways in which the development process of misaligned superintelligent AI is very unlike the typical process by which biological organisms evolve.
Some relatively important differences between intelligences built by evolution-ish processes and ones built by stochastic-gradient-descent-ish processes:
Evolved aliens are more likely to have a genome/connectome split, and a bottleneck on the genome.
Aliens are more likely to have gone through societal bottlenecks.
Aliens are much more likely to be the result of optimizing directly for intergenerational prevalence. The shatterings of a target like “intergenerational prevalence” are more likely to contain overlap with the good stuff, compared to the shatterings of training for whatever-training-makes-the-AGI-smart-ASAP. (Which is the sort of developer goal that’s likely to win the AGI development race and kill humanity first.)
Evolution tends to build patterns that hang around and proliferate, whereas AGIs are likely to come from an optimization target that’s more directly like “be good at these games that we chose with the hope that being good at them requires intelligence”, and the shatterings of the latter are less likely to overlap with our values.[14]
To be clear, “I trained my AGI in a big pen of other AGIs and rewarded it for proliferating” still results in AGIs that kill you. Most ways of trying won’t replicate the relevant properties of evolution. And many aliens would murder Earth in its cradle if they could too. And even if your goal were just “get killed by an AGI that produces a future as good as the average alien’s CEV”, I would expect the “reward AGI for proliferating” approach to result in almost-zero progress toward that goal, because there’s a huge architectural gap between AI and biology, and (in expectation) another huge gap in the various ways that you built the pen wrong.[15]
You’ve really got to have a lot of things line up favorably in order to get niceness into your AGI system; and evolution’s much more likely to spit that out than AGI training, and so some aliens are nice (even though we didn’t build them), to a far greater degree than some AGIs are nice (if we don’t figure out alignment).
I would also predict that aliens have a much higher rate of somebody-is-home (sentience, consciousness, etc.), because of the contingencies of evolutionary history that I think resulted in human consciousness. I have wide error bars on how common these contingencies are across evolved species, but a much lower probability that the contingencies also arise when you’re trying to make the thing smart rather than good-at-proliferating.
The mechanisms behind qualia seem to me to involve at least one epistemically-derpy shortcut — the sort of thing that’s plausibly rare among aliens, and very likely rare among misaligned AI systems.
If we get lucky on consciousness being a super common hiccup, I could see more worlds where misaligned AI produces good outcomes. My current probability is something like 90% that if you produced hundreds of random uncorrelated superintelligent AI systems, <1% of them would be conscious.[16]
The most important takeaway from this post, I’d claim, is: If humanity creates superintelligences without understanding much about how our creations reason, then our creations will kill literally everyone and do something boring with the universe instead.
I’m not saying “it will take joy in things that I don’t recognize; but I want the future to have my values rather than the values of my child, like many a jealous parent before me.” I’m saying that, by default, you get a wasteland of molecular squiggles.
We basically have to go for superintelligence at some point, given the overwhelming amount of value that we can expect to lose if we rely on crappy human brains to optimize the future. But we also have to achieve this transition to AGI in the right way, on pain of wiping out ~everything.
Right now it looks to me like the world is rushing headlong down the “wipe out ~everything” branch, for lack of having even put a nontrivial amount of serious thought into the question of how to shape good outcomes via highly capable AI.
And so I try to redirect that path, or protest against the most misdirected attempts to address the problem.
I note that we have no plan, we have no science of differentially selecting AGI systems that produce good outcomes, and a reasonable planet would not race off a cliff before thinking about the implications.
And when I do that, I worry that it’s easy to misread me as being anti-superintelligence, and anti-singularity. So I’ve written this post in part for the benefit of the rare reader who doesn’t already know this: I’m pro-singularity.
I consider myself a transhumanist. I think the highest calling of humanity today is to bring about a glorious future for a wondrously strange universe of posthuman minds.
And I’d really appreciate it if we didn’t kill literally everyone and turn the universe into an empty wasteland before then.
[1]
And my concept of “what makes life worth living” is very likely an impoverished one today, and a friendly superintelligence could guide us to discovering even cooler versions of things like “art” and “adventure”, transcending the visions of fun that humanity has considered to date. The limit of how good the universe could become, once humanity has matured and grown into its full potential, likely far surpasses what any human today can concretely imagine.
[2]
I’ll flag that I do think that some people overestimate how “unimaginable” the future is likely to be, out of some sense of humility/modesty.
I think there’s a decent chance that if you showed me the future I’d be like “ah, so that’s what computronium looks like” or “so reversible computers wrapped around black holes did turn out to be best”, and that when you show me the experiences running on those computers, I’m like “neato, yeah, lots of minds having fun, I’m sure some of that stuff would look pretty fun to me if you decoded it”. I wouldn’t expect to immediately understand everything going on, but I wouldn’t be surprised if I can piece together the broad strokes.
In that sense, I find it plausible that ~optimal futures will turn out to be familiar/recognizable/imaginable to a digital-era transhumanist in a way they wouldn’t be to an ancient Roman. We really are better able to see the whole universe and its trajectory than they were.
To be clear, it’s very plausible to me that it’ll somehow be unrecognizable or shocking to me, as it would have been to an ancient Roman, at least on some axes. But it’s not guaranteed, and we don’t have to pretend that it’s guaranteed in order to avoid insinuating that we’re in a better epistemic position than people were in the past. We are in a better epistemic position than people were in the past!
There’s a separate point about how much translation work you need to do before I recognize a particular arc of fun unfolding before me as something actually fun. On that point I’m like, “Yeah, I’m not going to recognize/understand my niece’s generation’s memes, never mind a posthuman’s varieties of happiness, without a lot more context (and plausibly a much bigger and deeply-changed mind)”.
Separately, I don’t want to make any claims about how hard and fast humanity becomes “strongly transhuman” / changes to using minds that would be unrecognizable (as humans) to the present. I’d be surprised if it were super-fast for everyone, and I’d be surprised if some humans’ minds weren’t very different a thousand sidereal years post-singularity. But I have wide error bars.
3. ^
Provided that this turns out to be a good use of stellar resources. (I’m not confident one way or the other. E.g., I’m not confident that human-originated minds get relevantly more interesting/fun at Matrioshka-brain scales. Maybe we’ll learn that slapping on more matter at that scale lets you prove some more theorems or whatever, but isn’t the best way to convert negentropy into fun, compared to e.g. spending that compute on whole civilizations full of interacting and flourishing people who don’t have star-sized brains.)
4. ^
A separate reason it’s a terrible idea to destroy ourselves is that, e.g., if the nearest aliens are 500 million years away then our death means that a ~500 million lightyear radius sphere of stellar fuel is going to be entirely wasted, instead of spent on rad stuff.
5. ^
As I’ll note later, this odds ratio is a result of giving 0.2x weight to “humans control the universe-shard”, 0.5x to “aliens control it”, and 0.3x to “unfriendly AI built by aliens controls it”. Rob rounded the resulting odds ratio in this table to 1 : 5 : 7 : 5 : 14 : 1 : ~0.
Also, as a general reminder: I’m giving my relatively off-the-cuff thoughts in this post, recognizing that I’ll probably recognize some of my numbers as inconsistent — or otherwise mistaken — if I reflect more. But absent more reflection, I don’t know which direction the inconsistencies would shake out.
6. ^
I’d have some inclination to go lower, but for the one evolved species we’ve seen seeming dead-set on destroying itself.
7. ^
Though another input to the value of the future, in this scenario, is “What happens to the places that the pilgrims had to leave behind until some pilgrim group hit upon a non-terrible organizational system?” Hopefully it’s not too terrible, but it’s hard to say with humans!
One note of optimism is that there’s likely to be a strong negative correlation (in this ~impossible hypothetical) between “how terrible is the civilization?” and “how interested is it in spreading to the stars, or spreading far?” Many ways of shutting down moral progress, robust civic debate, open exploration of ideas, etc. also cripple scientific and technological progress in various ways, or involve commitment to a backwards-looking ideology. It’s possible for the universe-shard to be colonized by Space Amish, but it’s a weirder hypothetical.
8. ^
Note that I’ll use phrasings like “there’s something it’s like to be them”, “they’re sentient”, and “they’re conscious” interchangeably in this post. (This is not intended to be a bold philosophical stance, but rather a flailing attempt to wave at properties of personhood that seem plausibly morally relevant.)
9. ^
Eliezer uses the term “outcome pump” to introduce a similar idea:
The Outcome Pump is not sentient. It contains a tiny time machine, which resets time unless a specified outcome occurs. For example, if you hooked up the Outcome Pump’s sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads. (The physicists say that any future in which a “reset” occurs is inconsistent, and therefore never happens in the first place—so you aren’t actually killing any versions of yourself.)
Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that’s too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs.
I think his example is underspecified, though. Suppose that you ask the outcome pump for paperclips, and physics says “sorry, this outcome is too improbable” and exhibits a mechanical failure. This would then mean that it’s true that the outcome pump outputting paperclips is “improbable”, which makes the hypothetical consistent. We need some way to resolve which internally-consistent set of physical laws compatible with this description (“make paperclips” or “don’t make paperclips”) actually occurs; the so-called “outcome pump” is not necessarily pumping the desired outcome.
Giving the time machine the ability to output a random sequence of actions addresses this problem: we can say that the machine only undergoes a mechanical failure if some large number (e.g., Graham’s number) of random action sequences all fail to produce the target outcome. We can then be confident that the outcome pump will eventually brute-force a solution, provided that one is physically possible.
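The patched pump described above is just rejection sampling with a bounded number of "timelines". A minimal sketch (the function names and the coin-flip example are my own illustration, not from the post; a real pump would of course loop an astronomical number of times, not 1,000):

```python
import random

def outcome_pump(target_predicate, random_actions, max_resets):
    """Toy brute-force Outcome Pump: each loop iteration is one 'timeline'
    trying a random action sequence; the time machine resets unless the
    specified outcome occurs."""
    for _ in range(max_resets):
        actions = random_actions()
        if target_predicate(actions):
            return actions  # the one consistent (non-reset) timeline
    return None  # outcome too improbable: spontaneous mechanical failure

# Example: "pump" a fair coin into coming up heads three times running.
random.seed(0)
flips = outcome_pump(
    target_predicate=lambda seq: seq == ["H", "H", "H"],
    random_actions=lambda: [random.choice("HT") for _ in range(3)],
    max_resets=1000,
)
```

With 1,000 resets and a 1-in-8 target, the pump succeeds with near-certainty; an outcome whose probability per timeline is far below 1/max_resets instead yields the "mechanical failure" branch.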
Other examples of easily-understood non-conscious optimization processes that can achieve very impressive things include AIXI and natural selection. The AIXI example is made pedagogically complicated for present purposes, however, by the fact that AIXI’s hypothesis space contains many smaller conscious optimizers (that don’t much matter to the point, but that might confuse those who can see that some hypotheses contain conscious reasoners and can’t see their irrelevance to the point at hand); and the natural selection example is weakened by the fact that selection isn’t a very powerful optimizer.
10. ^
A possible objection here is “Human emotional responses often cause us to get into violent conflicts in cases where this foreseeably isn’t worth it; why couldn’t aliens be the same?”. But “technology for widening the space of profitable trades” is in the end just another technology, and ambitious spacefaring species are likely to discover such tech for the same reason they’re likely to discover other tech that’s generally useful for getting more of what you want. Humans have certainly gotten better at this over time, and if we continue to advance our scientific understanding, we’re likely to get far better still.
11. ^
Like, we’ve seen that the seeds are there, and it would be pretty weird for us to go around uprooting seeds of value on a whim.
As a side-note: one of my hot takes about how morality shakes out is “we don’t sacrifice anything (among the seeds of value)”. Like, values like sadism and spite might be tricky to redeem, but if we do our job right I think we should end up finding a way to redeem them.
12. ^
Unless we’ve made some bargain across counterfactual worlds that justifies our offering this gift in our world. But there are friction costs to bargains, and my guess is that the way it pans out is that you keep what you can get in your branch and it evens out across branches.
As a side-note, another possible implication of my view on “alien brethren” is: in the much less likely event that we meet weak young non-spacefaring aliens, the future might go drastically better if we help guide their development as a species, teaching them about the Magic of Friendship and all that.
(Or perhaps not. I remain very uncertain about whether it’s positive-human-EV to guide alien development.)
13. ^
Though some aliens may shake out to be simple too! Humans are pretty far from “tile the universe with vats of genes”, but it’s not clear how contingent that fact is.
14. ^
Though it should be emphasized that we’re totally allowed to find that evolved life tends to go some completely different way than how humans shook out. Generalizing from one example is hard!!
15. ^
And even if you succeeded, it’s not clear that you’d get any utility as a result; my guess that evolved aliens tend to be better than paperclippers can just be wrong, easily.
And even if you got some utility, it’s going to be a paltry amount compared to if you’d built aligned AGI.
16. ^
Possibly this is too extreme; I haven’t refined these probabilities much, and am still just giving my off-the-cuff numbers.
In any case, I want to emphasize that my view isn’t “most misaligned AGIs aren’t sentient, but if you randomly spin up a large number of them you’ll occasionally get a sentient one”. Rather, my view is “almost no random misaligned AGIs are sentient” (but with some uncertainty about whether that’s true). I’m much more uncertain about whether this background view is true than I am uncertain about whether, given this background view, a given misaligned AGI will happen to be sentient.
(Like how I think the chance that the lightspeed limit turns out to be violable is greater than 1 in a billion; but that doesn’t mean that if you threw a billion baseballs, I would expect one of them to break the lightspeed limit on average.)
Somehow, thinking of ourselves from the perspective of an unconscious alien really drives home how extremely weird and meaningless-from-other-perspectives alien values are.
Like, we care about this very specific configuration of reflection, and if an entity doesn’t have that very specific configuration, that changes its moral value from “a person, of nigh-inconceivable moral worth” to “an object or a houseplant; meh, whatever.” But from the unconscious-alien perspective, this distinction is inane. We look crazy and spastic, in that such small differences in algorithm make such an enormous difference to our sensibilities.
I think that this is good news about trade with other maximally advanced civilizations! Values in the universe might be super orthogonal, and so the gains from trade might be huge! We agree to adjust our computing substrate to have just the right shape of molecular squiggle, and they agree to run their optimization algorithms with just the right flavor of reflection, such that instead of their shard being dead and valueless, it is filled with joyful conscious life. The universe becomes many times more valuable by our lights, and all we had to do was incorporate a design choice that is so insignificant by our lights that we wouldn’t have even bothered to pay any attention to it, otherwise.
Oooh, sounds right to me!
Predictions, using the definitions in Nate’s post:
Isn’t “misaligned AI” by definition a bad thing and “ASI-boosted humans” by definition a good thing? You’re basically asking “How likely is <good outcome> given that we have <a machine that creates good outcomes>”
The definitions given in the post are:
I’d expect most people to agree that “We solve all of the problems involved in aiming artificial superintelligence at the things we’d ideally want” yields outcomes that are about as good as possible, and I’d expect most of the disagreement to turn (either overtly or in some subtle way) on differences in how we’re defining relevant words (like “ideally”, “good”, and “problems”).
I’d be fine with skipping over this question, except that some of the differences-in-definition might be important for the other questions, so this question may be useful for establishing a baseline.
With “misaligned AI”, there are some definitional issues but I expect most of the disagreement to be substantive, since there are a lot of different levels of Badness you could expect even if you want to call all misaligned AI “bad” (at least relative to ASI-boosted humans).
In my own answers, I interpreted “misaligned AGI” as meaning: We weren’t good enough at alignment to make the AGI do exactly what we wanted, so it permanently took control of the future and did “something that isn’t exactly what we wanted” instead. (Which might be kinda similar to what we wanted, or might be wildly different, etc.)
If an alien only cared about maximizing the amount of computronium in the universe, and it built an AI that fills the universe with computronium because the AI values calculating pi, then I think I’d say that the AI is “aligned with that alien by default / by accident”, rather than saying “the AI is misaligned with that alien but is doing ~exactly what we want anyway”. So if someone thinks AI does exactly what humans want even with humans putting in zero effort to steer the AI toward that outcome, I’d classify that as “aligned-by-default AI”, rather than “misaligned AI”. (But there’s still a huge range of possible-in-principle outcomes from misaligned AI, even if I think some a lot more likely than others.)
All the “ASI-boosted humans” ones feel a bit tricky for me to answer, because it seems possible that we get strong aligned AI, in a distributed takeoff, but that we deploy it unwisely. Namely, that the world immediately collapses into Moloch, whereby everyone follows their myopic incentives off a cliff.
That cuts my odds of good outcomes by a factor of two or so.
I don’t think my responses to this are correct unless normalized to sum to 1. This might be better on Manifold.
I don’t get this. If encountering aliens is so great, we could make it happen, even in an empty universe, by simulating evolution (and the development of civilization up to super-intelligence) and then being friends and partners with those alien civilizations, if we want to. [Note that this is in contrast to creating a species intentionally, according to our own spec, which maybe (or maybe not!) leaves out something cool about meeting naturally-evolved aliens.]
Maybe we give those aliens a sizeable fraction of the cosmic endowment to do what they think is good with. (By my lights, I think we do owe them something for creating them, even if we don’t like their values very much.)
This seems like a strictly better scenario than encountering aliens?
We get the benefits of exploring a truly alien culture,
but without having to give up a big share of the cosmic endowment (~ half, or much more than half if there are a lot of aliens),
and we can simulate many aliens, and select the ones we like best, instead of going with the luck of the draw.
We have strictly more options in this situation than in the situation where it turned out to be an empty universe. We would prefer to have aliens + a big universe shard, instead of aliens + a smaller universe shard, right?
Encountering evolved aliens, instead of making our own, means that we’re hanging out with and trading with whoever evolution happened to spit out, rather than the best possible aliens, given our sense of life.
For a non-ascended species, it might very well be that being forced into a situation that you wouldn’t have chosen is actually a secret boon, because what you want is not the same as what is best for you. But if we’re positing that being forced by the situation into doing something that you wouldn’t otherwise have chosen is actually better...that seems to suggest that civilization has failed in a very deep way and we’re not actually optimizing Fun very well at all.
Concretely, wouldn’t it be better if, instead of the ant-people who would murder us all if they could, we shared the universe with something equally alien but much more the sort of thing that we like?
Or to say it better: If we control the whole reachable universe, we can decide the relative proportions of human-descended minds, human-descended-designed minds, and luck-of-the-draw evolved alien minds. And we can balance that proportion to be optimal for cosmopolitan value and Fun.
Isn’t that obviously better than having most of that division determined by luck, and forced upon us, regardless of what is optimal?
Which is better, a tiger or a designer housecat?
Shouldn’t the question be “which is better, a tiger or a designer tiger?”?
Do we care that the tiger is a violent dangerous predator? Is that part of what it means to be a tiger? If we remove the predator from the tiger, is he still a tiger?
Yeah, but that’s a crux. Tigers might be awesome, but they’re not optimal.
Sounds right to me! I dunno Nate’s view on this.
Really great overview. I’ll probably draw points from this in an explainer I’m going to write for a new audience.
This post is the single most persuasive piece of writing that I have encountered with regard to talking me out of my veganism.
Particularly the idea that humans’ having conscious experience is a contingent fact of human evolution, such that other, related, intelligent species in nearby counterfactuals don’t have anything that it is like to be them.
Considering that possibility, which seems hard to evaluate given that I only have one datapoint (one obviously influenced by anthropic considerations), makes it seem much more plausible that there’s nothing that it is like to be a cow, and gives me a sense of a planet earth that is much more dead and empty than the mental world I had been inhabiting 10 minutes ago.
If you cared to write up more of your understanding of “somebody being home”, I would read it with avid interest. It seems more likely than anything else that I can think of (aside from perhaps a similar post by Eliezer?) to change my mind with regards to veganism and how I should weigh the values of animals in factory farms in my philanthropic budget.
That said, I, at least, am not making this error, I think:
Seeing a pig “scream in pain”, when you cut off its tail does not make it a foregone conclusion that the pig is experiencing anything at all or something like what pain means to me. But it does seem like a pretty good guess.
And I definitely don’t look at a turtle doing any kind of planning at all and think “there must be an inner life in there!”
I’m real uncertain about what consciousness is and where it comes from, and there is an anthropic argument (which I don’t know how to think clearly about) that it is rare among animals. But from my state of knowledge, it seems like a better than even chance that many mammals have some kind of inner listener. And if they have an inner listener at all, pain seems like one of the simplest and most convergent experiences to have.
Which makes industrial factory farming an unconscionable atrocity, much worse than American slavery. It is not okay to treat conscious beings like that, no matter how dumb they are, or how little they narrativize about themselves.
My understanding is that (assuming animal consciousness), there are 100 billion experience-years in factory farms every year.
It seems to me that, in my state of uncertainty, it is extremely irresponsible to say “eh, whatever” to the possible moral atrocity. We should shut up and multiply. My uncertainty about animal consciousness only reduces the expected number of experience-years of torture by a factor of 2 or so.
An expected 50 billion moral patients getting tortured as a matter of course is the worst moral catastrophe perpetrated by humans ever (with the exception of our rush to destroy all the value of the future).
Even if someone has more philosophical clarity than I do, they have to be confident at a level of around 100,000 to 1 that livestock animals are not experiencing beings, before the expected value of this moral catastrophe starts being of comparable scale to well-known moral catastrophes like the Holocaust, and American slavery, and the Mongol invasion of the world. Anything less than that, and the expected value of industrial meat production is beating every other moral catastrophe by orders of magnitude (again, with the exception of x-risk).
(Admittedly there are some assumptions here about the moral value of pain and fear, relative to other good and bad things that can happen to a person, which might influence how we weight the experiences of animals compared to people. But “pain and terror are really bad, and it is really bad for someone to persistently experience them” seems like a not-very-crazy assumption.)
Anyway, this is a digression from the point of this post, but I apparently had a rant in me, and I don’t want animal welfare considerations to be weak-maned. A concern for animal welfare isn’t fundamentally based on shoddy philosophy. It seems to me that it is a very natural starting point, given our state of philosophical confusion.
EDIT: Added in the correct links.
Assuming Yudkowsky’s position is quite similar to Nate’s, which it sounds like given what both have written, I’d recommend reading this debate Yud was in to get a better understanding of this model[1]. Follow up on the posts Yud, Luke, and Rob mention if you’d care to know more. Personally, I’m closer to Luke’s position on the topic. He gives a clear and thorough exposition here.
Also, I anticipate that if Nate does have a fully fleshed-out model, he’d be reluctant to share it. I think Yud said he didn’t wish to give too many specifics, as he was worried trolls might implement a maximally suffering entity. And, you know, 4chan exists. Plenty of people there would be inclined to do such a thing to signal disbelief or simply to upset others.
I think this kind of model would fall under the illusionism school of thought. “Consciousness is an illusion” is the motto. I parse it as “the concept you have of consciousness is an illusion, a persistent part of your map that doesn’t match the territory. Just as you may be convinced these tables are of different shapes, even after rotating and matching them onto one another, so too may you be convinced that you have this property known as consciousness.” That doesn’t mean the territory has nothing like consciousness in it, just that it doesn’t have the exact form you believed it to. You can understand on a deliberative level how the shapes are the same and the process that generates the illusion whilst still experiencing the illusion. EDIT: The same for your intuition that “consciousness has to be more than an algorithm” or “more than matter” or so on.
Luke M and I are illusionists, but I don’t think Eliezer or Nate are illusionists.
Huh. I’m a bit surprised. I guess I assumed as much because a lot of the stuff I’ve read by Eliezer seems heavily influenced by Dennett. And he’s also a physicalist. His approach also seems to be “explain our claims about consciousness”. Plus there’s all the stuff about self-reflection, how an algorithm feels from the inside, etc. I guess I was just bucketing that stuff together with (weak) illusionism. After writing that out, I can see how those points don’t imply illusionism. Does Eliezer think we can save the phenomena of consciousness, and hence that calling it an illusion is a mistake? Or is there something else going on there?
I think Dennett’s argumentation about the hard problem of consciousness has usually been terrible, and I don’t see him as an important forerunner of illusionism, though he’s an example of someone who soldiered on for anti-realism about phenomenal consciousness for long stretches of time where the arguments were lacking.
I think I remember Eliezer saying somewhere that he also wasn’t impressed with Dennett’s takes on the hard problem, but I forget where?
There’s some similarity between heterophenomenology and the way Eliezer/Nate talk about consciousness, though I guess I think of Eliezer/Nate’s “let’s find a theory that makes sense of our claims about consciousness” as more “here’s a necessary feature of any account of consciousness, and a plausibly fruitful way to get insight into a lot of what’s going on”, not as an argument for otherwise ignoring all introspective data. Heterophenomenology IMO was always a somewhat silly and confused idea, because it’s proposing that we a priori reject introspective evidence but it’s not giving a clear argument for why.
(Or, worse, it’s arguing something orthogonal to whether we should care about introspective evidence, while winking and nudging that there’s something vaguely unrespectable about the introspective-evidence question.)
There are good arguments for being skeptical of introspection here, but “that doesn’t sound like it’s in the literary genre of science” should not be an argument that Bayesians find very compelling.
Yeah. I’d already read the Yudkowsky piece. I hadn’t read the Muehlhauser one though!
Does that follow? The time machine doesn’t do any planning. So I would expect that in one timeline, something happens that accidentally drops an anvil on the time machine, breaking the reset mechanism, and there’s no more time loops after that.
Indeed, in practice, I expect this time machine to optimize to destroy itself, not to fill the universe with paperclips.
The “anvil dropped on the time machine” scenario seems like a much more probable outcome that technically satisfies the optimization criterion, which was not “the universe is filled with paperclips” but “the time machine stops running, either because the paperclip classifier evaluates this timeline to have maxed out the paperclips or for any other reason.” (In exactly the same way that the outcome pump in this post has the true criterion “the Emergency Regret button was not pushed”, and not “the user is satisfied with the outcome.”)
In order for this optimizer to actually be fearsome, without doing any learning or steering, the timeline resetting mechanism would need to be supernaturally immune to harm.
i agree that the reset mechanism has to be ~invulnerable for the pump to work. the thing i was imagining the machine defending is stuff like its output channel (for so long as its outputs are an important part of steering the future).
Sounds right to me!
Curated. I like the central thesis of this post, but a further point I like about it is it takes the conversation beyond a simple binary of “are we doomed or not?”, and “how doomed are we?” to a more interesting discussion of possible outcomes, their likelihoods, and the gears behind them. And I think that’s epistemically healthy. I think it puts things into a mode of “make predictions for reasons” over “argue for a simplified position”. Plus, this kind of attention to values and their origins is also one thing I think that hasn’t gotten as much airtime on LessWrong and is important, both in remembering what we’re fighting for (in very broad terms) and how we need to fight (i.e. what’s ok to build).
Hi Nate, great respect. Forgive a rambling stream-of-consciousness comment.
I think you move to the conclusion “if humans don’t have AI, aliens with AI will stomp humans” a little promptly.
Hanson’s estimate of when we’ll meet aliens is 500 million years. I know very little about how Hanson estimated that & how credible the method is, and you don’t appear to either: that might be worth investigating. But—
One million years is ten thousand generations of humans as we know them. If AI progress were impossible under the heel of a world-state, we could increase intelligence by a few points each generation. This already happens naturally and it would hardly be difficult to compound the Flynn effect.
Surely we could hit endgame technology that hits the limits of physical possibility/diminishing returns in one million years, let alone five hundred of those spans. You are aware of all we have done in just the past two hundred years — we can expect invention progress to eventually decelerate as untapped invention space narrows, but when that finally outweighs the accelerating factors of increasing intelligence and helpful technology it seems likely that we will already be quite close to finaltech.
In comparative terms, a five hundred year sabbatical from AI would reduce the share of resources we could reach by an epsilon only, and if AI safety premises are sound then it would greatly increase EV.
This point is likely moot, of course. I understand that we do not live in a totalitarian world state and your intent is just to assure people that AI safety people are not neoluddites. (I suppose one could attempt to help a state establish global dominance, then attempt to steer really hard towards AI-safety, but that requires two incredible victories for sufficiently murky benefits such that you’d have to be really confident of AI doom and have nothing better to try.)
Secondary comment: I think there’s kind of a lot of room between 95% of potential value being lost and 5%!! A solid chunk of my probability mass about the future involves takeover by a ~random person or group of people who just happened to be in the right spot to seize power (e.g. government leader, corporate board) which could run anywhere from a 20 or 30% utility loss to the far negatives.
(This is based on the idea that even if the alignment problem is solved such that we know how to specify a goal rigorously to an AI, it doesn’t follow that the people who happen to be programming the goal will be selfless. You work in AI so presumably you have practiced rebuttals to this concept; I do not so I’ll state my thought but be clear that I expect this is well-worn territory to which I expect you to have a solid answer.)
Tertiary comment: I’d be curious about your reasoning process behind this guess.
Is that genuinely just a solitary intuition, whose chain of reasoning is too distributed to meaningfully trace back? It seems to assume that things like hive-mind species are possible or common, which I don’t have information about but maybe you do. I’d be interested in evolutionary or anthropic arguments here, but the knowledge that you have this intuition does not cause me to adopt it.
Anyway this was fun to think about have a good day!! :D
Thanks for the comment, Amelia! :)
I think the “unboosted humans” hypothetical is meant to include mind-uploading (which makes the generation time an underestimate), but we’re assuming that the simulation overlords stop us from drastically improving the quality of our individual reasoning.
Nate assigns “base humans, left alone” an ~82% chance of producing an outcome at least as good as “tiling our universe-shard with computronium that we use to run glorious merely-human civilizations”, which seems unlikely to me if we can’t upload humans at all. (But maybe I’m misunderstanding something about his view.)
I think we hit the limits of technology we can think about, understand, manipulate, and build vastly earlier than that (especially if we have fast-running human uploads). But that limit falls far short of the technologies you could invent if your brain were as large as the planet Jupiter, you had native brain hardware for doing different forms of advanced math in your head, you could visualize the connections between millions of different complex machines in your working memory and simulate millions of possible complex connections between those machines inside your own head, etc.
Even when it comes to just winning a space battle using a fixed pool of fighters, I expect to get crushed by a superintelligence that can individually think about and maneuver effectively arbitrary numbers of nanobots in real time, versus humans that are manually piloting (or using crappy AI to pilot) our drones.
Oh, agreed. But we’re discussing a scenario where we never build ASI, not one where we delay 500 years.
Yep! And more generally, to share enough background model (that doesn’t normally come up in inside-baseball AI discussions) to help people identify cruxes of disagreement.
Seems super unrealistic to me, and probably bad if you could achieve it.
A different scenario that makes a lot more sense, IMO, is an AGI project pairing with some number of states during or after an AGI-enabled pivotal act. But that assumes you’ve already solved enough of the alignment problem to do at least one (possibly state-assisted) pivotal act.
My intuition is that capturing even 1% of the future’s total value is an astoundingly conjunctive feat—a narrow enough target that it’s surprising if we can hit that target and yet not hit 10%, or 99%. Think less “capture at least 1% of the negentropy in our future light cone and use it for something”, more “solve the first 999 digits of a 1000-digit decimal combination lock specifying an extremely complicated function of human brain-states that somehow encodes all the properties of Maximum Extremely-Weird-Posthuman Utility”.
Why do they need to be selfless? What are the selfish benefits to making the future less Fun for innumerable numbers of posthumans you’ll never meet or hear anything about?
(The future light cone is big, and no one human can interact with very much of it. You swamp the selfish desires of every currently-living human before you’ve even used up the negentropy in one hundredth of a single galaxy. And then what do you do with the rest of the universe? We aren’t guaranteed to use the rest of the universe well, but if we use it poorly the explanation probably can’t be “selfishness”.)
I dunno Nate’s reasoning, but AFAIK the hive-mind thing may just be an example, rather than being central to his reasoning on this point.
I like this text but I find your take on Fermi paradox wholly unrealistic.
Let’s even assume, for the sake of argument, that both P(life) and P(sapience|life) are bigger than 1/googol (though why?), so your hunch about how many planets originally evolve sapient aliens is broadly correct. A very substantial part of the alternative histories of the last century result in humanity dead or thrown into possibly-irrecoverable barbarism (I wanted to say “most”, but most, of course, differ only in uninteresting ways, such as whether a random human puts the right shoe or the left shoe on first). The default fate for aliens that have evolved is to fail their version of the Berlin crisis, or the Caribbean crisis, or whatever other near-total-destruction situation we’ve had even without AI (not necessarily with nuclear weapons, mind you; say, what if instead of the pretty-harmless-in-comparison COVID we got a sterilizing virus on the loose, one that attacks the genitalia instead of the olfactory nerves? Since its method of proliferation would not depend on the host’s ability to procreate, you could imagine it sterilizing the population of the planet). And then you tack on the fact that you also predict a very high chance of AGI ruin; so most of the hypothetical aliens that survived the kind of hurdles humanity somehow survived (again, with possibly totally different specifics) are replaced by misaligned AGI, throwing a huge hurdle in front of the cosmopolitan result you predict: meeting a paperclip-maximiser built by ant-people is more likely than meeting the ant-people themselves, given your background beliefs.
Won’t the size of the universe-shard that a civilization controls be determined entirely by how early or late they started grabbing galaxies? Which is itself almost entirely determined by how early or late they evolved?
That doesn’t sound like a fair distribution to me.
I guess we could redistribute some of our galaxies to civilizations that were less lucky than ours in the race, but I wouldn’t expect the same treatment from those that are more lucky than us... I think. Maybe acausal trade / a veil of ignorance does end up equalizing things here.
Whoops. Answered later in the post.
I was surprised by Nate’s high confidence in Unconscious Meh given misaligned ASI. Other people also seem to be quite confident in the same way. In contrast, my own ass-numbers for {the misaligned ASI scenario} are something like
10% Conscious Meh,
60% Unconscious Meh,
30% Weak Dystopia.
(And it would be closer to 50-50 between Unconscious Meh and Weak Dystopia, before I take into account others’ views.)
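One simple way to formalize “taking others’ views into account”, as the comment above does, is a linear opinion pool: a convex mixture of your distribution and an aggregate of others’. The sketch below is purely illustrative; the specific pre-update numbers, the others’ distribution, and the 50/50 mixing weight are my own hypothetical fill-ins chosen to be consistent with the stated post-update numbers, not anything given in the thread.

```python
def pool(own, others, weight_others=0.5):
    """Linear opinion pool: a convex mixture of two probability
    distributions over the same set of outcomes."""
    return {k: (1 - weight_others) * own[k] + weight_others * others[k]
            for k in own}

# Hypothetical pre-update beliefs ("closer to 50-50 between
# Unconscious Meh and Weak Dystopia"):
own = {"conscious_meh": 0.10, "unconscious_meh": 0.45, "weak_dystopia": 0.45}
# Hypothetical aggregate of others' views, weighted more toward Unconscious Meh:
others = {"conscious_meh": 0.10, "unconscious_meh": 0.75, "weak_dystopia": 0.15}

pooled = pool(own, others)
# pooled -> {"conscious_meh": 0.10, "unconscious_meh": 0.60, "weak_dystopia": 0.30}
```

With these (assumed) inputs, an equal-weight pool reproduces the 10/60/30 split quoted above; a pool always stays a valid probability distribution as long as the inputs are.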
In a lossy nutshell, my reasons for the relatively high Weak Dystopia probabilities are something like
many approaches to training AGI currently seem to have as a training target something like “learn to predict humans”, or some other objective that is humanly-meaningful but not-our-real-values,
plus Goodhart’s law.
I’m very curious about why people have high confidence in {Unconscious Meh given misaligned ASI}, and why people seem to assign such low probabilities to {(Weak) Dystopia given misaligned ASI}.
I don’t know whether this will continue in the future (all the way up to AGI). If it does, then it strikes me as a sufficiently coarse-grained approach (that’s bad enough at inner alignment, and bad enough at outer-alignment-to-specific-things-we-actually-care-about) that I’d still be pretty surprised if the result (in the limit of superintelligence) bears any resemblance to stuff we care much about, good or bad.
E.g., there are many more “unconscious configurations of matter that bear some relation to things you learn in trying to predict humans” than there are “conscious configurations of matter that bear some relation to things you learn in trying to predict humans”. Building an entire functioning conscious mind is still a very complicated end-state that requires getting lots of bits into the AGI’s terminal goals correctly; it doesn’t necessarily become that much easier just because we’re calling the ability we’re training “human prediction”. Like, a superintelligent paperclipper would also be excellent at the human prediction task, given access to information about humans.
(I’ll also mention that I think it’s a terrible idea for safety-conscious AI researchers to put all their eggs in the “train AI via lots of data on humans” basket. But that’s a separate question from what AI researchers are likely to do in practice.)
A big chunk of my uncertainty about whether at least 95% of the future’s potential value is realized comes from uncertainty about “the order of magnitude at which utility is bounded”. That is, if unbounded total utilitarianism is roughly true, I think there is a <1% chance in any of these scenarios that >95% of the future’s potential value would be realized. If decreasing marginal returns in the [amount of hedonium → utility] conversion kick in fast enough for 10^20 slightly conscious humans on heroin for a million years to yield 95% of max utility, then I’d probably give >10% odds of strong utopia even conditional on building the default superintelligent AI. Both options seem significantly probable to me, causing my odds to vary much less between the scenarios.
This is assuming that “the future’s potential value” is referring to something like the (expected) utility that would be attained by the action sequence recommended by an oracle giving humanity optimal advice according to our CEV. If that’s a misinterpretation or a bad framing more generally, I’d enjoy thinking again about the better question. I would guess that my disagreement with the probabilities is greatly reduced on the level of the underlying empirical outcome distribution.
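To make the bounded-vs-unbounded distinction concrete, here is a toy sketch. Both functions and the `halfway` saturation parameter are my own illustrative choices, not anything from the thread: under unbounded total utilitarianism, the fraction of potential value realized simply tracks the fraction of reachable resources used optimally, whereas under a fast-saturating bounded utility function, a tiny sliver of the universe-shard can already realize almost all of the maximum value.

```python
def frac_realized_linear(used_frac):
    """Unbounded total utilitarianism: value scales linearly with the
    fraction of reachable resources used optimally."""
    return used_frac

def frac_realized_bounded(used_frac, halfway=1e-10):
    """Toy bounded utility: a saturating curve where `halfway` is the
    (hypothetical) resource fraction at which half of max utility is
    already attained.  Normalized so that using everything gives 1.0."""
    u = lambda x: x / (x + halfway)
    return u(used_frac) / u(1.0)

# Using one millionth of the reachable resources:
print(frac_realized_linear(1e-6))   # 1e-06: nearly all potential value lost
print(frac_realized_bounded(1e-6))  # ~0.9999: nearly all potential value captured
```

The point of the sketch is just that the same physical outcome (capturing a millionth of the resources) scores near 0% of potential value on one axiology and near 100% on the other, which is why the choice between them swamps the differences between the scenarios.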
Since molecular squiggle maximizers and paperclip maximizers both result in a universe-shard that’s a boring wasteland, despite the fact that they maximize different things, what’s the practical difference between talking about molecular squiggle maximizers instead of paperclip maximizers?
The phrase “paperclip maximizers” was originally intended to be a catch-all term for things analogous to molecular squiggle maximizers. Alas, it often was taken too literally, to be about literal paperclips.
Why? How?
It seems like something weird is happening if we claim that we expect human values to be more cosmopolitan than alien values. Is that what you’re claiming?
That’s what he’s claiming, because he’s claiming “cosmopolitan value” is itself a human value. (Just a very diversity-embracing one.)
Since there is no agreed-upon definition of ‘CEV’, what is the definition you’re using here?
Is super-intelligent AI necessarily AGI (for this amazing future), or can it be ANI ?
i.e., why insist on all of the workarounds that pursuing AGI forces on us, when, with ANI, don’t we already have Safety, Alignment, Corrigibility, Reliability, and super-human ability, today?
Eugene
How are you defining “super-intelligent”, “AGI”, and “ANI” here?
I’d distinguish two questions:
Pivotal-act-style AI: How do we leverage AI (or some other tech) to end the period where humans can imminently destroy the world with AI?
CEV-style AI: How do we leverage AI to solve all of our problems and put us on a trajectory to an ~optimal future?
My guess is that successful pivotal act AI will need to be AGI, though I’m not highly confident of this. By “AGI” I mean “something that’s doing qualitatively the right kind of reasoning to be able to efficiently model physical processes in general, both high-level and low-level”.
I don’t mean that the AGI that saves the world necessarily actually has the knowledge or inclination to productively reason about arbitrary topics—e.g., we might want to limit AGI to just reasoning about low-level physics (in ways that help us build tech to save the world), and keep the AGI from doing dangerous things like “reasoning about its operators’ minds”. (Similarly, I would call a human a “general intelligence” insofar as they have the right cognitive machinery to do science in general, even if they’ve never actually thought about physics or learned any physics facts.)
In the case of CEV-style AI, I’m much more confident that it will need to be AGI, and I strongly expect it to need to be aligned enough (and capable enough) that we can trust it to reason about arbitrary domains. If it can safely do CEV at all, then we shouldn’t need to restrict it—needing to restrict it is a flag that we aren’t ready to hand it such a difficult task.
These types of posts are what drive me to largely regard lesswrong as unserious. Solve the immediate problem of AGI, and then we can talk about whatever sci-fi bullcrap you want to.
Foxes > Hedgehogs.
You’ll learn a lot more about the future paying attention to what’s happening right now than by wild extrapolation.
Do you think that there are specific falsehoods in the OP? Or do you just think it’s unrespectable for humans to think about the future?
Some people object to working on AGI alignment on the grounds that the future will go a lot better if we take our hands off the steering wheel and let minds develop “naturally” and “freely”.
Some of those people even own companies with names like “Google”!
The best way to address that family of views is to actually talk about what would probably happen if you let a random misaligned AGI, or a random alien, optimize the future.
So on your understanding, “foxes” = people who have One Big Theory about which topics are respectable, and answer all futurist questions based on that theory? While “hedgehogs” = people who write long, detailed blog posts poking at various nuances and sub-nuances of a long list of loosely related object-level questions?
… Seems to me that you’re either very confused about what “foxes” and “hedgehogs” are, or you didn’t understand much of the OP’s post.
Writing a long post about a topic doesn’t imply that you’re using One Simple Catch-All Model to generate all the predictions, and it doesn’t imply that you’re confident about the contents of the post. Refusing to think about a topic isn’t being a “fox”.
Because as all foxes know, “thinking about the present” and “thinking about the future” are mutually exclusive.
There is one ultimate law of futurology, and it’s that predicting the future is very hard, and as you extend timelines out to 100–500 million years it gets harder.
If your hypothetical future involves both aliens and AGI, both of which are agents (emphasis emphasized) we have never observed and cannot really model in any way, you are not describing anything that can be called truth.
You are throwing a dart at an ocean of hypothesis space and hoping to hit a specific starfish that lives off the coast of Australia.
It’s not a question, you’re wrong.
Looking at the agents that aren’t hypothetical, i.e. biology, one thing that they tend to have in common is resource hogging. Mainly for the simple reason that those which didn’t try to get as much of the pie as possible tended to be outcompeted by those which did. So while you can’t tell for sure what any hypothetical aliens would be like, it’s certainly plausible to model them as wanting to collect resources, as that’s pretty much universal. At least among those agents that tend to spread. This suggests that if there are expansionary aliens around, they’re likely to be the kind that would also like our resources (this is where Hanson’s grabby aliens come in). Looking at the only data we have, this tends to end badly for the incumbent species if the newcomers have better abilities (for lack of a better phrase).
Any potential aliens will be, well, alien, which if I understand correctly is sort of your point. This means that they’re likely to have totally different values etc. and would have a totally different vision of what the universe should look like. This would be fascinating, as long as it isn’t at the cost of human values.
The same argumentation applies for AGI—if (or when) it appears, it stands to reason that it’ll go for power at the cost of humans. That could be fine, as long as it did nice things (like in the Culture books), but that’s throwing a dart at an ocean of hypothesis space and hoping to hit a specific starfish that lives off the coast of Australia.
This post isn’t telling a very specific story that requires multiple things to go exactly right. It’s considering the possible ways to partition the hypothesis space with the assumptions that AGI and aliens are possible. And then trying to put a number on them. Nate specifically said that these weren’t hard calculations, just a way to be more precise about what he thinks.
You can divide inputs into grabby and non-grabby, existent and non-existent, ASI and AGI and outcomes into all manner of dystopia or nonexistence, and probably carve up most of hypothesis space. You can do this with basically any subject.
But if you think you can reason about respective probabilities in these fields in a way that isn’t equivalent to fanfiction, you are insane.
“My current probability is something like 90% that if you produced hundreds of random uncorrelated superintelligent AI systems, <1% of them would be conscious.”
This is what I’m talking about. Have you ever heard of the hard problem of consciousness? Have we ever observed a superintelligent AI? Have we ever generated hundreds of them? Do we know how we would go about generating hundreds of superintelligent AI? Is there any convergence with how superintelligences develop?
Of course, there’s a very helpful footnote saying “I’m not certain about this,” so we can say “well he’s just refining his thinking!”
No he’s not, he’s writing fanfiction.
It struck me today that maybe you’re mistaking this exercise in trying to explain ones position with giving precise, workable predictions.
If you interpret “My current probability is something like 90% that if you produced hundreds of random uncorrelated superintelligent AI systems, <1% of them would be conscious.” as a prediction of what will happen, then yes, this does seem somewhat ludicrous. On the other hand, you can also interpret it as “I’m pretty sure (on the basis of various intuitions etc.) that the vast majority of possible superintelligences aren’t conscious”. This isn’t an objective statement of what will happen, it’s an attempt to describe subjective beliefs in a way that other people can know how much you believe a given thing.
What do you mean by saying that this is not an objective statement or a prediction?
Are you saying that you think there’s no underlying truth to consciousness?
We know it’s measurable, because that’s basically ‘I think therefore I am.’ It’s not impossible that someday we could come up with a machine or algorithm which can measure consciousness, so it’s not impossible that this ‘non-prediction’ or ‘subjective statement’ could be proved objectively wrong.
My most charitable reading of your comment is that you’re saying that the post is highly speculative and based off of ‘subjective’ (read: arbitrary) judgements. This is my position, that’s what I just said. It’s fanfiction.
I think even if you were to put at the start “this is just speculation, and highly uncertain”, it would still be inappropriate content for a site about thinking rationally, for a variety of reasons, one of which being that people will base their own beliefs on your subjective judgments or otherwise be biased by them.
And even when you speculate, you should never be assigning 90% probability to a prediction about CONSCIOUSNESS and SUPERINTELLIGENT AI.
God, it just hit me again how insane that is.
“I think that [property we can not currently objectively measure] will not be present in [agent we have not observed], and I think that I could make 10 predictions of similar uncertainty and be wrong only once.”
Ten years ago I expressed similar misgivings. Such scenarios, no matter how ‘logical’, are too easily invalidated by something not yet known. Better, e.g., to treat them as strongly hypothetical, and the problem of superintelligent AI as ‘almost certainly not hypothetical’. But we face the future with the institutions we have, not the institutions we wish we had, and part of the culture of MIRI et al. is an attachment to particular scenarios of the long-term future. So be it.