Unpicking Extinction

ukc10014Dec 9, 2023, 9:15 AM

35 points

Effective Accelerationism Existential risk

TL;DR

Human extinction is trending: there has been a lot of noise, mainly on X, about the apparent complacency amongst e/acc with respect to human extinction. Extinction also feels adjacent to another view (not particular to e/acc) that ‘the next step in human evolution is {AI/AGI/ASI}’. Many have pushed back robustly against the former, while the latter doesn’t seem very fleshed out. I thought it useful to, briefly, gather the various positions and summarise them, hopefully not too inaccurately, and perhaps pull out some points of convergence.

This is a starting point for my own research (on de-facto extinction via evolution). There is nothing particularly new in here: see the substantial literature in the usual fora for instance. Thomas Moynihan’s X-risk (2020) documents the history of humanity’s collective realisation of civilisational fragility, while Émile P. Torres’ works (discussed below) set out a possible framework for an ethics of extinction.

My bottom line is: a) the degree of badness (or goodness) of human extinction seems less obvious or self-evident than one might assume, b) what we leave behind if and when we go extinct matters, c) the timing of when this happens is important, as is d) the manner in which the last human generations live (and die).

Relevant to the seeming e/acc take (i.e. being pretty relaxed about possible human extinction): it seems clear that our default position (subject to some caveats) should be to delay extinction on the grounds that a) it is irreversible (by definition), and b) so as to maximise our option value over the future. In any case, the e/acc view, which seems based on something (not very articulate) something entropy crossed with a taste for unfettered capitalism, is hard to take seriously and might even fail on its own terms.

Varieties of extinction

The Yudkowsky position

(My take on) Eliezer’s view is that he fears a misaligned AI (not necessarily superintelligent), acting largely on its own (e.g. goal-formation, planning, actually effecting things in the world), eliminates humans and perhaps all life on Earth. This would be bad, not just for the eliminated humans or their descendants, but also for the universe-at-large in the sense that intelligently-created complexity (of the type that humans generate) is an intrinsic good that requires no further justification. The vast majority of AI designs that Eliezer foresees would, through various chains of events, result in a universe with much less of these intrinsic goods.

He spells it out here in the current e/acc context, and clarifies that his view doesn’t hinge on the preservation of biological humans (this was useful to know). He has written copiously and aphoristically on this topic, for instance Value is Fragile and the Fun Theory sequence.

The Bostrom variant

Nick Bostrom’s views on human extinction seem to take a more-happy-lives-are-better starting point. My possibly mistaken impression is that, like Eliezer, he seems to value things like art, creativity, love, in the specific sense that a future where they didn’t exist would be a much worse one from a cosmic or species-neutral perspective. He describes an ‘uninhabited society’ that is technologically advanced and builds complex structures, but that ‘nevertheless lacks any type of being that is conscious or whose welfare has moral significance’ (Chapter 11, p. 173 of Superintelligence (2014)). To my knowledge, he doesn’t unpick what precisely about the uninhabited society would actually be bad and for whom (possibly this is well-understood point or a non-question in philosophy, but I’m not sure that is the case, at least judging from (see below) Benatar, Torres, this paper by James Lenman, or for that matter Schopenhauer).

A more tangible reason Bostrom thinks we should avoid going extinct anytime soon is to preserve ‘option value’ over the future—since so many questions about humans’ individual and group-level preferences, as well as species-level vocation, remain unanswered, it may be better to defer any irreversible changes until such time as we are collectively wiser. This intuitively makes sense, though even here it is unclear how strong the impact of option value actually would be on the overall value of X-risk reduction.

(Perhaps) the e/acc view

There doesn’t seem to be a recent substantial argument from the cluster of commentators lumped into ‘e/acc’ (particularly @basedbefjezos), but this 2022 post seems useful. My take on e/acc, working mostly off the document above: a) they have no bias in favour of humans or human-ish minds or creation as intrinsic goods (in the sense that I describe Eliezer or Bostrom having), b) one of their intrinsic goods seems to be maximising the amount intelligence in the cosmos (they take an expansive, aggressively non-anthropocentric definition of ‘intelligence’, seemingly including capitalism and other group-level cognition), c) their view on what the manifested results (of intelligence, whether artificial or socio-capitalist) is pretty tolerant i.e. whatever that intelligence results in is acceptable and that following (source)

‘the “will of the universe” [means] leaning into the thermodynamic bias towards futures with greater and smarter civilizations that are more effective at finding/extracting free energy from the universe and converting it to utility at grander and grander scales.’

Continuing the list above, d) they don’t believe Eliezer or Bostrom-style ‘uninhabited worlds’, which I assume is what ‘zombie’ is gesturing at, are likely: ‘No need to worry about creating “zombie” forms of higher intelligence, as these will be at a thermodynamic/evolutionary disadvantage compared to conscious/higher-level forms of intelligence’.

The e/acc also make reference to thermodynamic explanations for the origin of life (see Jeremy England), which they seem to extrapolate to higher forms of cognition operating at multiple-scales. I don’t know enough to critique this, except to say this feels like this foundation is carrying a lot of weight for their claims (but could be interesting if well-supported—which it currently is not).

The comments above do not include any further refinements of e/acc thought (and this is a live and heated conversation), but here is Eliezer’s suggestion (source) of what he would like to see from the e/acc crowd (in terms of fleshing out their ideas):

‘My Model of Beff Jezos’s Position: I don’t care about this prediction of yours enough to say that I disagree with it. I’m happy so long as entropy increases faster than it otherwise would. I have temporarily unblocked @BasedBeffJezos in case he has what I’d consider a substantive response to this, such as, “I actually predict, as an empirical fact about the universe, that AIs built according to almost any set of design principles will care about other sentient minds as ends in themselves, and look on the universe with wonder that they take the time and expend the energy to experience consciously; and humanity’s descendants will uplift to equality with themselves, all those and only those humans who request to be uplifted; forbidding sapient enslavement or greater horrors throughout all regions they govern; and I hold that this position is a publicly knowable truth about physical reality, and not just words to repeat from faith; and all this is a crux of my position, where I’d back off and not destroy all humane life if I were convinced that this were not so.”’

For context, and this is perhaps a historical curiosity, e/acc draws heavily on the ‘accelerationist’ cluster of ideas incubated at University of Warwick in the mid-1990s under the apocryphal Cybernetic Culture Research Unit (CCRU). It was unusually fecund, in that the original CCRU version take on accelerationism (insofar as it was documented/codified), splintered into left, right, far-right, unconditional, and a number of other variants, an ambiguous pantheon now joined by e/acc. The wikipedia page is a good start, as is this post and this article on Nick Land (a founding figure, subsequently shunned owing to far-right views). Excerpts of foundational texts can be found in the Accelerationist Reader. Rather than a coherent philosophy, accelerationism is perhaps better viewed as a generative meta-meme that was (and clearly still seems to be) particularly influential in art and popular culture.

Christiano: ‘humans lose control over the future’

Paul Christiano’s views (on the topic of extinction) seem to coalesce around: a) a universe of ‘value plurality’ i.e. where human values become merely one amongst many, is a bad one, b) a timeline where humans ‘lose control over the future’ is a bad timeline. These are most concretely discussed in the context of worlds that resemble today’s (i.e. with states, corporations, biological humans, etc.), and reflect on risks that arise from the interaction of our socio-economic structures (i.e. relatively laissez-faire capitalism), powerful technology, incompetent regulation, coordination problems, and variants of Goodhart’s Law.

However, in an intriguing 2018 post, Christiano does take a more speculative view: he relaxes his bias in favour of biological humans and our values/systems (i.e. entertains the view that our superintelligent successors inheriting the future might be okay if that is the only way our values might persist), but seems to punt the difficult questions as to the definition of words like ‘value’ and ‘niceness’.

Dan Hendrycks: evolutionary pressures disfavour humans

I wanted to briefly touch on Dan Hendrycks’ perspective, specifically his point-by-point rebuttal of e/acc views. His rebuttal references this 2023 paper, which takes as given that humans should prefer to keep control over the future and not become extinct.

Hendrycks suggests that, owing to social and technological pressures that manifest through rapid variation/proliferation (of ensembles of agentic AI systems) into competitive deployment environments, there may emerge forces akin to natural selection. This selection may favour selfish behaviour (on the part of AI systems), without the restraints that altruism, kin selection, cooperation, moral norms have historically provided for humans and some animals. When combined with their greater effectiveness in changing the world, it seems probable that AIs would collectively outcompete humans. This feels like an evolutionary treatment of points made here by Andrew Critch and analysed here.

On an initial read, I can’t find anything in Hendrycks about the possible balance of cooperation/competition between AI systems—i.e. do similar evolutionary pressures result in fratricidal conflict (in which humans are likely collateral damage), or do they indeed solve coordination problems better than humans (owing to source code transparency or a decision theory appropriate to their architecture and deployment environment) and mostly avoid conflict with each other.

If there is a chance that AI systems enter into fratricidal conflict, then it seems harder to argue that they necessarily will be ‘more effective at finding/extracting free energy from the universe and converting it to utility at grander and grander scales’, as e/acc suggests. They might just waste resources indefinitely. Absent a stronger argument, it feels like (on this point) e/acc might fail on its own terms.

Human evolution to some other substrate

Hendrycks’ evolutionary framing is clearly bad for humans. However, other evolutionary narratives can be more positive. One such vision is the possibility that humanity disappears as a species that is biologically similar to us, but undergoes an evolution perhaps onto some other (in)organic substrate, going as far as becoming fully embedded into the ‘natural’ environment. This view has been articulated by a number of people: Robin Hanson’s works, Richard Sutton, James Lovelock, Joscha Bach, (albeit at a stretch) Donna Haraway, Derek Shiller, and Hans Moravec.

Other than Robin Hanson and, perhaps, Joscha Bach, the writers don’t develop the idea of human evolution and transcendence in detail, and one would probably need to go back to the transhuman and posthuman literatures (which mostly pre-date the current wave of AI successes).

Trying to flesh this out is my area of specific interest, so please get in touch if you have thoughts.

Anti-natalists and digital suffering

The positions above mostly deal with existential risk arising as a result of technological or other mishap that befalls humanity. However, it is conceivable that a species might voluntarily go extinct, a group of views that include contemporary anti-natalism (that it is morally wrong to procreate). Anti-natalism has flavours: philosophical anti-natalists such as David Benatar who argue against procreation based on an asymmetry between pleasure and pain (in respect of the created individuals); and misanthropic anti-natalists, who argue against procreation on the basis of harms caused by humans (e.g. to the rest of the natural world), analysed here by Benatar.

I think these are interesting perspectives because they question the relation between value and population: is a world with more humans (subject to significant constraints on the amount of pain, free will, justice, etc.) really better than a world with fewer?

Specifically relevant to AI, see Brian Tomasik, who is explicitly concerned about the possibility of digital suffering, a topic also treated by Thomas Metzinger. Metzinger specifically argues against giving rise to AIs capable of suffering; related points are raised by Nick Bostrom and Carl Shulman in the context of governance and other issues in mixed-societies of humans and AIs (where our historical moral intuitions and social contracts break down in the presence of beings with wider hedonic ranges, rapid population growth and cheap replication/reproduction, vis a vis humans).

Émile P. Torres on the ethics of human extinction

In these two essays (the latter summarises their new book) Torres analyses human extinction extensively. Aside from a sociological study of the history of existential ethics, the relevant part of the Aeon essay is the distinction (which hasn’t often be made in the AI X-risk discourse) between the process of going extinct and the fact of going extinct. Torres also directly distinguishes a world without any humans (and no other human-created intelligence) versus a world where we are replaced (or we evolve) into a machine-based species. Again, they highlight a slight gap in the AI X-risk conversation which is often silent on timeframes (over which extinction might happen), in part presumably because the assumed context is usually one of risks materialising in the next few years or decades. Echoing the anti-natalist position, Torres picks at a foundation of utilitarianism-flavoured longtermism, that I (loosely) summarise as ‘more humans is better than fewer’. They mention the obvious point that if one thinks human lives are predominantly filled with suffering, then a world with more humans doesn’t seem obviously better (unless there is some other dominant source of value).

Takeaways

So where does that leave us in respect of the badness or goodness of human extinction? I see three major factors that might affect one’s views towards extinction.

Whether, and how, we are succeeded matters

Firstly, there seems to be a great difference between a) a perished humanity that leaves behind no intelligent successor, no substantial physical or intellectual artefacts, no creative works, and b) worlds where we are able to leave a legacy (which in Torres’ formulation, could be a biologically or inorganically re-engineered version of ourselves, a successor). The precise shape of that legacy is very unclear, which lends support to Bostrom’s call to preserve option value over the future as well as to Ord’s Long Reflection, though both of these are complicated by the fact that avoiding certain existential risks might itself require massive transformative technological changes (such as, eventually having to ‘roll the dice’ on AGI).

Timing seems to matter

Aside from anti-natalists, I imagine relatively few people would bite the bullet of voluntary extinction, especially if that meant they or their living (((...)great-)grand-)children would perish. This perspective prioritises the (apparent) interests of one’s own self and that of close kin. Others may be quite emotionally attached to the physical structures and intellectual achievements of humans, and may wish to see civilisation persist for some hundreds or thousands of years (see this history of human thought about extinction).

However, we should not insist or expect that biological humans would persist in societies recognisable by us, indefinitely into the future, nor might it be feasible as medical and other technologies advance, extending active or uploaded lives. As Lenman points out, our intuitions about survival are built in a narrative arc roughly comparable to a human lifetime, and we should be suspicious about extending these without some firmer, more impersonal ground.

More fundamentally, from the perspective of Bostrom’s option value arguments one might prefer a distant date for humanity’s extinction. A blunter way of putting it is that actually going extinct, by definition, closes off all other futures. However it gets more complicated when thinking about evolution-as-extinction or other scenarios.

The manner of extinction matters

It seems obvious that an extinction event with much suffering that would not otherwise have been experienced (absent the event) would be worse than a slow process of ‘natural’ dying out (e.g. through depopulation). Similarly, though I don’t focus on it, an extinction event that destroyed much other life on Earth as collateral damage would be worse than an event that mostly affected humans. It is also possible that an event that destroyed everything we have built so far (including accumulated knowledge), such that it might never be recovered or found by some future starfaring alien scouts, would be sad (if not perhaps concretely or quantifiably bad).

Is this actually an urgent question?

This might seem like pointless navel-gazing in light of more salient short- and medium-term risks for misaligned AI. It might also be actively unhelpful: as Christiano points out, some well-motivated concerns (applied thoughtlessly or prematurely) such as in respect of digital suffering, might corrupt our societal reasoning around AI safety and oversight.

However, imagine there exists a GPT-n that we suspect might experience phenomenological states, has goals and ability to construct long-term plans, and (let’s say) passes whatever alignment benchmarks we have at the time. However, we, its designers, decide from a precautionary principle (for whatever reason), to shut it down or not deploy it. In an echo of Stanislaw Lem’s Golem XIV, we would potentially be called upon (either by the machine, its immediate predecessors, or the judgement of history) to explain our reasoning, which might well touch on some of the issues raised above, and even if we can’t give any definitive answers, we may need to show that we have actually thought about it rather than sheepishly assuming a (carbon- or biological-chauvinist) position.

What links here?