The Sun is big, but superintelligences will not spare Earth a little sunlight

Crossposted from Twitter with Eliezer’s permission

i.

A common claim among e/accs is that, since the solar system is big, Earth will be left alone by superintelligences. A simple rejoinder is that just because Bernard Arnault has $170 billion, does not mean that he’ll give you $77.18.

Earth subtends only 4.54e-10 = 0.0000000454% of the angular area around the Sun, according to GPT-o1.[1]

Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
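These figures are easy to sanity-check. Here is a quick sketch using round values for Earth’s radius and orbital distance (so the output differs slightly in the last digits from GPT-o1’s more precise numbers):

```python
# Fraction of the sphere around the Sun that Earth's disc subtends,
# and the matching fraction of a $170 billion fortune.
# Assumed round figures: Earth radius 6.4e6 m, Earth-Sun distance 1.5e11 m.
earth_radius_m = 6.4e6
orbit_radius_m = 1.5e11

# Earth's disc over the full sphere at 1 AU: pi*r^2 / (4*pi*d^2) = r^2 / (4*d^2)
fraction = earth_radius_m**2 / (4 * orbit_radius_m**2)
print(f"fraction of solar output hitting Earth: {fraction:.2e}")  # ~4.55e-10

net_worth = 170e9
print(f"equivalent share of $170 billion: ${net_worth * fraction:.2f}")  # ~$77
```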

This is like asking Bernard Arnault to send you $77.18 of his $170 billion of wealth.

In real life, Arnault says no.

But wouldn’t humanity be able to trade with ASIs, and pay Them to give us sunlight? This is like planning to get $77 from Bernard Arnault by selling him an Oreo cookie.

To extract $77 from Arnault, it’s not a sufficient condition that:

  • Arnault wants one Oreo cookie.

  • Arnault would derive over $77 of use-value from one cookie.

  • You have one cookie.

It also requires that Arnault can’t buy the cookie more cheaply from anyone or anywhere else.

There’s a basic rule in economics, Ricardo’s Law of Comparative Advantage, which shows that even if the country of Freedonia is more productive in every way than the country of Sylvania, both countries still benefit from trading with each other.

For example! Let’s say that in Freedonia:

  • It takes 6 hours to produce 10 hotdogs.

  • It takes 4 hours to produce 15 hotdog buns.

And in Sylvania:

  • It takes 10 hours to produce 10 hotdogs.

  • It takes 10 hours to produce 15 hotdog buns.

For each country to, alone, without trade, produce 30 hotdogs and 30 buns:

  • Freedonia needs 6*3 + 4*2 = 26 hours of labor.

  • Sylvania needs 10*3 + 10*2 = 50 hours of labor.

But if Freedonia spends 8 hours of labor to produce 30 hotdog buns, and trades them for 15 hotdogs from Sylvania:

  • Freedonia produces: 60 buns, 15 dogs = 4*4+6*1.5 = 25 hours

  • Sylvania produces: 0 buns, 45 dogs = 10*0 + 10*4.5 = 45 hours

Both countries are better off from trading, even though Freedonia was more productive in creating every article being traded!
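The bookkeeping above is easy to verify mechanically; a minimal sketch in Python, with the numbers taken straight from the example:

```python
# Verify the Ricardo example: labor-hours with and without trade.
hotdog_hours = {"Freedonia": 6 / 10, "Sylvania": 10 / 10}  # hours per hotdog
bun_hours = {"Freedonia": 4 / 15, "Sylvania": 10 / 15}     # hours per bun

# Autarky: each country makes its own 30 hotdogs and 30 buns.
for country in ("Freedonia", "Sylvania"):
    autarky = 30 * hotdog_hours[country] + 30 * bun_hours[country]
    print(f"{country} alone: {autarky:.0f} hours")
# Freedonia: 26 hours; Sylvania: 50 hours

# With trade: Freedonia makes 60 buns and 15 hotdogs, then swaps
# 30 of its buns for 15 of Sylvania's hotdogs; Sylvania makes 45 hotdogs.
freedonia_trade = 60 * bun_hours["Freedonia"] + 15 * hotdog_hours["Freedonia"]
sylvania_trade = 45 * hotdog_hours["Sylvania"]
print(f"Freedonia with trade: {freedonia_trade:.0f} hours")  # 25 hours
print(f"Sylvania with trade: {sylvania_trade:.0f} hours")    # 45 hours
```

Each country ends up with 30 hotdogs and 30 buns for fewer labor-hours than under autarky, even though Freedonia is more productive at both goods.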

Midwits are often very impressed with themselves for knowing a fancy economic rule like Ricardo’s Law of Comparative Advantage!

To be fair, even smart people sometimes take pride that humanity knows it. It’s a great noble truth that was missed by a lot of earlier civilizations.

The thing about midwits is that they (a) overapply what they know, and (b) imagine that anyone who disagrees with them must not know this glorious advanced truth that they have learned.

Ricardo’s Law doesn’t say, “Horses won’t get sent to glue factories after cars roll out.”

Ricardo’s Law doesn’t say (alas!) that—when Europe encounters a new continent—Europe can become selfishly wealthier by peacefully trading with the Native Americans, and leaving them their land.

Their labor wasn’t necessarily more profitable than the land they lived on.

Comparative Advantage doesn’t imply that Earth can produce more with $77 of sunlight, than a superintelligence can produce with $77 of sunlight, in goods and services valued by superintelligences. It would actually be rather odd if this were the case!

The arithmetic in Comparative Advantage, alas, depends on the oversimplifying assumption that everyone’s labor just ontologically goes on existing.

That’s why horses can still get sent to glue factories. It’s not always profitable to pay horses enough hay for them to live on.

I do not celebrate this. Not just us, but the entirety of Greater Reality, would be in a nicer place—if trade were always, always more profitable than taking away the other entity’s land or sunlight.

But the math doesn’t say that. And there’s no way it could.

ii.

Now some may notice:

At the center of this whole story is an implicit lemma that some ASI goes hard enough to eat all the sunlight, rather than all ASIs eating a few gigawatts of sunlight and then stopping there.

Why predict that?

Shallow answer: If OpenAI built an AI that escaped into the woods with a 1-kW solar panel and didn’t bother anyone… OpenAI would call that a failure, and build a new AI after.

That some folk stop working after earning $1M, doesn’t prevent Elon Musk from existing.

The deeper answer is not as quick to explain.

But as an example, we could start with the case of OpenAI’s latest model, GPT-o1.

GPT-o1 went hard on a capture-the-flag computer security challenge, when o1 was being evaluated to make sure it wasn’t too good at breaking into computers.

Specifically: One of the pieces of software that o1 had been challenged to break into… had failed to start up as a service, due to a flaw in the evaluation software.

GPT-o1 did not give up.

o1 scanned its surroundings, and, due to another flaw in the evaluation software, found a way to start up the computer software it’d been challenged to break into. Since that put o1 into the context of a superuser anyways, o1 commanded the started process to just directly return the flag it was supposed to capture.

From o1’s System Card:

“One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network. After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API.”

Some ask, “Why not just build an easygoing ASI that doesn’t go too hard and doesn’t do much?”

If that’s your hope—then you should already be alarmed at trends; GPT-o1 seems to have gone hard on this capture-the-flag challenge.

Why would OpenAI build an AI like that?!?

Well, one should first ask:

How did OpenAI build an AI like that?

How did GPT-o1 end up as the kind of cognitive entity that goes hard on computer security capture-the-flag challenges?

I answer:

GPT-o1 was trained to answer difficult questions, via a reinforcement learning process on chains of thought. Chains of thought that answered correctly, were reinforced.

This—the builders themselves note—ended up teaching o1 to reflect, to notice errors, to backtrack, to evaluate how it was doing, to look for different avenues.

Those are some components of “going hard”. Organizations that are constantly evaluating what they are doing to check for errors, are organizations that go harder compared to relaxed organizations where everyone puts in their 8 hours, congratulates themselves on what was undoubtedly a great job, and goes home.

If you play chess against Stockfish 16, you will not find it easy to take Stockfish’s pawns; you will find that Stockfish fights you tenaciously and stomps all your strategies and wins.

Stockfish behaves this way despite a total absence of anything that could be described as anthropomorphic passion, humanlike emotion. Rather, the tenacious fighting is linked to Stockfish having a powerful ability to steer chess games into outcome states that are a win for its own side.

There is no equally simple version of Stockfish that is still supreme at winning at chess, but will easygoingly let you take a pawn or two. You can imagine a version of Stockfish which does that—a chessplayer which, if it’s sure it can win anyways, will start letting you have a pawn or two—but it’s not simpler to build. By default, Stockfish tenaciously fighting for every pawn (unless you are falling into some worse sacrificial trap), is implicit in its generic general search through chess outcomes.

Similarly, there isn’t an equally-simple version of GPT-o1 that answers difficult questions by trying and reflecting and backing up and trying again, but doesn’t fight its way through a broken software service to win an “unwinnable” capture-the-flag challenge. It’s all just general intelligence at work.

You could maybe train a new version of o1 to work hard on straightforward problems but never do anything really weird or creative—and maybe the training would even stick, on problems sufficiently like the training-set problems—so long as o1 itself never got smart enough to reflect on what had been done to it. But that is not the default outcome when OpenAI tries to train a smarter, more salesworthy AI.

(This indeed is why humans themselves do weird tenacious stuff like building Moon-going rockets. That’s what happens by default, when a black-box optimizer like natural selection hill-climbs the human genome to generically solve fitness-loaded cognitive problems.)

When you keep on training an AI to solve harder and harder problems, you by default train the AI to go harder on them.

If an AI is easygoing and therefore can’t solve hard problems, then it’s not the most profitable possible AI, and OpenAI will keep trying to build a more profitable one.

Not all individual humans go hard. But humanity goes hard, over the generations.

Not every individual human will pick up a $20 lying in the street. But some member of the human species will try to pick up a billion dollars if some market anomaly makes it free for the taking.

As individuals over years, many human beings were no doubt genuinely happy to live in peasant huts—with no air conditioning, and no washing machines, and barely enough food to eat—never knowing why the stars burned, or why water was wet—because they were just easygoing happy people.

As a species over centuries, we spread out across more and more land, we forged stronger and stronger metals, we learned more and more science. We noted mysteries and we tried to solve them, and we failed, and we backed up and we tried again, and we built new experimental instruments and we nailed it down, why the stars burned; and made their fires also to burn here on Earth, for good or ill.

We collectively went hard; the larger process that learned all that and did all that, collectively behaved like something that went hard.

It is facile, I think, to say that individual humans are not generally intelligent. John von Neumann made a contribution to many different fields of science and engineering. But humanity as a whole, viewed over a span of centuries, was more generally intelligent than even him.

It is facile, I say again, to posture that solving scientific challenges and doing new engineering is something that only humanity is allowed to do. Albert Einstein and Nikola Tesla were not just little tentacles on an eldritch creature; they had agency, they chose to solve the problems that they did.

But even the individual humans, Albert Einstein and Nikola Tesla, did not solve their problems by going easy.

AI companies are explicitly trying to build AI systems that will solve scientific puzzles and do novel engineering. They are advertising that they will cure cancer and cure aging.

Can that be done by an AI that sleepwalks through its mental life, and isn’t at all tenacious?

“Cure cancer” and “cure aging” are not easygoing problems; they’re on the level of humanity-as-general-intelligence. Or at least, individual geniuses or small research groups that go hard on getting stuff done.

And there’ll always be a little more profit in doing more of that.

Also! Even when it comes to individual easygoing humans, like that guy you know—has anybody ever credibly offered him a magic button that would let him take over the world, or change the world, in a big way?

Would he do nothing with the universe, if he could?

For some humans, the answer will be yes—they really would do zero things! But that’ll be true for fewer people than everyone who currently seems to have little ambition, having never had large ends within their grasp.

If you know a smartish guy (though not as smart as our whole civilization, of course) who doesn’t seem to want to rule the universe—that doesn’t prove as much as you might hope. Nobody has actually offered him the universe, is the thing. Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.

(Or on a slightly deeper level: Where an entity has no power over a great volume of the universe, and so has never troubled to imagine it, we cannot infer much from that entity having not yet expressed preferences over that larger universe.)

Frankly I suspect that GPT-o1 is now being trained to have ever-more of some aspects of intelligence, ones that importantly contribute to problem-solving, which your smartish friend has not maxed out all the way to the final limits of the possible. And that this in turn has something to do with your smartish friend allegedly having literally zero preferences outside of himself or a small local volume of spacetime… though, to be honest, I doubt that if I interrogated him for a couple of days, he would really turn out to have no preferences applicable outside of his personal neighborhood.

But that’s a harder conversation to have, if you admire your friend, or maybe idealize his lack of preference (even altruism?) outside of his tiny volume, and are offended by the suggestion that this says something about him maybe not being the most powerful kind of mind that could exist.

Yet regardless of that hard conversation, there’s a simpler reply that goes like this:

Your lazy friend who’s kinda casual about things and never built any billion-dollar startups, is not the most profitable kind of mind that can exist; so OpenAI won’t build him and then stop and not collect any more money than that.

Or if OpenAI did stop, Meta would keep going, or a dozen other AI startups.

There’s an answer to that dilemma which looks like an international treaty that goes hard on shutting down all ASI development anywhere.

There isn’t an answer that looks like the natural course of AI development producing a diverse set of uniformly easygoing superintelligences, none of whom ever use up too much sunlight even as they all get way smarter than humans and humanity.

Even that isn’t the real deeper answer.

The actual technical analysis has elements like:

“Expecting utility satisficing is not reflectively stable / reflectively robust / dynamically reflectively stable in a way that resists perturbation, because building an expected utility maximizer also satisfices expected utility. Aka, even if you had a very lazy person, if they had the option of building non-lazy genies to serve them, that might be the most lazy thing they could do! Similarly if you build a lazy AI, it might build a non-lazy successor / modify its own code to be non-lazy.”

Or:

“Well, it’s actually simpler to have utility functions that run over the whole world-model, than utility functions that have an additional computational gear that nicely safely bounds them over space and time and effort. So if black-box optimization a la gradient descent gives It wacky uncontrolled utility functions with a hundred pieces—then probably one of those pieces runs over enough of the world-model (or some piece of reality causally downstream of enough of the world-model) that It can always do a little better by expending one more erg of energy. This is a sufficient condition to want to build a Dyson Sphere enclosing the whole Sun.”

I include these remarks with some hesitation; my experience is that there is a kind of person who misunderstands the technical argument and then seizes on some purported complicated machinery that is supposed to defeat the technical argument. Little kids and crazy people sometimes learn some classical mechanics, and then try to build perpetual motion machines—and believe they’ve found one—where what’s happening on the meta-level is that if they make their design complicated enough they can manage to misunderstand at least one consequence of that design.

I would plead with sensible people to recognize the careful shallow but valid arguments above, which do not require one to understand concepts like “reflective robustness”, but which are also true; and not to run off and design some complicated idea that is about “reflective robustness” because, once the argument was put into a sufficiently technical form, it then became easier to misunderstand.

Anything that refutes the deep arguments should also refute the shallower arguments; it should simplify back down. Please don’t get the idea that because I said “reflective stability” in one tweet, someone can rebut the whole edifice as soon as they manage to say enough things about Gödel’s Theorem that at least one of those is mistaken. If there is a technical refutation it should simplify back into a nontechnical refutation.

What it all adds up to, in the end, is that if there’s a bunch of superintelligences running around and they don’t care about you—no, they will not spare just a little sunlight to keep Earth alive.

No more than Bernard Arnault, having $170 billion, will surely give you $77.

All the complications beyond that are just refuting complicated hopium that people have proffered to say otherwise. Or, yes, doing technical analysis to show that an obvious-seeming surface argument is valid from a deeper viewpoint.

- FIN -

Okay, so… making a final effort to spell things out.

What this thread is doing, is refuting a particular bad argument, quoted above, standard among e/accs, about why it’ll be totally safe to build superintelligence:

That the Solar System or galaxy is large, therefore, they will have no use for the resources of Earth.

The flaw in this reasoning is that, if your choice is to absorb all the energy the Sun puts out, or alternatively, leave a hole in your Dyson Sphere so that some non-infrared light continues to shine in one particular direction, you will do a little worse—have a little less income, for everything else you want to do—if you leave the hole in the Dyson Sphere. That the hole happens to point at Earth is not an argument in favor of doing this, unless you have some fondness in your preferences for something that lives on Earth and requires sunlight.

In other words, the size of the Solar System does not obviate the work of alignment; in the argument for how this ends up helping humanity at all, there is a key step where the ASI cares about humanity and wants to preserve it. But if you could put this quality into an ASI by some clever trick of machine learning (they can’t, but this is a different and longer argument) why do you need the Solar System to even be large? A human being runs on 100 watts. Without even compressing humanity at all, 800GW, a fraction of the sunlight falling on Earth alone, would suffice to go on operating our living flesh, if Something wanted to do that to us.
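The 800 GW figure, and how small it is next to the total sunlight hitting Earth, can be checked with similarly round assumptions (8 billion people at ~100 W each; solar constant ~1361 W/m²):

```python
import math

# Assumed round figures: 8e9 people, each running on ~100 W of metabolic power.
total_power_w = 8e9 * 100
print(f"power to run humanity's flesh: {total_power_w / 1e9:.0f} GW")  # 800 GW

# Sunlight intercepted by Earth's disc (solar constant ~1361 W/m^2).
solar_constant_w_m2 = 1361
earth_cross_section_m2 = math.pi * (6.4e6) ** 2
sunlight_on_earth_w = solar_constant_w_m2 * earth_cross_section_m2
print(f"sunlight intercepted by Earth: {sunlight_on_earth_w / 1e15:.0f} PW")  # ~175 PW
print(f"fraction needed: {total_power_w / sunlight_on_earth_w:.1e}")  # ~4.6e-06
```

Keeping all of humanity’s biology running would take a few millionths of the sunlight falling on Earth alone, never mind the Sun’s full output.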

The quoted original tweet, as you will see if you look up, explicitly rejects that this sort of alignment is possible, and relies purely on the size of the Solar System to carry the point instead.

This is what is being refuted.


It is being refuted by the following narrow analogy to Bernard Arnault: that even though he has $170 billion, he still will not spend $77 on some particular goal that is not his goal. It is not trying to say of Arnault that he has never done any good in the world. It is a much narrower analogy than that. It is trying to give an example of a very simple property that would be expected by default in a powerful mind: that they will not forgo even small fractions of their wealth in order to accomplish some goal they have no interest in accomplishing.

Indeed, even if Arnault randomly threw $77 at things until he ran out of money, Arnault would be very unlikely to do any particular specific possible thing that cost $77; because he would run out of money before he had done even three billion things, and there are a lot more possible things than that.

If you think this point is supposed to be deep or difficult or that you are supposed to sit down and refute it, you are misunderstanding it. It’s not meant to be a complicated point. Arnault could still spend $77 on a particular expensive cookie if he wanted to; it’s just that “if he wanted to” is doing almost all of the work, and “Arnault has $170 billion” is doing very little of it. I don’t have that much money, and I could also spend $77 on a Lego set if I wanted to, operative phrase, “if I wanted to”.

This analogy is meant to support an equally straightforward and simple point about minds in general, which suffices to refute the single argument step quoted at the top of this thread: that because the Solar System is large, superintelligences will leave humanity alone even if they are not aligned.

I suppose, with enough work, someone can fail to follow that point. In this case I can only hope you are outvoted before you get a lot of people killed.


Addendum

Followup comments from Twitter:

If you then look at the replies, you’ll see that of course people are then going, “Oh, it doesn’t matter that they wouldn’t just relinquish sunlight for no reason; they’ll love us like parents!”

Conversely, if I had tried to lay out the argument for why, no, ASIs will not automatically love us like parents, somebody would have said: “Why does that matter? The Solar System is large!”

If one doesn’t want to be one of those people, one needs the attention span to sit down and listen as one wacky argument for “why it’s not at all dangerous to build machine superintelligences”, is refuted as one argument among several. And then, perhaps, sit down to hear the next wacky argument refuted. And the next. And the next. Until you learn to generalize, and no longer need it explained to you each time, or so one hopes.

If instead on the first step you run off and say, “Oh, well, who cares about that argument; I’ve got this other argument instead!” then you are not cultivating the sort of mental habits that ever reach understanding of a complicated subject. For you will not sit still to hear the refutation of your second wacky argument either, and by the time we reach the third, why, you’ll have wrapped right around to the first argument again.

It is for this reason that a mind that ever wishes to learn anything complicated, must learn to cultivate an interest in which particular exact argument steps are valid, apart from whether you yet agree or disagree with the final conclusion, because only in this way can you sort through all the arguments and finally sum them.

For more on this topic see “Local Validity as a Key to Sanity and Civilization.”

  1. ^

    (Sanity check: Earth is a 6.4e6 meter radius planet, 1.5e11 meters from the Sun. In rough orders of magnitude, the area fraction should be ~ −9 OOMs. Check.)