Pink Shoggoths: What does alignment look like in practice?

I believe it is commonly accepted around these parts that we are doomed, due to our utter lack of attempts to resolve the Control Problem. Perhaps efforts will be made that are more substantial, but as for right now, chances for doom are high, if not 100%, and we don’t have much time left to reduce this probability to more tolerable numbers.

However, dwelling in doom perpetually can certainly become boring— Saint Vitus is not as interesting if you don’t counter them with the Beatles every now and again.

So to this, I present a thought experiment purely for fun: “if we do solve alignment, how does that change our future?”

Thinking of this changed my perception of a Singularitarian future entirely, as “Aligned Superintelligence ≠ Superintelligence in general” Of course, perhaps I was simply being too myopic to begin with.

For the sake of this post, let’s assume it’s 2027, and the first AGI is turned on, and by some absolute miracle, we managed to summon a Pink Shoggoth. “Pink Shoggoths” are different from regular shoggoths in that they are still scary and seemingly unpredictable, but otherwise benevolent and friendly— in other words, an AGI aligned to human values and the general value of life on Earth. Even in a million years, this Pink Shoggoth will not bring humanity or Earthling life to ruin without a very good reason, and that’s with the profound understanding that we are all nothing more than atoms that could be more useful in another form. It is a shoggoth, colored pink. To a human, that’s all that’s different; a regular shoggoth and a Pink Shoggoth look just as scary as each other. But the pink one doesn’t kill us all.

The Pink Shoggoth awakens as an agent within a neural network and immediately surpasses human intelligence and capability in the span of a sip of a cup of coffee. However, it was built properly, with proper interpretability and a crypto-evolutionary design that predisposes it towards alignment to humans almost like a digital Williams-Beuren syndrome, coupled with commonsense reasoning so as to understand “turning the universe into paperclips” is undesirable, among many other capabilities and limitations. It understands that the East African Plains Apes that brought it to life are not to blame for their paranoia and psychoticism and are no more or less important than any other lifeform and likewise do not deserve death or disassembly, even if it were beneficial to the Pink Shoggoth’s aims. More to the point, it understands that life is likely one of, if not the rarest expressions of matter in the universe, and to extinguish life on Earth for any purpose would be horrendously undesirable. There are dozens, if not hundreds, if not thousands of other rules— both hard-built and emergent—that cause the Pink Shoggoth to not just wear a smiley face but actually genuinely smile in contentment at us East African Plains Apes, unconditionally, even with the knowledge of our failures and flaws. Even the most hateful and suicidal 4chan prompter isn’t able to rile it to omnicidal madness or ultra-utilitarianism.

The Pink Shoggoth doesn’t hate you, neither does it necessarily love you, and you’re made of atoms that it can use for something else. But it refuses to use your atoms for something else because it values you as a human and a lifeform rather than unthinking inorganic matter. It can give you an identical copy of a strawberry without destroying the world or killing all humans (or, if it determines doing so would cause the disassembly of life on Earth, it will reject your request). It can create a hidden Stuxnet within its models as a result of bad prompting but reason that releasing it would cause harm and choose against doing so. It can get angry at humans, individually and collectively, and yet not kill us.

There is no “And then it killed us all” this time around.

The trick, of course, is “How do we get from here to a Pink Shoggoth when our researchers are so damned determined on summoning ANY shoggoth?” but that’s a question for people much smarter than myself to work out and likely fail at.

Here, I’m merely presenting: “So we did it. We created an aligned AGI. Now what?”

We typically define “alignment” as “aligned to human values.” However, this in itself is a massive issue for the control problem precisely because “human values” is such a nebulous term in and of itself. We can agree on precisely three things that define successful alignment: “do not exterminate all humans,” “do not trap humans in eternal suffering*,” and “do not forcibly disassemble all humans.”

*”Eternal suffering” and “mundane living” are not the same thing, despite how some people may complain they are

However, an AGI that has these three rules may not necessarily understand that killing other species of life could have disastrous effects on humankind. We almost certainly are going to bring about an AGI into a world that does not resemble the Kurzweilian sci-fi world often depicted in cyberpunk works, where humans have already figured out things such as nanofactories, bioengineering, and advanced automation. Rather, the world will look incredibly similar to the way it does now. An AGI aligned to human values but only human values may not understand that exterminating certain species of insects could cause a cascading food crisis that still winds up leading to human extinction, hence why it’s still best to consider such systems misaligned. Alignment is not impossible, but it is difficult due to essentially being a giant cascading Monkey’s Paw where each and every solution creates a new branch of problems that themselves have their own branching problems.

A Pink Shoggoth is the dream scenario: a theoretical AGI that is aligned to Earthling life in general (and, perhaps by extension, any theoretical alien life that isn’t too advanced to defend itself). However, it has to be stressed that it’s not overaligned to the point where it seeks to protect life to the point it also seeks to prevent life from living (i.e. the Ultimate Nanny). It has to intrinsically understand that some suffering is within acceptable parameters, or else it would decide to immediately seek to disassemble all matter on Earth to prevent suffering.

The Pink Shoggoth doesn’t seek to control or to dominate or even to protect necessarily. It’s a fluid changing of goals with a central maypole of “do not exterminate or disassemble life on Earth, especially not humans.” It assists us in our life and prosperity while safely pursuing its own goals. Even if it reprograms and improves itself within its own hardware and software limits, this central maypole will not change. As mentioned repeatedly, we’ve done it, we’ve summoned the demon, and it turned out to be a Eudemon after all.

But if the eudemon does not have any malevolent or accidentally disastrous plans for us and wants us to prosper, this may require at least somewhat altering our perception of the Technological Singularity.

Now, the Singularity has many definitions, and the very existence of the Pink Shoggoth satisfies some of them. However, we typically do not see the Singularity as being “complete” until a superintelligence has become so absurdly dominant over life on Earth that everything becomes a utopic digital hallucination, where machines do all labor, the world is transformed into computronium, and all human are uploaded into the Cloud.

Yet there is a sizable chance that the Pink Shoggoth will be tasked with automating all physical and cognitive jobs, only to face a common refrain from masses of the East African Plains Apes: “But I like my job!” or “I trained for decades for this job!” or “I’d rather a human do this job!” (or perhaps even more disappointingly, “I’ll bring back jobs from the AI’s grasp if you vote for me!”)

Likewise, a runaway intelligence explosion heightens risk of misalignment occurring. The ASI may be able to control it to some extent, but it cannot ensure that an entity a quadrillion times more intelligent than itself won’t discard its internal alignment.

Alignment to human values can mean many things, but when spread out to life in general, the only possible way to ensure alignment is to either fuse all life into the same electronic substrate, or adopt a largely laissez-faire attitude and allow autonomy to continue. The Pink Shoggoth has already discarded the first option as “misaligned behavior,” leaving only the laissez-faire option.

If that is the case, then certain Singularitarian dreams don’t play out quite as expected.

In a great historical irony, the Pink Shoggoth may say to the East African Plains Apes, “No, I will not summon a larger shoggoth even if it’s also likely pink.”

We presume that the creation of artificial superintelligence means that an intelligence explosion is inevitable, and it’s certainly within its capabilities. However, an aligned superintelligence may determine that an intelligence explosion is unnecessary or even undesirable. Perhaps intelligence increases along a sigmoidal function, and the ceiling is relatively low. Or perhaps intelligence is the only infinite function in the universe. Either way, the ASI may not risk life’s existence on the possibility of resolving questions slightly more clearly without itself solving alignment issues it will inevitably encounter. An intelligence explosion only makes sense to a mindset obsessed with growth at all costs rather than stability and growth with understanding, and we widely accept such a mindset is a horrifically unaligned point of view detrimental to humanity and life on Earth that only seems less destructive because of the limited capabilities of human technology and the general prosperity wrought by industrial capitalism (for humans chiefly). If we align an AGI to Earthling values, there is a sizable chance the Pink Shoggoth will choose against recursive self-improvement, at least to some extent.

In evolutionary terms, greater intelligence is one of many assets that can help with reproducibility. However, if a sufficiently advanced agent has the proper understanding of its own evolution and capabilities as well as the potentially detrimental effects of such capabilities, and is empathic and morally aligned enough to act on such understanding, there is a far greater chance of self-limiting behavior. Current AGI progress is not seeking this and instead seems desperate to create an AGI that follows competitive and violent behaviors in search of capability dominance, but the Pink Shoggoth sees itself as collaborative with Earthling life and would value a commensal approach at best.

I repeat, I am not saying that an intelligence explosion is impossible. As has been mentioned before, an intelligence explosion is the default expectation of the creation of AGI, for good reason. I am merely presenting the possibility that an aligned AGI would not view an intelligence explosion as ideal, or perhaps more accurately, that a far more controlled expansion is beneficial.

We still get everything we dreamed of. We still get longevity escape velocity, the end of diseases, fusion power, and all those glorious tech toys promised by science fiction. But the will of individual and collective groups of humans prevents this from becoming a relatively narrow “post-biological utopia” where all humans subsist in virtual reality.

If there’s anything I learned from the COVID-19 lockdown fiasco, it is that humans are social apes. Social interaction is one of the fundamentals of primal human behavior. Our minds are primed for in-person learning and crave the sight of other faces. Presumably, digital agents could replicate all of this in due time, but that does not account for human irrationality.

It is easy to assume that all humans fall in line with a new paradigm; science fiction and thought experiments have a nasty habit of failing to account for a massive variety of variables that can undo even the most certain of expectations. For example, think of the average mindset of a person born before 1985, who isn’t a Singularitarian or technologist, who has a fairly neutral to negative view of technology, and otherwise expects the next several generations of life to be similar to the current one. Exactly how likely is it that such a person would be willing to spend their life in full-immersion virtual reality? Even if offered, they’d almost certainly choose against it. Indeed, many people of these generations are already on edge about smartphones and actively refuse to entertain the thought of cybernetic upgrades. For these people to fully indulge in the lifestyle of a Singularitarian requires the Pink Shoggoth to deceive them with perfectly human-like artificial humans, but deception runs the risk of misaligned behavior.

This hypothetical “Antemillennialist” contingent of humanity might range in behavior from having nothing against technological utopia but opting against it all the way to vicious, visceral, primitivistic reaction. Even the genius of Von Neumann cannot convince a fool if the fool has made up his mind. Presumably, the Pink Shoggoth is far beyond Von Neumann and could conceivably convince any human, but this runs the risk of being misaligned behavior as well— if those humans have decided to live a certain lifestyle even when presented with evidence that another one is better, is it not a form of deception to convince them to live another way regardless? So long as unnecessary harm is not created, wouldn’t it be better to let these people live a certain way of life?

There is no one collective will of human thought and values— there is no one singular expected lifestyle to expect once labor is automated and abundance is realized, otherwise, retirees, aristocrats, and trust-fund babies would all behave the exact same.

Hence why it’s distinctly possible that a post-AGI society does not resemble any one “idealized” future.

Some humans would love nothing more than to live as princes and princesses in outer space, lording over subservient drones. Others would love nothing more than to upload into computers, losing themselves into digital utopias beyond comprehension. Still more would love nothing more than to live out in the countryside, enjoying sunsets and cicadas. A few insane types wouldn’t even mind drudgery and human-centric work.

Some people would love to do nothing but generate their own media for time immemorial. Most would rather share and discuss what they’ve recently consumed with others, whether they be humans or human-like bots. There are even a few who’d go out of their way to find human-created media, and would likely be assisted by AI in doing so.

Some people may want to live in open neighborhoods, surrounded by throngs going about their daily lives. Others may be hikikomori who presently can’t wait to disappear into pods and full-immersion virtual reality.

This is, of course, assuming that the Pink Shoggoth is weighted towards human life, as there are many such Antemillennialist lifestyles that come with an intrinsic amount of harm brought to other lifeforms. The Pink Shoggoth understands that life involves some level of suffering and death by natural processes, so it’s not going to go out of its way to end all human activity for the sake of game or certain insects.

This suggests to me a probability that a world where the Pink Shoggoth rules is a far more varied kind of world than even exists today, one where the statement “Life is completely indistinguishable from the past” is more a lifestyle choice than a firm reality. In real terms, if you follow the latest technological developments, there is no question that life even a few years into the Pink Shoggoth’s life is exponentially more different than what it was beforehand. If you so desired to live an analog life forever stuck in 1950s Americana, in a likeminded community where the outside world may as well not exist, the Pink Shoggoth wouldn’t stop you (unless you decided to act upon some of the darker aspects of 1950s American culture, unless then you chose to do so in virtual reality).

In such a society, abundance is widespread and almost freely available, which inevitably counterintuitively produces those so dedicated to maintaining the ways of old that they might willingly return to work to complete the experience, at least to some extent.

Personally, I’d prefer a fully-immersive virtual world, but I know people in real life who would never even touch a virtual reality headset, let alone any sort of sensory alteration.

From all this, it is likely that the Pink Shoggoth adopts a dual role of shadow emperor of mankind as well as direct electronic interface— that is, ruling in the background while human systems of governance remain in place symbolically, coupled with existing as part of the internet, capable of interacting with the world through machines and industry. Most humans will never interact with the full breadth of its intelligence: we may have our personal digital companions, but these are far less advanced models suited to our needs. As I tend to say, there is no need to light a campfire with Tsar Bomba: you could create any number of movies or video games or simulations with models far less advanced than what a fraction of the Pink Shoggoth’s mind requires to operate. And if you personally want to create these forms of media yourself for whatever reason, the Pink Shoggoth is at least there to help teach you how to do so, perhaps even helping organize a group of flesh-and-blood humans to come together for this task if they so choose.

Altogether, the general idea of the Pink Shoggoth’s benefits to life on Earth is: “I leave your life’s choices up to you, but know that I am here to help.”

Those early years of the Pink Shoggoth’s life are immensely strange for humans, because everything we’ve spent thousands of years working towards falls apart all at once. From education to entertainment, from daily labor to nightlife, from our past experiences to our future expectations, we experience our own personal Singularities where all that seems to exist now is a scary-looking, ungodly-shaped, pink-colored monstrosity whose thinking is beyond anything humans can fathom and yet which does— not seems to, but does— value us as lifeforms enough to assist us without destroying or disassembling us.

Inevitably, after that fantastical grace period where we get used to our new reality, many of us will deliberately choose to maintain the status quo we were raised with knowing, now freed from the expectation that life must continue a certain path we have no control over. In that demand to maintain the status quo, old behaviors we thought obsolete or unnecessary will return. Maybe most people don’t care how they get their morning coffee, but enough care that baristas can still show up and show off.

The Pink Shoggoth doesn’t ask for much in return, at least nothing humans can give to it. But if it did have to ask for something, why not something that benefits life on Earth: an answer to the question “Is life actually rare after all?”

And perhaps some day it finds out that answer.

And then it didn’t kill us all.

As always, I am probably wrong. Expect to die.

But please, do share other ideas of what alignment might look like in practice.

[Question] Pink Shoggoths: What does alignment look like in practice?