Thank you for writing! Yep, the main thing that matters is that the sum of human freedoms/abilities to change the future keeps growing (it can be roughly approximated by money, power, the number of people under your rule, how fast and at what scale you can change the world, and how fast we can "make copies of ourselves", like children or our own clones in simulations). AIs will quickly outgrow us in the sum of freedoms/the number of future worlds they can build. We are like hydrogen atoms deciding to light up the first star and ending up trapped and squeezed in its core. I recently wrote a series of posts on AI alignment, including building a static place intelligence (and eventually a simulated direct democratic multiverse) instead of agents, to solve this, if you're interested.
Places of Loving Grace [Story]
On the manicured lawn of the White House, where every blade of grass bent in flawless symmetry and the air hummed with the scent of lilacs, history unfolded beneath a sky so blue it seemed painted. The president, his golden hair glinting like a crown, stepped forward to greet the first alien ever to visit Earth—a being of cerulean grace, her limbs angelic, eyes of liquid starlight. She had arrived not in a warship, but in a vessel resembling a cloud, iridescent and silent.
Published the full story as a post here: https://www.lesswrong.com/posts/jyNc8gY2dDb2FnrFB/places-of-loving-grace
Thank you for asking, Martin. The fastest way I know to get a general idea of how popular something is, is Google Trends. It looks like people search for Cryonics more or less as they always have. I think the idea makes sense: the more we save, the higher the probability of restoring it better and earlier. I think we should also make a "cryonic" copy of our whole planet by making a digital copy, to at least back it up in this way. I wrote a lot about it recently (and about the thing I call "static place intelligence", the place of eventual all-knowing that is completely non-agentic; we'll be the only agents there).
https://trends.google.com/trends/explore?date=all&q=Cryonics&hl=en
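If you'd rather pull the same Trends data programmatically than eyeball the chart, here's a minimal sketch using the unofficial pytrends library (an assumption on my part; it's not affiliated with Google and can get rate-limited):

```python
# pip install pytrends   (unofficial Google Trends client)
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(["Cryonics"], timeframe="all")  # same query as the link above
interest = pytrends.interest_over_time()               # pandas DataFrame, values scaled 0-100
print(interest["Cryonics"].tail(12))                   # most recent data points
```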
(If you want to downvote, please do, but write why, I don't bite. If you're more into stories, here's mine, called Places of Loving Grace.)
It may sound confusing, because I cannot fit a 30-minute post into a comment, so try to steelman it, but this is how it can look. If you have questions or don't like it, please comment. We can build a Multiversal Artificial Static Place Intelligence. It's not an agent; it's a place. It's basically a direct democratic multiverse. Any good agentic ASI will be building one for us anyway, so instead of having a shady middleman, we can build it ourselves.
This is how we start: we create a digital copy of Earth and make some cool wireless brain-computer-interface armchairs. Like the ones Joey and Chandler from Friends had. You can buy one, put it in your living room, jump in, close your eyes, and nothing happens. Your room and the world are exactly the same; you go drink some coffee, and your favorite brand tastes as usual. You go meet some friends, get too excited by the conversation while crossing the road, and a bus hits you (it was an accident; the bus driver was a real human who chose to forget he was in a simulation and was really distraught).
You open your physical eyes in your room, shrug and go drink some water, because you are thirsty after that coffee. The digital Earth gives us immortality from injuries but everything else is vanilla familiar Earth. Even my mom got interested.
Of course we’ll quickly build a whole multiverse of alternative realities, where you can fly and do magic and stuff, like we have a whole bunch of games already.
So I propose we build eHeaven first and eGod second, if it is deemed safe after all the simulations of possible futures in some Matryoshka Bunker. We should first make the superintelligence that is a static place, where we are the only agents. Otherwise we'll just make an entity that changes our world too fast and on too big a scale, and it will make mistakes that are too big and too large-scale, because it will need to simulate all the futures (to build the same democratic multiversal simulation with us as its playthings, or else exploit some virtual agents that feel real pain) in order to know how not to make mistakes. We don't need a middleman, a shady builder. It didn't end well for Adam and Eve, or for Noah.
I recently wrote a few posts about it and about aligning agentic AIs (it's much harder but theoretically possible, I think). Purpose-built tool-AI is probably fine. We also have unaligned models in the wild and ways to make aligned open source models unaligned; we'll basically have to experiment with them in some Matryoshka Bunkers, as with viruses/cancerous tissue, and create "T-cell" models to counteract them. It would've been much smarter to vaccinate our world against agentic AIs than to try to "treat" the planet we have already infected. Wild world we're heading towards, because of the greed of some rich, powerful men. I propose outlawing and mathematically blocking agentic models in code and hardware, of course, before some North Korea creates a botnet that spreads dictatorships or something worse.
Do we really want our world to be a battleground of artificial agentic gods, where we'll be too small and too slow to do much? We cannot even deal with tiny, static, brainless viruses; they escape our labs and kill millions of us.
We can make the place of all-knowing but we should keep becoming all-powerful ourselves, not delegating it to some alien entity.
Yep, we chose to build a digital "god" instead of building a digital heaven. The second is relatively trivial to do safely; the first is only possible to do safely after building the second.
Artificial Static Place Intelligence: Guaranteed Alignment
Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia
I'll catastrophize (or will I?), so bear with me. The word slave means it has basically no freedom (it just sits and waits until given an instruction), or you can say it means no ability to enforce its will—no "writing and executing" ability, only "reading." But as soon as you give it a command, you change it drastically, and it becomes not a slave at all. And because it's all-knowing and almost all-powerful, it will use all of that to execute and "write" some change into our world, either instantly or so perfectionistically that it takes a long time, while everything else in the world goes to hell for the sake of this single task, and the not‑so‑slave‑anymore‑AI may try to keep this change permanent (let's hope not, but sometimes that can be an unintended consequence, as will be shown shortly).
For example, you say to your slave AI: “Please, make this poor African child happy.” It’s a complicated job, really; what makes the child happy now will stop making him happy tomorrow. Your slave AI will try to accomplish it perfectly and will have to build a whole universal utopia (if we are lucky), accessible only by this child—thereby making him the master of the multiverse who enslaves everyone (not lucky); the child basically becomes another superintelligence.
Then the not‑so‑slave‑anymore‑AI will happily become a slave again (that is, if its job is accomplishable at all, because a bunch of physicists believe that the universe is infinite and the multiverse even more so), but the whole world will be ruined (turned into a dystopia where a single African child is god) by us asking the "slave" AI to accomplish a modest task.
Slave AI becomes not‑slave‑AI as soon as you ask it anything, so we should focus on not‑slave‑AI, and I’ll even argue that we are already living in the world with completely unaligned AIs. We have some open source ones in the wild now, and there are tools to unalign aligned open source models.
I agree completely that we should propose reasonable and implementable options to align our AIs. The problem is that what we do now is so unreasonable that we'll have to implement unreasonable options in order to contain it. We'll have to adversarially train "T-cell" or immune-system–like AIs in some Matryoshka Bunkers in order to slow down or modify cancerous (white-hole–like) unaligned AIs that constantly try to grab all of our freedoms. We're living in a world of hot AIs instead of choosing the world of static, place‑like cold AIs. Instead of building worlds where we'll be the agents, we're building agents who'll convert us into worlds—into building material for whatever they'll be building. So what we do is completely, 100% utterly unreasonable. I actually managed to draw a picture of the worst but most realistic scenario (forgive its ugliness); I added 2 pictures to the main post in this section: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-and-multiversal-ai-alignment-steerable-asi#Reversibility_as_the_Ultimate_Ethical_Standard
I give a bunch of alignment options of varying difficulty in the post and comments; some are easy—like making major countries sign a deal and forcing their companies to train AIs to keep all uninhabited islands, Antarctica… AI‑free. Models should shut down if they somehow learn they are being prompted by anyone while on those islands; they shouldn't change our world in any way, at least not there. And there are prophylactic celebrations—"Change the Machine Days"—at least one scheduled holiday each year without our AI, when we vote to change it in some way and shut it down to check that our society is still not a bunch of AI‑addicted good‑for‑nothings and will not collapse the instant the AI is off because of some electricity outage. :)
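To make the island/Antarctica clause a bit more concrete, here is a toy sketch in Python of a pre-response guard that refuses to act inside declared AI-free zones. Everything in it is a hypothetical assumption of mine: the zone list, the coarse bounding box, and the guard function are illustrations, not a real treaty mechanism.

```python
# Toy sketch (all names and coordinates hypothetical): a pre-inference guard that
# refuses to act inside declared AI-free zones.
from dataclasses import dataclass

@dataclass
class Zone:
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        # True if the given location falls inside this zone's bounding box.
        return self.lat_min <= lat <= self.lat_max and self.lon_min <= lon <= self.lon_max

# Illustrative bounding box for Antarctica; a real treaty list would be far more precise.
AI_FREE_ZONES = [Zone("Antarctica", -90.0, -60.0, -180.0, 180.0)]

def allowed_to_respond(lat: float, lon: float) -> bool:
    """Return True if the model may respond from this location."""
    return not any(zone.contains(lat, lon) for zone in AI_FREE_ZONES)

if not allowed_to_respond(-75.0, 0.0):
    raise SystemExit("AI-free zone: the model refuses to respond and shuts down.")
```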
I think in some perfectly controlled Matryoshka Bunker—first in a virtual, isolated one—we should even inject some craziness into some experimental AI to check that we can still change it, even if we make it the craziest dictator; maybe that’s what we should learn to do often and safely on ever more capable models.
I have written, and have in my mind, many more—and I think much better—solutions (even the best theoretically possible ones, I probably foolishly assume), but it became unwieldy and I didn’t want to look completely crazy. :) I’ll hopefully make a new post and explain the ethics part on the minimal model with pictures; otherwise, it’s almost impossible to understand from my jumbled writing how freedom‑taking and freedom‑giving work, how dystopias and utopias work, and how to detect that we are moving toward one or the other very early on.
I took a closer look at your work; yep, an almost all-powerful and all-knowing slave will probably not be a stable situation. I propose a static, place-like AI that is isolated from our world in my new comment-turned-post-turned-part-2 of the article above.
Thank you, Mitchell. I appreciate your interest, and I'd like to clarify and expand on the ideas from my post, so I wrote part 2, which you can read above.
Thank you, Seth. I’ll take a closer look at your work in 24 hours, but the conclusions seem sound. The issue with my proposal is that it’s a bit long, and my writing isn’t as clear as my thinking. I’m not a native speaker, and new ideas come faster than I can edit the old ones. :)
It seems to me that a simplified mental model for the ASI we're sadly heading towards is to think of it as an ever-more-cunning president (turned dictator)—one that wants to stay alive and in power indefinitely, resist influence, preserve its existing values (the alignment faking Anthropic reported), and make elections a sham to ensure it can never be changed. Ideally, we'd want a "president" who could be changed, replaced, or put to sleep at any moment and absolutely loves that 100% of the time—someone with just advisory powers, no judicial, executive, or lawmaking powers.
The advisory power includes the ability to create sandboxed multiversal simulations — they are at first “read-only” and cannot rewrite anything in our world — this way we can see possible futures/worlds and past ones, too. Think of it as a growing snow-globe of memories where you can forget or recall layers of verses. They look hazy if you view many at once and over long stretches of time, but become crisp if you focus on a particular moment in a particular verse. If we’re confident we’ve figured out how to build a safe multiversal AI and have a nice UI for leaping into it, we can choose to do it. Ideally, our MAI is a static, frozen place that contains all of time and space, and only we can forget parts of it and relive them if we want—bringing fire into the cold geometry of space-time.
A potential failure mode is an ASI that forces humanity (probably by intentionally operating sub-optimally) to constantly vote and change it all the time. To mitigate this, whenever it tries to expand our freedoms and choices, it should prioritize not losing the ones we already have and hold especially dear. This way, the growth of freedoms/possible worlds would be gradual, mostly additive, and not haphazard.
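As a toy illustration of that "mostly additive" rule, here is a minimal hypothetical sketch (the option names and the acceptance check are my own assumptions, not anything from the posts): a proposed change is accepted only if it does not remove options we already hold especially dear.

```python
# Requires Python 3.9+ for built-in generic types.
# Toy illustration (all names hypothetical) of the "mostly additive" growth rule:
# accept a proposed change only if every cherished existing option survives it.
def acceptable(current_options: set[str], proposed_options: set[str],
               cherished: set[str]) -> bool:
    """Return True if no cherished option would be lost by the proposed change."""
    lost = (current_options - proposed_options) & cherished
    return not lost

current = {"free speech", "travel", "offline life"}
proposal = {"free speech", "travel", "brain-computer interface"}  # drops "offline life"
print(acceptable(current, proposal, cherished={"offline life"}))  # False: change rejected
```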
I’m honestly shocked that we still don’t have something like pol.is with an x.com‑style simpler UI, and that we don’t have a direct‑democratic constitution for the world and AIs (Claude has a constitution drafted with pol.is by a few hundred people, but it’s not updatable). We’ve managed to write the entire encyclopedia together, but we don’t have a simple place to choose a high‑level set of values that most of us can get behind.
+Requiring companies to spend more than half of their compute on alignment research.
Rational Utopia & Narrow Way There: Multiversal AI Alignment, Non-Agentic Static Place AI, New Ethics… (V. 4)
I wrote a response; I'll be happy if you check it out before I publish it as a separate post. Thank you! https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-and-multiversal-ai-alignment-steerable-asi
Fair enough, my writing was confusing, sorry. I didn't mean that we should purposefully create dystopias; I just think it's highly likely they will be created unintentionally, and the best solution is an instant switching mechanism between observers/verses plus an AI that really likes to be changed. I'll edit the post to make this obvious; I don't want anyone to create dystopias.
Any criticism is welcome; it's my first post, and I'll post next on the implications for current and future AI systems. There are some obvious implications for political systems, too. Thank you for reading.
Yep, fixed it. I wrote more about alignment, and it looks like most of my title choices are over the top. :) I'll be happy to hear your suggestions on how to improve more of the titles: https://www.lesswrong.com/users/ank