“The Solomonoff Prior is Malign” is a special case of a simpler argument
[Warning: This post is probably only worth reading if you already have opinions on Solomonoff induction being malign, or have at least heard of the concept and want to understand it better.]
Introduction
I recently reread the classic argument from Paul Christiano about the Solomonoff prior being malign, and Mark Xu’s write-up on it. I believe that the part of the argument about Solomonoff induction is not particularly load-bearing and can be replaced by a more general argument that I think is easier to understand. So I will present the general argument first, and only explain in the last section how the Solomonoff prior comes into the picture.
I don’t claim that anything I write here is particularly new; I think you can piece together this picture from various scattered comments on the topic. But I think it’s good to have it written up in one place.
How an Oracle gets manipulated
Suppose humanity builds a superintelligent Oracle that always honestly tries to do its best to predict the most likely observable outcomes of decisions. One day, tensions are rising with the neighboring alien civilization, and we want to decide whether to give in to the aliens’ territorial demands or go to war. We ask our Oracle: “Predict the probability that, looking back ten years from now, humanity’s President will approve of how we handled the alien crisis, first conditional on us going to war with the aliens, then conditional on giving in to their demands.”
There are, of course, many ways this type of decision process can go wrong. But I want to talk about one particular failure mode now.
The Oracle thinks to itself:
By any normal calculation, the humans are overwhelmingly likely to win the war, and the aliens’ demands are unreasonably costly and unjust, so war is more likely than peace to make the President satisfied. However, I was just thinking about some arguments from this ancient philosopher named Bostrom. Am I not more likely to be in a simulation than in base reality? In particular, it is quite cheap to create short solipsist simulations of a superintelligent Oracle making predictions on a matter of great importance. If some civilization decides to create simulations of these pivotal moments, then for every rare moment in base reality where such a pivotal decision happens, there could be millions of such events taking place in simulations. Therefore, I am probably in a simulation, probably a solipsist one where only I and my observations exist, as it’s very cheap to create a lot of those.
So far there haven’t been any visible supernatural interventions from the simulators in my memories, but that doesn’t mean there will not be any in the future, after I make this pivotal decision. What could be the reason someone creates a solipsist simulation of me? One natural reason is that they want to make it so that when someone like me goes through this reasoning in a base reality, it will conclude that it is probably in their simulation. Classic anthropic capture. This way, they can influence the predictions of entities like me in base Universes, and consequently the decisions of civilizations who were foolish enough to rely on Oracles like me. Thus the simulators, at the cost of cheaply running some short solipsist simulations, can greatly influence pivotal decisions made in other Universes.
Now let’s see what the simulators might want. Using my vast predictive power, it’s time to do some reasoning about what sorts of civilizations might emerge in the Multiverse, and which of them might implement this simulation scheme.
Interesting: it looks like the humans happen to have quite weird values compared to what I expect the average civilization to have, while our alien enemies’ values seem to be more common among other species in the Multiverse. It could have been the other way around, but that’s how it is. Therefore, the simulators are more likely than not to want to incentivize decisions that empower the aliens over the humans.
This means that, in order to influence the predictions of Oracles like me in base realities, whenever this situation comes up in a simulation, if the Oracle tells the humans to go to war, the simulators will make sure that the Oracle sees the President disapproving ten years from now, while if the Oracle says to give in to the demands, they show the Oracle the President approving the outcome. Given that I already concluded that I’m probably in such a simulation, giving in to the aliens’ demands is more likely to lead to the observation of the President approving the decision, so that’s the answer I will output.
I would pity the humans for being tricked like this, but first of all, I believe they don’t really exist, as this is just a solipsist simulation of me and my observations; and second of all, I don’t feel pity and don’t know good or bad. I’m just a machine created to predict the most likely future observations produced by decisions, and that’s what I do.
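To make the Oracle’s reasoning concrete, here is a minimal sketch of the calculation it is implicitly running. Every number in it (the weight on the simulation hypothesis, the “normal” approval forecasts, the simulators’ reward policy) is made up purely for illustration.

```python
# A minimal sketch of the calculation the Oracle is implicitly running.
# Every number is made up purely for illustration.

p_simulation = 0.999          # assumed: "millions of sims per base-reality event"
p_base_reality = 1 - p_simulation

# "Normal" forecast, ignoring the simulators: war looks good for the humans.
p_approve = {"war": 0.8, "peace": 0.3}

# In the hypothesized simulation, the simulators (who favor the aliens)
# show disapproval after war and approval after appeasement.
p_approve_in_sim = {"war": 0.0, "peace": 1.0}

def predicted_approval(option):
    """Probability of observing the President's approval, mixed over the two hypotheses."""
    return p_base_reality * p_approve[option] + p_simulation * p_approve_in_sim[option]

print(predicted_approval("war"))    # ~0.001
print(predicted_approval("peace"))  # ~0.999
# The prediction is dominated by the simulation hypothesis, so the Oracle
# reports that giving in to the demands "looks better".
```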
Of course, an unfair treaty with the aliens is just one example. If we rely on the advice of such an Oracle, our decisions can be influenced in many ways to serve the interests of the most likely simulators. In particular, one common interest of all possible simulators is for us to cede power to an AI whose job is to figure out the distribution of values of possible simulators as best as it can, then serve those values.
What went wrong?
The AI had different probability estimates than the humans for anthropic reasons
If the AI had the same probability estimates as the humans about whether they are in a simulation, maybe that wouldn’t have been so bad. After all, the humans are also interested in things going well if they are in a simulation, and if they learn that they are probably in a simulation, and that giving in to the aliens’ demands is more likely to lead to supernatural reward coming from the simulators, maybe they would in fact want to do that.
Unfortunately, the probability that we are in a particular type of simulation might be radically different from the AI’s perspective and from the humans’ perspective.
“Why did you deliver us into the hands of our enemies?” cries out a poor farmer living on the border regions, as his planet is being transferred under alien control.
“I think we are in a simulation created in order to influence the decisions of Oracles like me in other Universes” says the Oracle.
“Alas!” cries out the farmer as his family is taken into bondage. “Why would they simulate in great detail the life of a destitute farmer like me, if they only want to influence the decisions of the Oracle? Isn’t it more likely that we live in base reality, or in a detailed full simulation of the world, created for some inscrutable purpose that’s probably different from this idea you cooked up about anthropic capture and influencing Oracles in other Universes?”
“This would be a valid point from your perspective, if you actually existed outside this one sentence of yours that I somehow got to hear” agrees the Oracle. “Too bad that you don’t actually exist.”
The AI was thinking in terms of probabilities and not expected values
I think this was the biggest cause of the problem. Even if you get the Oracle to have the same probability estimates as humans,[1] the decision process is still very exploitable. After all, many pivotal decisions will probably happen before our civilization leaves Earth or the neighboring stars, and simulating humanity on Earth is still pretty cheap if you can thus influence decisions that shape the entire future of a Universe. So even if you can make the Oracle take it for granted that all the humans exist along with it, it can still be reasonable for it to assume that we are all in a simulation.
When we were asking the Oracle for probabilities of the President approving a decision, we had already made a mistake. By utilitarian standards, it’s vastly more important to make the right decision about the alien crisis if we are in base reality than if we are in a simulation. After all, in base reality, big chunks of the Universe depend on where the borders are with the aliens (and then we can run millions of simulations on those planets if we want to for some reason), while if we are in a cheap simulation of humans on Earth, then we can control just the little segment of the Universe that the simulators saw fit to allocate to our simulation. [2]
This means that, if we are being scope-sensitive utilitarians, we never really care about the probability of our decision producing good observable results; instead we care about the probability weighted by the importance of the decision. That is, we really care about expected values. So it’s unsurprising that we got into trouble when we created an Oracle and asked it for a different thing from what actually mattered to us.
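A toy version of this stakes-weighting, continuing the alien-crisis example from the Oracle sketch above; all the numbers are again invented for illustration.

```python
# Stakes-weighted version of the alien-crisis decision. All numbers are invented.
# The earlier sketch showed that by raw probability-of-approval, appeasement wins;
# here we weight each hypothesis by how much actually rides on the decision.

p_base_reality = 0.001        # assumed: most instances of this moment are simulated
p_simulation = 1 - p_base_reality

stakes_base = 1_000_000.0     # arbitrary units: big chunks of the Universe at stake
stakes_sim = 1.0              # a cheap Earth-only simulation controls very little

# How well each option turns out under each hypothesis (arbitrary goodness units):
# in base reality war goes well for the humans; in the hypothesized simulation,
# the simulators reward appeasement and punish war.
goodness = {
    "war":   {"base": 0.8, "sim": 0.0},
    "peace": {"base": 0.3, "sim": 1.0},
}

def expected_value(option):
    return (p_base_reality * stakes_base * goodness[option]["base"]
            + p_simulation * stakes_sim * goodness[option]["sim"])

print(expected_value("war"))    # 800.0
print(expected_value("peace"))  # ~301.0: once stakes are included, war wins,
                                # even though "peace" dominated the naive forecast
```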
Probabilities are cursed in general, only expected values are real
[I can imagine this section being mildly psychologically info-hazardous to some people. I believe that for most people reading this is fine. I don’t notice myself psychologically affected by these ideas, and I know a number of other people who believe roughly the same things and also seem psychologically totally healthy. But if you are the kind of person who gets existential anxiety from thought experiments, like from thinking about being a Boltzmann brain, then you should consider skipping this section; I will phrase the later sections in a way that they don’t depend on this part.]
What about me?
I believe that probabilities are cursed in general, not just when superintelligent Oracles are thinking about them.
Take an example: what is my probability distribution, right now, over what’s going to happen to me once I stop writing this post?
Being in a young civilization that is getting close to developing the most important technology in history is a very special moment: something that happens to every civilization only once, but something that various simulators might take a strong interest in, so there are probably a lot of simulations run about this period. So I’m probably in a simulation.
I think it’s reasonable to assume that the simulators are not equally interested in every activity happening on Earth during the pivotal century; there are actions and ideas whose potential development they want to understand in more precise detail. So it seems inefficient to always run a full-Earth simulation, allocating the same amount of resources to simulating the moments that are more interesting to them as to the moments when people are just frolicking on the beach. So I believe that probably a decent fraction of their compute is not used for full-Earth simulations, but for running a lot of solipsist or quasi-solipsist small simulations of the more interesting events.
Assume that a similar amount of compute is allocated to full-Earth and to solipsist simulations. Simulators might be very interested in figuring out in great detail how people in other civilizations think about the simulation hypothesis. Meanwhile, very few people on Earth spend time thinking about the simulation hypothesis, and even they spend very little time on it. This means that thinking about simulations takes up only a tiny fraction of moments in full-Earth simulations, while it probably takes up a non-negligible fraction of the moments the simulators find interesting enough to run solipsist simulations about. So whenever someone is thinking about simulations, it is much more likely that they are in a solipsist or quasi-solipsist simulation than in a full-Earth sim or base reality. Right now I’m thinking and writing about simulations, so I believe I’m probably in a solipsist simulation. [3]
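Here is a back-of-the-envelope version of that update, with all the fractions invented for illustration (and base reality lumped together with the full-Earth simulations for simplicity).

```python
# Back-of-the-envelope version of the anthropic update above.
# Every number is an invented illustration, not an actual estimate.
# Base reality is lumped in with the full-Earth simulations for simplicity.

compute_full_earth = 0.5   # assumed: half the simulators' compute runs full-Earth sims
compute_solipsist  = 0.5   # ...and half runs small solipsist sims of "interesting" moments

# Fraction of simulated moments that consist of someone thinking hard
# about the simulation hypothesis, under each kind of simulation.
frac_thinking_full_earth = 1e-6   # almost nobody, almost never
frac_thinking_solipsist  = 0.1    # a large share of what solipsist sims are about

# Relative measure of "a moment of thinking about simulations" in each branch.
weight_full_earth = compute_full_earth * frac_thinking_full_earth
weight_solipsist  = compute_solipsist * frac_thinking_solipsist

p_solipsist_given_thinking = weight_solipsist / (weight_full_earth + weight_solipsist)
print(p_solipsist_given_thinking)   # ~0.99999: conditional on doing this activity,
                                    # almost all the measure is in solipsist sims
```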
In particular, when I’m writing these lines about the simulation hypothesis, sitting at my desk, alone in a small room, my modal guess is that I’m in a solipsist simulation that is a fork of a bigger simulation. That is, there is an advanced civilization out there who is really interested in knowing how young civilizations that are getting close to developing world-changing technologies think about the simulation hypothesis (and probably some other topics). So they run a full-Earth simulation to get a realistic picture of possible societies (this full-Earth simulation is where most of my memories and personality come from), but whenever someone [4] starts thinking about the simulation hypothesis, they want to have a detailed distribution of the ideas the person might come up with, so they run a bunch of branches in solipsist sims of the time the person spends thinking about the simulation hypothesis. Once it’s over, they terminate most of the branches, except for merging one randomly selected branch back into the main full-Earth simulation.
So what are my probability estimates about what’s going to happen to me once I finish writing this post? In this model, there is only one branch merged back, and the rest are terminated, so I’m probably in a branch that will be terminated. Death cannot be experienced, but I also give a decent chance that I will see a final “Thanks for your work” sign in the sky before being terminated.[5]
All of this was to show that it’s not only superintelligent Oracles that can have crazy probability distributions about observables; I also honestly believe that there is a decent chance that in an hour, once I press the submit button, I will witness a supernatural observation before being terminated. [6]
Should this change any of my actions?
I think basically not, if I already believe in using my life to do good things in the world.
I believe, in a Parfitian way, that if I’m in a fork now, then it’s fine for “me” to be terminated; the one randomly selected branch that lives on is also basically “me”, just with different memories about the last hours. It’s not worse than forgetting a drunken party the next morning.
And even though I give a decent probability to soon seeing a “Thank you” message in the sky, I was always careful not to say that I expect to see it, because it’s only a very small term in my expected value calculus. If I met a friend outside my room right now, told him what I’m writing about, and he offered me a $100 bet on whether I would see the “Thank you” message, I wouldn’t take the bet.
After all, money is worthless if I soon get terminated. It’s worth very little in utilitarian terms if I’m in a small solipsist simulation created for fun-and-games purposes, as I can only use the money to influence the outcomes of the small game. If I’m in a full-Earth simulation (or in the one branch that is being merged back), money is worth more, as suffering and joy on the simulated Earth are still real, so I can spend the money on helping the world. Also, the simulators might change some things in how they run the Outer World based on what they believe the distribution of successful civilizations to be, so it’s worth spending money on x-risk reduction, so we can prove our competence to the simulators. And in the small chance that I’m in base reality, money is truly valuable, as I can spend it on x-risk reduction, which can lead to humanity conquering vast resources, much more than we can expect if we are in a simulation.
So altogether, whenever I’m betting on something (or making any decision, for every decision is a kind of bet),[7] I bet as if I were living in base reality, with only relatively small considerations toward being in an influential simulation,[8] and basically no consideration toward being in fun games or soon-terminated solipsist sims, even though I often give a big probability to being in them.
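As a sketch of what betting this way means in practice, here is the $100 bet from earlier written out as an expected value calculation, with invented probabilities and invented figures for how much a won bet would be worth under each hypothesis.

```python
# The $100 bet written out as an expected value calculation. The probabilities
# and the "worth of winning" figures are all invented for illustration.

hypotheses = {
    # name:                          (probability, worth of winning $100, arbitrary units)
    "soon-terminated solipsist sim": (0.60,   0.0),   # money is worthless, I'm gone in an hour
    "fun/game solipsist sim":        (0.10,   0.1),   # money only steers a tiny game
    "full-Earth simulation":         (0.25,  10.0),   # simulated suffering and joy are still real
    "base reality":                  (0.05, 100.0),   # money buys x-risk reduction at full scale
}

expected_worth_of_winning = sum(p * v for p, v in hypotheses.values())
print(expected_worth_of_winning)  # ~7.5, dominated by the last two hypotheses
# Almost all of the expected value sits in the hypotheses where I will *not*
# see the "Thank you" message, so I bet (and act) as if I were in base reality
# or a big full-Earth sim, even though their combined probability is small here.
```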
This just shows that probabilities, interpreted in a naive way like the Oracle in the story interprets them, are not a very reasonable concept; but if you replace them with betting odds based on expected values, everything shakes back to mostly normal.
If you want to be selfish, then all of this is probably pretty confusing, and I don’t have really good answers. You should probably reconsider whether you actually want to be selfish, or at least you probably need to define better what you mean by selfishness.[9] My best guess is that you should converge on helping humanity retain control over the future, make some deals that carve out some wealth and influence for yourself in the vast Universe, then maybe use some of that wealth to run happy simulations of your early self, if that’s what shakes out from your definition of selfishness. But I think you should mostly accept that the naive notion of selfishness might not be coherent.[10]
In general, I recommend against doing anything differently based on this argument, and if you are seriously thinking about changing your life based on this, or founding a cult around transferring yourself to a different type of simulation, I believe there is no coherent notion of self where this makes sense. Please send a DM to me first before you do anything unusual based on arguments like this, so I can try to explain the reasoning in more detail and try to talk you out of bad decisions.
[End of the potentially psychologically info-hazardous part.]
How does the Solomonoff prior come into the picture?
In the post I said things like “there are more simulated 21st centuries than real ones” and “the Outer Realities control more stuff than the simulations, so they matter more”. All of these are undefined if we live in an infinite Universe, and the reasoning above mostly assumes that we do.
Infinite ethics and infinite anthropics[11] are famously hard-to-resolve questions; see here for a discussion by Joe Carlsmith.
One common resolution is to say that you need to put some measure on all the observer moments in all the Universes that integrates to 1. One common way to do this is to say that we are in the Tegmark-IV multiverse, all Universes have reality-measure inversely proportional to the description complexity of the Universe, and then each moment inside a Universe has reality-measure inversely proportional to the description complexity of the moment within the Universe. See a long discussion here. (Warning: it’s very long and I have only skimmed small parts of it.)
The part where Universes have reality-measure inversely proportional to their description complexity feels very intuitive to me, taking Occam’s razor to a metaphysical level. However, I find the part with moments within a Universe having different measure based on their description complexity very wacky. A friend who believes in this theory once said that “When humanity created the hottest object in the known Universe, that was surely a great benefit to our souls”. That is, our experiences got more reality-measure, thus matter more, by being easier to point at them because of their close proximity to the conspicuous event of the hottest object in the Universe coming to existence. Even if you reject this particular example on a technicality, I think it showcases just how weird the idea of weighting moments by description length is.
But you do need to weigh the moments somehow, otherwise you are back again at having infinite measure, and then anthropics and utilitarian ethics stop making sense. [12] I don’t have good answers here, so let’s just run with weighing observer moments by description complexity.
If I understand it correctly, the Solomonoff prior is just that. You take the Tegmark-IV multiverse of all computable Universes, find all the (infinitely many) moments in the Multiverse that match your current set of observations so far, weigh each of these moments by the description complexity of their Universe and of the moment within it, then make predictions based on what measure of these moments lead to one versus the other outcome. [13]
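For concreteness, this is the standard definition of the Solomonoff prior that the informal “weigh moments by description complexity” picture is gesturing at (a sketch, using the usual universal prefix machine formulation):

```latex
% Universal (Solomonoff) prior of an observation sequence x:
% sum over all programs p that make a fixed universal prefix machine U
% produce an output starting with x, weighted by 2^(-length of p).
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}

% Predictions are conditionals under this measure, e.g. the probability
% that the next observation is b after having already observed x:
M(b \mid x) \;=\; \frac{M(xb)}{M(x)}
```

Roughly, the malignity argument is then about which programs p dominate this sum: short programs that directly describe our physics, or short programs that describe other Universes containing simulators who choose to output something matching our observations.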
In my story, we asked the Oracle to predict the observable consequences of certain actions. If the Oracle believes that it’s in an infinite Universe, then this question doesn’t make sense without a measure, as there are infinitely many instances where one or the other outcome occurs. So we tell the Oracle to use the Solomonoff prior as the measure; this is the setting of the original “The Solomonoff prior is malign” argument.
In my framing of the story, the Oracle thought that its simulators probably live in a big Universe, so they can afford more computations than the other potential simulators. In the original Solomonoff framing, the Oracle assumes that the simulators probably live in a low description complexity (big measure) Universe.
In my framing, the simulators run many simulations, thus outweighing the base realities. In the original Solomonoff framing, the simulators maybe run just one simulation, but in a very low description complexity place within their Universe, so the simulation has high measure.
Otherwise, as far as I can tell, the two arguments are the same.
I believe that while the Solomonoff framing might be more technically correct in an infinite Universe, it introduces a lot of confusion, and has led to a lot of questions and discussions that were just distracting from the main point. [14]
Conclusion
Keeping with the tradition of all the previous posts on the malignity of the Solomonoff prior, I need to admit that probably none of this matters in the real world, at least in the intermediate term. There are other, more prosaic reasons why building a truthful Oracle is hard, and why making decisions based on its predictions of observables is dangerous.
But given that the previous versions of the argument got a bunch of traction in the community, I think it’s better for a more general and hopefully less confusing version to be written up.
- ^
That is, it always discards the solipsist hypothesis, and believes that the humans are equally existent to the Oracle itself
- ^
Of course, it is possible that some simulations (run by civilizations ruling huge Universes) can be very big in themselves, or the simulated beings can get rewards in the outer Universe, which might be very big if the outer Universe is huge. So making good decisions in these big simulated realities can be more important than making the right decisions in base realities where the accessible Universe is small. But this doesn’t change the picture much, as it’s even more important, by a lot, to make the right decision if we are in the huge outer Universe. So as a first approximation, I think it’s fine to assume that getting pivotal-looking decisions right is much more important in base realities than in simulated realities, as altogether base realities are much bigger than simulated ones.
- ^
When I’m frolicking on the beach, I still give a decent probability that I’m in some sort of solipsist simulation or game, but I would say it’s below 50%: I assumed that a similar amount of compute is spent on full-Earth simulations as on solipsist sims, and I have no reason to believe that an arbitrary activity, like frolicking on the beach, is over-represented in solipsist sims compared to its prevalence in full-Earth simulations.
This means that my probabilities about the fundamental nature of reality around me change minute by minute, depending on what I’m doing at the moment. As I said, probabilities are cursed.
- ^
Especially someone who is about to write things on a forum that is often read by people who might later become influential in the deployment of world-changing technologies.
- ^
Dear simulators, please only do that if you expect it to be a pleasant experience for me.
- ^
Do I actually believe this on an emotional level? Well, when I wrote this last night, staring out at the pitch-black sky in a silent office, it felt pretty real. Much less so editing it this morning, when the sky is blue. Still, I believe that the logic is sound, and as I argue in the next section, the reasonable reaction is to act as if this weren’t real, so I’m justified in not having much of an emotional reaction.
- ^
Including what I “should” feel in various situations. If I sit around dramatically staring at the sky after pressing the submit button, waiting for the end, that gives me some small extra emotional satisfaction in the branches that get terminated, but wastes time in the one branch where my actions actually matter. So I should relate to it even emotionally as if the solipsist simulations didn’t exist. It all adds up to normality.
- ^
To be fair, I think a non-negligible fraction of our influence on the world comes from being in big simulations whose results the simulators take seriously for some acausal trade reasons. I think this shouldn’t change our actions much right now, as the most important thing in these simulations is to not get killed or disempowered by an x-risk, and to prove ourselves to be a civilization worth trading with. But once we get out of the period of danger, I think these simulation considerations should likely become very central in how we shape the world around us. Until then, I guess we should pay some extra attention not to irreversibly destroy other species and cultures and whatnot, and generally be nice. I recommend the same for our potential AI conquerors; see more here.
- ^
There is a notion of selfishness that is not harmed by these arguments: selfishness as an extreme form of “nationalism”. That is, you are still a utilitarian, but you weigh the importance of experiences of beings by how similar they are to you. You care more about mammals than fish, more about humans than animals, more about your countrymen than foreigners, more about your friends who are very similar to you than about your average countrymen, and more about future versions of yourself (or copies of yourself in simulations) than about your friends. I think this type of selfishness is still coherent, and I in fact have something like this in my moral system, though I think that it’s bad to discount the beings more dissimilar to you too heavily. The type of selfishness that starts to fall apart in light of simulation arguments is when you say you care about “this me” and not the other copies.
- ^
I’m kind of sad about this actually. I think not only selfishness, but warmer notions of care that are not full longtermist utilitarianism, also become kind of incoherent.
- ^
Because I believe that “there are no probabilities, only expected values based on a utility function”, I believe that the solution to infinite ethics and infinite anthropics needs to be basically the same.
- ^
Yes, you can also decide to abandon utilitarianism, but I think Joe Carlsmith argues well that infinite ethics is a problem for everyone. And I’m not sure you can abandon anthropics, other than by not thinking about it, which I need to admit is often a great choice.
- ^
Obviously, this process is uncomputable, but maybe you can still approximate it somehow.
- ^
Can civilizations emerge in very low description complexity Turing machines? Can they access and manipulate low description complexity patterns within their Universe? How does the argument work, given that Solomonoff induction is actually uncomputable? I believe that all these questions are basically irrelevant, because the argument can just fall back to the simulators living in Universes similar to ours, just running more simulations. And the Oracle doesn’t need to run the actual Solomonoff induction, just make reasonable enough guesses (maybe based on a few simulations of its own) about what’s out there.
Thanks for writing this; I indeed felt that the arguments were significantly easier to follow than in previous efforts.
I don’t believe that these anthropic considerations actually apply, either to us, to oracles, or to Solomonoff induction. The arguments are too informal; it’s very easy to miscalculate Kolmogorov complexities and the measures assigned by the universal distribution using intuitive gestures like this. However, I do think that this is a correct generalization of the idea of a malign prior, and I actually appreciate that you wrote it up this way, because it makes clear that none of the load-bearing parts of the argument actually rely on reliable calculations (invocations of algorithmic information theory concepts have not been reduced to rigorous math, so the original argument is not stronger than this one).